I thought I had read somewhere (I can't remember where) that factors are not actually more efficient than character vectors in data.table. Is this true? I am debating whether to continue using factors to store various vectors in data.table. An approximate test with object.size seems to indicate otherwise:
library(data.table)
chars <- data.table(a = sample(letters, 1e5, TRUE)) # chars (not really)
string <- data.table(a = sample(state.name, 1e5, TRUE)) # strings
fact <- data.table(a = factor(sample(letters, 1e5, TRUE))) # factor
int <- data.table(a = sample(1:26, 1e5, TRUE)) # int
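mbs <- function(...) {
  # Capture the names of the objects passed in
  ns <- sapply(match.call(expand.dots = TRUE)[-1L], deparse)
  # Look the objects up by name in the global environment
  vals <- mget(ns, .GlobalEnv)
  # Print each object's size in MB
  cat('Sizes:\n',
      paste('', ns, ':', round(sapply(vals, object.size)/1024/1024, 3), 'MB\n'))
}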
## Get approximate sizes?
mbs(chars, string, fact, int)
# Sizes:
# chars : 0.765 MB
# string : 0.766 MB
# fact : 0.384 MB
# int : 0.382 MB
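
To sanity-check these numbers, the same comparison can be made on bare vectors, expressed as approximate bytes per element. This is only a sketch: the names `x_chr`, `x_fct`, `x_int` and `bytes_per_elt` are made up for illustration, and it assumes a 64-bit build of R, where each element of a character vector is a pointer into R's shared string cache while a factor stores integer codes plus a levels attribute. As far as I know, object.size counts a string shared within one character vector only once, so the result is an estimate rather than an exact memory footprint.

```r
# Sketch: compare per-element storage of bare vectors (no data.table needed).
# Assumes 64-bit R; object.size() gives an estimate, not an exact footprint.
set.seed(1)
n <- 1e5
x_chr <- sample(letters, n, TRUE)  # character vector
x_fct <- factor(x_chr)             # factor: integer codes + levels attribute
x_int <- as.integer(x_fct)         # plain integer codes

# Approximate bytes per element, as reported by object.size()
bytes_per_elt <- sapply(
  list(chr = x_chr, fct = x_fct, int = x_int),
  function(v) as.numeric(object.size(v)) / n
)
print(round(bytes_per_elt, 2))
```

If the factor and integer figures come out close to each other and roughly half the character figure, that would match the data.table sizes above. It says nothing about speed for grouping, joining or sorting, which object.size does not measure.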