Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have few huge datatable dt_1, dt_2, ..., dt_N with same cols. I want to bind them together into a single datatable. If I use

dt <- rbind(dt_1, dt_2, ..., dt_N)

or

dt <- rbindlist(list(dt_1, dt_2, ..., dt_N))

then the memory usage is approximately double the amount needed for dt_1,dt_2,...,dt_N. Is there a way to bind them wihout increasing the memory consumption significantly? Note that I do not need dt_1, dt_2, ..., dt_N once they are combined together.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
996 views
Welcome To Ask or Share your Answers For Others

1 Answer

Other approach, using a temporary file to 'bind':

nobs=10000
d1 <- d2 <- d3 <-  data.table(a=rnorm(nobs),b=rnorm(nobs))
ll<-c('d1','d2','d3')
tmp<-tempfile()

# Write all, writing header only for the first one
for(i in seq_along(ll)) {
  write.table(get(ll[i]),tmp,append=(i!=1),row.names=FALSE,col.names=(i==1))
}

# 'Cleanup' the original objects from memory (should be done by the gc if needed when loading the file
rm(list=ll)

# Read the file in the new object
dt<-fread(tmp)

# Remove the file
unlink(tmp)

Obviously slower than the rbind method, but if you have memory contention, this won't be slower than requiring the system to swap out memory pages.

Of course if your orignal objects are loaded from file at first, prefer concatenating the files before loading in R with another tool most aimed at working with files (cat, awk, etc.)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

548k questions

547k answers

4 comments

86.3k users

...