I have a list of data frames that I know all contain at least one row (some contain exactly one row, the others a fixed number of rows) and that all have the same columns (names and types). In case it matters, I am also certain that there are no NAs anywhere in the rows.
The situation can be simulated like this:
#create one row
onerowdfr<-do.call(data.frame, c(list(), rnorm(100) , lapply(sample(letters[1:2], 100, replace=TRUE), function(x){factor(x, levels=letters[1:2])})))
colnames(onerowdfr)<-c(paste("cnt", 1:100, sep=""), paste("cat", 1:100, sep=""))
#reuse it in a list
someParts<-lapply(rbinom(200, 1, 14/200)*6+1, function(reps){onerowdfr[rep(1, reps),]})
I've set the parameters (of the randomization) so that they approximate my true situation.
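In case it helps, here is a minimal sketch of how the assumptions I stated (at least one row per part, identical columns, no NAs) could be checked on the simulated list. It rebuilds the data so that it runs on its own. The fixed seed and the helper name `sameStructure` are additions for illustration only and are not part of my real code.

```r
set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

# rebuild the simulated data as above
onerowdfr <- do.call(data.frame, c(list(), rnorm(100),
  lapply(sample(letters[1:2], 100, replace = TRUE),
         function(x) factor(x, levels = letters[1:2]))))
colnames(onerowdfr) <- c(paste("cnt", 1:100, sep = ""), paste("cat", 1:100, sep = ""))
someParts <- lapply(rbinom(200, 1, 14/200) * 6 + 1,
                    function(reps) onerowdfr[rep(1, reps), ])

# hypothetical helper: TRUE if dfr has the same column names and classes as ref
sameStructure <- function(dfr, ref) {
  identical(names(dfr), names(ref)) &&
    identical(sapply(dfr, class), sapply(ref, class))
}

# number of rows in each part; by construction each part has 1 or 7 rows
rowCounts <- sapply(someParts, nrow)
print(table(rowCounts))

# every part has the same columns (names and types) as the first one
allSameStructure <- all(sapply(someParts, sameStructure, ref = someParts[[1]]))

# no missing values anywhere
noMissing <- !any(sapply(someParts, function(d) any(is.na(d))))

print(c(allSameStructure = allSameStructure, noMissing = noMissing))
```

The total number of rows, `sum(rowCounts)`, varies from run to run because the number of multi-row parts is drawn at random.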
Now, I want to unite all these dataframes in one dataframe. I thought using rbind would do the trick, like this:
system.time(
result<-do.call(rbind, someParts)
)
On my system (which is not particularly slow), with the settings above, this is the output of system.time:
user system elapsed
5.61 0.00 5.62
Nearly 6 seconds for rbind-ing 254 (in my case) rows of 200 variables? Surely there has to be a way to improve the performance here? In my code, I have to do similar things very often (it is a form of multiple imputation), so I need this to be as fast as possible.