I have several txt files with the same structure. I want to read them into R using fread and then combine them into one larger dataset.
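## First put all file names into a list
library(data.table)
# pattern is a regular expression, so escape the dot and anchor the extension;
# full.names = TRUE returns paths that fread can open from any working directory
all.files <- list.files(path = "C:/Users", pattern = "\\.txt$", full.names = TRUE)
## Read data using fread
readdata <- function(fn) {
  dt_temp <- fread(fn, sep = ",")
  keycols <- c("ID", "date")
  setkeyv(dt_temp, keycols)  # setkeyv() takes the key columns as a character vector
  return(dt_temp)
}
## Then read every file and stack the results
mylist <- lapply(all.files, readdata)
mydata <- do.call('rbind', mylist)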
The code works fine, but the speed is not satisfactory. Each txt file has 1M observations and 12 fields.
If I use fread to read a single file, it's fast. But when I use lapply, it is extremely slow, and clearly takes much longer than reading the files one by one. What is going wrong here, and is there any way to improve the speed?
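One way to narrow this down is to time the read, key, and bind steps separately rather than as a single call. The sketch below is only illustrative: the temporary directory, the file names, and the small generated tables are hypothetical stand-ins for the real 1M-row files, so the timings it prints say nothing about the actual data.

```r
library(data.table)

# Hypothetical test data: three small comma-separated txt files in a temp directory.
tmp_dir <- file.path(tempdir(), "fread_timing_demo")
dir.create(tmp_dir, showWarnings = FALSE)
for (i in 1:3) {
  dt <- data.table(
    ID    = 1:1000,
    date  = as.character(as.Date("2020-01-01") + 0:999),
    value = rnorm(1000)
  )
  fwrite(dt, file.path(tmp_dir, sprintf("part%d.txt", i)))
}
files <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)

# Time each stage on its own so the slow one stands out.
t_read <- system.time(raw_list <- lapply(files, fread, sep = ","))
t_key  <- system.time(for (d in raw_list) setkeyv(d, c("ID", "date")))
t_bind <- system.time(combined <- do.call("rbind", raw_list))

# Elapsed seconds per stage.
elapsed <- c(
  read = t_read[["elapsed"]],
  key  = t_key[["elapsed"]],
  bind = t_bind[["elapsed"]]
)
print(elapsed)
```

Running the same three timings against the real files should show whether the time goes into reading, into sorting each table by its key, or into the final rbind.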
I tried llply from the plyr package, but it did not give much of a speed gain.
Also, is there any syntax in data.table for a vertical join, like rbind in R or UNION in SQL?
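For reference, data.table provides rbindlist() for stacking a list of tables, and its documentation describes it as a faster equivalent of do.call("rbind", l). The sketch below uses two small made-up tables (part1 and part2 are hypothetical stand-ins for the files read by fread) to show a UNION ALL style stack, a UNION style stack with duplicates removed, and setting the key once on the combined result instead of once per file.

```r
library(data.table)

# Hypothetical stand-ins for the tables returned by fread().
part1 <- data.table(
  ID    = c(2L, 1L),
  date  = c("2020-01-02", "2020-01-01"),
  value = c(0.2, 0.1)
)
part2 <- data.table(
  ID    = c(1L, 3L),
  date  = c("2020-01-01", "2020-01-03"),
  value = c(0.1, 0.3)
)

# UNION ALL equivalent: stack the tables, keeping duplicate rows.
stacked <- rbindlist(list(part1, part2), use.names = TRUE)

# UNION equivalent: drop rows that are exact duplicates across all columns.
deduped <- unique(stacked)

# Set the key once on the combined table rather than once per file.
setkeyv(deduped, c("ID", "date"))
print(deduped)
```

With the list from lapply, the same call would be rbindlist(mylist). data.table also has funion() for a set-style union of two tables, which may be a closer match to SQL's UNION when only two inputs are involved.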
Thanks.