data.table is a fantastic R package and I am using it in a library I am developing. So far all is going very well, except for one complication. Referring to data.table columns by names stored in variables seems much harder than with conventional data frames, where one can simply write, for example: colname="col"; df[df[,colname]<5,colname]=0
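For reference, here is a minimal runnable version of the data frame idiom I mean. The toy data frame and its values are made up purely for illustration; only the indexing pattern matters.

```r
# Toy data frame; the column names and values are assumptions for illustration.
df <- data.frame(col = c(1, 3, 5, 7), other = c("a", "b", "c", "d"))

colname <- "col"  # column name held in a variable

# Base R accepts the variable in both the row filter and the assignment target.
df[df[, colname] < 5, colname] <- 0

print(df)
```

The same variable is used on both sides without any special syntax, which is the convenience I would like to reproduce with data.table.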
Perhaps what complicates things most is the apparent lack of consistent syntax for this in data.table. In some cases, eval(colname) and get(colname), or even c(colname) seem to work. In others, DT[,colname, with=FALSE] is the solution. In yet others, such as the set() and subset() functions, I haven't found a solution at all. Finally, an extreme but quite common use case was discussed earlier (passing column names to data.table programmatically), and the proposed solutions, while apparently doing their job, did not seem particularly readable...
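To make two of the variants above concrete, here is a small sketch of get(colname) and with=FALSE on a toy table. The table DT, its columns, and its values are hypothetical; I am not claiming these are the recommended forms, only that they run for me in these simple cases.

```r
library(data.table)

# Toy table; column names and values are assumptions for illustration only.
DT <- data.table(dist = c(50, 150, -20, 300), val = c(5, 12, 8, 20))
colname <- "val"

# get() looks the column up by name inside j; the result is a plain vector.
v1 <- DT[, get(colname)]

# with = FALSE makes j be read as a character vector of column names;
# the result is a one-column data.table rather than a vector.
v2 <- DT[, colname, with = FALSE]

# get() can also be used in i to filter rows by a column named in a variable.
rows <- DT[get(colname) < 10]
```

Note that the two j forms return different types (a vector versus a data.table), which is part of what makes the syntax feel inconsistent to me.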
Perhaps I am complicating things too much? If anyone could jot down a quick cheatsheet for referring to data.table column names using variables for different common scenarios, I would be very grateful.
UPDATE:
Some specific examples that work, provided I can hard-code the column names:
x.short = subset(x, abs(dist)<=100)
set(x, which(x$val<10), "val", 0)
Now assume distcol="dist", valcol="val". What is the best way to do the above using distcol and valcol, but not dist and val?