Having df1
and df2
as follows:
df1 <- read.table(text =" x y z
1 1 1
1 2 1
1 1 2
2 1 1
2 2 2",header=TRUE)
df2 <- read.table(text =" a b c
1 1 1
1 2 8
1 1 2
2 6 2",header=TRUE)
I can ask of the data a bunch of things like:
df2[ df2$b == 6 | df2$c == 8 ,] #any rows where b=6 plus c=8 in df2
#and additive conditions
df2[ df2$b == 6 & df2$c == 8 ,] # zero rows
between data.frame:
df1[ df1$z %in% df2$c ,] # rows in df1 where values in z are in c (allrows)
This gives me all rows:
df1[ (df1$x %in% df2$a) &
(df1$y %in% df2$b) &
(df1$z %in% df2$c) ,]
but shouldn't this give me all rows of df1
too:
df1[ df1$z %in% df2$c | df1$b == 9,]
What I am really hoping to do is to subset df1
an df2
on three column conditions,
so that I only get rows in df1 where a,b,c all equal x,y,z at the same time within a row. In real data i will have more than 3 columns but I will still want to subset on 3 additive column conditions.
So subsetting my example data df1
on df2
my result would be:
df1
1 1 1
1 1 2
Playing with syntax has confusedme more and the SO posts are all variaion of what I want that actually lead to more confusion for me.
I figured out I can do this:
merge(df1,df2, by.x=c("x","y","z"),by.y=c("a","b","c"))
which gives me what I want, but I would like to understand why I am wrong in my [
attempts.