I'm trying to left join two data frames (df1, df2). The data frames have two columns in common: zone and slope. Zone is a factor column and slope is numeric.
df1 = data.frame(slope = c(1:6), zone = c(rep("Low", 3), rep("High", 3)))
df2 = data.frame(slope = c(2.4, 2.4,6.2), zone = c(rep("Low", 1), rep("High", 2)), other = c(rep("a", 1), rep("b", 1), rep("c", 1)))
df1
df2
I want to join the data frames so that rows are first matched exactly on zone and then on the closest slope. If two slope values are equidistant, it doesn't matter whether the join rounds up or down, as long as the rule is applied consistently and does not produce duplicate rows.
I'd prefer to do this with a fuzzy_join or dplyr rather than data.table.
The result should look something like:
df3 = data.frame(slope = c(1:6), zone = c(rep("Low", 3), rep("High", 3)), other = c(rep("a", 3), rep("b",1), rep("c",2)))
df3
where the value of "other" is first determined by zone, and then the closest slope.
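To make the intended rule concrete, here is a minimal dplyr sketch of it, not a definitive solution. It assumes dplyr 1.0.0 or later (for `slice_min()`), and `row_id` and the `_df2` suffix are hypothetical names introduced only for this illustration. The idea is to join every candidate row on zone, then keep the single candidate with the smallest absolute slope difference for each row of df1.

```r
library(dplyr)

df1 <- data.frame(slope = 1:6, zone = c(rep("Low", 3), rep("High", 3)))
df2 <- data.frame(slope = c(2.4, 2.4, 6.2),
                  zone  = c("Low", "High", "High"),
                  other = c("a", "b", "c"))

nearest <- df1 %>%
  # row_id is a hypothetical helper column that identifies each row of df1
  mutate(row_id = row_number()) %>%
  # exact match on zone; df2's slope is kept as slope_df2
  left_join(df2, by = "zone", suffix = c("", "_df2")) %>%
  group_by(row_id) %>%
  # keep the closest slope; with_ties = FALSE returns one row even on a tie
  slice_min(abs(slope - slope_df2), n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(slope, zone, other)

nearest
```

With the sample data, `nearest$other` comes out as a, a, a, b, c, c, which matches df3. Ties are resolved by taking whichever candidate appears first in df2, so the rule is consistent and each row of df1 appears exactly once. Newer dplyr versions may warn about a many-to-many relationship on the zone join; that is expected here because the intermediate result is deliberately larger than df1.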
I've tried:
distance_left_join(df1, df2, by = c("zone" = "zone", "slope" = "slope"))
as well as other types of fuzzy joins, but I think they may not be working because the columns are of different types. I suspect there is a fuzzy_left_join solution, but I don't understand how to create a match function.