What is the most efficient way to update/replace NAs in main dataset with (correct) values in a lookup table? This is such a common operation! Similar questions do not seem to have tidy solutions.
Constraints:
1) Please assume a large number of missing values and bigger lookup table than the example given. So case-wise replacement operations would be impractical (no case_when
, if_else
, etc.)
2)The lookup table does not have all values of main dataframe, only the replacement ones.
Tidyverse solution answer much preferred. Similar questions do not seem to have tidy solutions.
library(tidyverse)
### Main Dataframe ###
df1 <- tibble(
state_abbrev = state.abb[1:10],
state_name = c(state.name[1:5], rep(NA, 3), state.name[9:10]),
value = sample(500:1200, 10, replace=TRUE)
)
#> # A tibble: 10 x 3
#> state_abbrev state_name value
#> <chr> <chr> <int>
#> 1 AL Alabama 551
#> 2 AK Alaska 765
#> 3 AZ Arizona 508
#> 4 AR Arkansas 756
#> 5 CA California 741
#> 6 CO <NA> 1100
#> 7 CT <NA> 719
#> 8 DE <NA> 874
#> 9 FL Florida 749
#> 10 GA Georgia 580
### Lookup Dataframe ###
lookup_df <- tibble(
state_abbrev = state.abb[6:8],
state_name = state.name[6:8]
)
#> # A tibble: 3 x 2
#> state_abbrev state_name
#> <chr> <chr>
#> 1 CO Colorado
#> 2 CT Connecticut
#> 3 DE Delaware
Ideally, a left_join would have a replacement option for missing values. Alas...
left_join(df1, lookup_df)
#> Joining, by = c("state_abbrev", "state_name")
#> # A tibble: 10 x 3
#> state_abbrev state_name value
#> <chr> <chr> <int>
#> 1 AL Alabama 551
#> 2 AK Alaska 765
#> 3 AZ Arizona 508
#> 4 AR Arkansas 756
#> 5 CA California 741
#> 6 CO <NA> 1100
#> 7 CT <NA> 719
#> 8 DE <NA> 874
#> 9 FL Florida 749
#> 10 GA Georgia 580
```
Created on 2018-07-28 by the reprex package (v0.2.0).
See Question&Answers more detail:os