Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I want to preface this by saying that I already looked at this link to try to solve my problem:

Applying the same factor levels to multiple variables in an R data frame

The difference is that in that problem, the OP wanted to change the levels of factors that all had the same levels. In my case, I only want to change the first level, which is set to ' ', to something like 'Unknown' and leave the rest of the levels alone. I know I could do this in a "non-R" way with something like this:

for (i in 64:88) {
  var.name <- colnames(df[i])
  levels(eval(parse(text=paste('df$', var.name, sep=''))))[levels(eval(parse(text=paste('df$', var.name, sep='')))) == ' '] <- 'Unknown'
}

But that's an inefficient way to do it. Trying to use the method proposed in the question linked above gave me this code:

df[64:88] <- lapply(df[64:88], factor, levels=c('Unknown', ??))

I don't know what to put in place of the question marks. I tried using just "levels[-1]" but it's obvious why that didn't work. I also tried "levels(df[64:88])[-1]" but again no good. So I tried to revamp the code with the following:

df[64:88] <- lapply(df[64:88], function(x) levels(x)[levels(x) == ' '] <- 'Unknown')

but I get NULL whenever I call levels$transaction_type1 (where transaction_type1 is the column name of df[64]).

What am I missing here?

Thanks in advance for your help!

Per a couple of requests, here is an example of my data:

df$transaction_type1[1:100]
  [1]                                                                                                                                                
 [13] HOME RENEW                                                                                                                                     
 [25]                                                                                                                                                
 [37]                                                                                                                                                
 [49]                                                                                                                                                
 [61] AUTO MANAGE                                                                                     AUTO RENEW                                     
 [73]             AUTO MANAGE                                                                                     AUTO RENEW                         
 [85]                                                                                                                                                
 [97]                                                
Levels:   AUTO CLAIM AUTO MANAGE AUTO PURCHASE AUTO RENEW HOME CLAIM HOME RENEW

As you can see, there are a lot of values equal to ' ', and all 25 variables look just like this, but with different levels. My data consists of 222 variables and 24,850 rows, so I don't know what the standard is on SO for giving example data. This snippet of code might help as well:

> levels(df$transaction_type1)
#[1] " "             "AUTO CLAIM"    "AUTO MANAGE"   "AUTO PURCHASE" "AUTO RENEW"    "HOME CLAIM"    "HOME RENEW"

> levels(df$transaction_type1)[levels(df$transaction_type1) == ' '] <- 'Unknown'
> levels(df$transaction_type1)
#[1] "Unknown"       "AUTO CLAIM"    "AUTO MANAGE"   "AUTO PURCHASE" "AUTO RENEW"    "HOME CLAIM"    "HOME RENEW"   

If more information is needed, please let me know so I can provide it and also learn the SO standards of asking for help. Thanks!


1 Answer

Something like this?

# it seems like your original data has a structure like this
df <- data.frame(x = factor(c("a", "", "b"), levels = c("", "a", "b")),
                 y = factor(c("c", "", "d"), levels = c("", "c", "d")))

lapply(df, levels)
# $x
# [1] ""  "a" "b"
# 
# $y
# [1] ""  "c" "d"    

# change the "" level to "unknown", and return the updated vector
df[] <- lapply(df, function(x) {
  levels(x)[levels(x) == ""] <- "unknown"
  x
})

lapply(df, levels)
# $x
# [1] "unknown" "a"       "b"      
# 
# $y
# [1] "unknown" "c"       "d"
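
To connect this back to the question, here is a minimal sketch of the same idea restricted to a subset of columns, using ' ' (a single space) as the level to rename. The data frame, its column names, and `cols` are made up for illustration; `cols` stands in for `64:88`. It also shows why the attempt in the question likely produced NULL levels: the last expression in that anonymous function is an assignment, so the function returns the assigned string rather than the modified factor.

```r
# Hypothetical stand-in for the asker's data: two factor columns whose
# first level is a single space, plus one column that must stay untouched.
df <- data.frame(
  id = 1:3,
  transaction_type1 = factor(c("AUTO RENEW", " ", "HOME RENEW"),
                             levels = c(" ", "AUTO RENEW", "HOME RENEW")),
  transaction_type2 = factor(c(" ", "AUTO CLAIM", " "),
                             levels = c(" ", "AUTO CLAIM"))
)

cols <- 2:3  # assumption: plays the role of 64:88 in the question

# The attempt from the question: the function ends with an assignment,
# so each call returns the string "Unknown", not the factor x.
broken <- lapply(df[cols], function(x) levels(x)[levels(x) == " "] <- "Unknown")
str(broken)  # a list of character strings, so levels() on them is NULL

# Returning x explicitly keeps each column a factor with the renamed level.
df[cols] <- lapply(df[cols], function(x) {
  levels(x)[levels(x) == " "] <- "Unknown"
  x
})

levels(df$transaction_type1)
# [1] "Unknown"    "AUTO RENEW" "HOME RENEW"
levels(df$transaction_type2)
# [1] "Unknown"    "AUTO CLAIM"
```

Assigning into `df[cols]` rather than `df[]` leaves the other columns (here `id`) as they were, and columns without a ' ' level pass through unchanged because the logical index selects nothing.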
