Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

How can I collapse my data frame where many observations have multiple rows but at most only one value for each of several different variables?

Here's what I have:

id  title  info                var1      var2      var3
1   foo    Some string here    string 1
1   foo    Some string here              string 2
1   foo    Some string here                        string 3
2   bar    A different string  string 4  string 5
2   bar    A different string                      string 6
3   baz    Something else      string 7            string 8

Here's what I want:

id  title info                  var1        var2        var3
1   foo   Some string here      string 1    string 2    string 3
2   bar   A different string    string 4    string 5    string 6
3   baz   Something else        string 7                string 8

I think I've got it with

ddply(merged, .(id, title, info), summarize, var1 = max(var1), var2 = max(var2), var3 = max(var3))

The problem is that there are many more of the var1-var3 variables, and they are programmatically generated. As a result, I need a way to generate the var1 = max(var1), etc. arguments programmatically from a list of the variable names.

1 Answer

There are many possible ways of achieving this; here are two.

First, define a helper function that keeps only the non-empty strings in a column:

Myfunc <- function(x) x[x != '']
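
Before using it, it may help to see what the helper returns in each situation. The short sketch below only uses base R; the example vectors are made up for illustration. The point is that the result is a single value only when a group has exactly one non-empty entry, so how a zero-length or multi-value result is handled depends on the grouping library (and its version) that calls it.

```r
# Helper from the answer: keep only the non-empty strings.
Myfunc <- function(x) x[x != '']

one  <- Myfunc(c("", "string 2", ""))  # exactly one non-empty value
none <- Myfunc(c("", ""))              # no non-empty value at all
many <- Myfunc(c("a", "", "b"))        # more than one non-empty value

length(one)   # 1
length(none)  # 0
length(many)  # 2
```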

Using data.table

library(data.table)
setDT(df)[, lapply(.SD, Myfunc), by = list(id, title, info)]
#    id title               info     var1     var2     var3
# 1:  1   foo   Some string here string 1 string 2 string 3
# 2:  2   bar A different string string 4 string 5 string 6
# 3:  3   baz     Something else string 7       NA string 8
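
If only some columns should be collapsed, and their names are held in a character vector as in the question, data.table's .SDcols argument can take that vector directly. The following is a minimal sketch rather than part of the original answer: the toy data, the `vars` vector and the `first_nonempty` helper are assumptions made for illustration. The helper returns NA explicitly when a group has no non-empty value, so the result does not rely on how zero-length results are padded.

```r
library(data.table)

# Toy data mirroring the question; empty strings mark missing values.
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3),
  title = c("foo", "foo", "foo", "bar", "bar", "baz"),
  info  = c("Some string here", "Some string here", "Some string here",
            "A different string", "A different string", "Something else"),
  var1  = c("string 1", "", "", "string 4", "", "string 7"),
  var2  = c("", "string 2", "", "string 5", "", ""),
  var3  = c("", "", "string 3", "", "string 6", "string 8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-empty value, or NA if there is none.
first_nonempty <- function(x) {
  keep <- x[!is.na(x) & x != ""]
  if (length(keep) > 0) keep[1] else NA_character_
}

# Column names produced programmatically (here: everything starting with "var").
vars <- grep("^var", names(df), value = TRUE)

# as.data.table() copies, so df itself is left unchanged.
dt <- as.data.table(df)
collapsed <- dt[, lapply(.SD, first_nonempty),
                by = .(id, title, info), .SDcols = vars]
```

Because `vars` is an ordinary character vector, it can come from any code that generates the column names; nothing has to be typed out per column.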

Or similarly with dplyr (note that summarise_each() and funs() have since been deprecated in favor of across()):

library(dplyr)
df %>%
  group_by(id, title, info) %>%
  summarise_each(funs(Myfunc))

# Source: local data table [3 x 6]
# Groups: id, title
# 
#   id title               info     var1     var2     var3
# 1  1   foo   Some string here string 1 string 2 string 3
# 2  2   bar A different string string 4 string 5 string 6
# 3  3   baz     Something else string 7       NA string 8
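
With a recent dplyr (1.0 or later is assumed here), the same idea can be written with across(), which also accepts a character vector of column names through all_of(). This is a sketch under those assumptions, not the original answer's code; the toy data, `vars` and `first_nonempty` are illustrative names.

```r
library(dplyr)

# Toy data mirroring the question; empty strings mark missing values.
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3),
  title = c("foo", "foo", "foo", "bar", "bar", "baz"),
  info  = c("Some string here", "Some string here", "Some string here",
            "A different string", "A different string", "Something else"),
  var1  = c("string 1", "", "", "string 4", "", "string 7"),
  var2  = c("", "string 2", "", "string 5", "", ""),
  var3  = c("", "", "string 3", "", "string 6", "string 8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-empty value, or NA if there is none.
first_nonempty <- function(x) {
  keep <- x[!is.na(x) & x != ""]
  if (length(keep) > 0) keep[1] else NA_character_
}

# Column names produced programmatically.
vars <- grep("^var", names(df), value = TRUE)

collapsed <- df %>%
  group_by(id, title, info) %>%
  summarise(across(all_of(vars), first_nonempty), .groups = "drop")
```

Returning exactly one value per group matters here, since summarise() expects a single summary value for each column in each group.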
