Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I have a data set which looks like the following (partially):

id  name    dummy
1   Jane    1
1   Jane    0
1   Jane    1
2   Mike    0
2   Mike    0
2   Mike    0
2   Mike    0
2   Mike    0
3   Tom     1
3   Tom     1
3   Tom     0
3   Tom     0

I'm trying to eliminate the people for whom ALL values of the variable dummy are 0. For instance, Tom and Jane would not be eliminated because at least one of their dummy values is 1, but Mike would be eliminated because all of his are 0. So in the end I would want:

   id   name    dummy
    1   Jane    1
    1   Jane    0
    1   Jane    1
    3   Tom     1
    3   Tom     1
    3   Tom     0
    3   Tom     0

I thought about sorting the data frame by dummy, but I can't figure out how to eliminate only the people who have nothing but 0 values for dummy. Any suggestions would be really helpful!

1 Answer

Assuming df is your data.frame, use tapply to sum dummy per name, then use [ to keep only the rows whose name has a nonzero sum (this works because dummy is always 0 or 1):

> ind <- with(df, tapply(dummy, name, sum))
> df[df$name %in% names(ind)[ind!=0], ]
   id name dummy
1   1 Jane     1
2   1 Jane     0
3   1 Jane     1
9   3  Tom     1
10  3  Tom     1
11  3  Tom     0
12  3  Tom     0
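
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same tapply idea. The variable names has_nonzero and kept are my own, and I test with any(x != 0) instead of a sum so that the check would still hold if dummy ever took negative values; that is an assumption beyond what the question states.

```r
# Rebuild the sample data shown in the question.
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  name  = c(rep("Jane", 3), rep("Mike", 5), rep("Tom", 4)),
  dummy = c(1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0)
)

# One logical flag per name: TRUE when the group has at least one nonzero dummy.
# (has_nonzero is an illustrative name, not from the original answer.)
has_nonzero <- with(df, tapply(dummy, name, function(x) any(x != 0)))

# Keep only the rows whose name belongs to a flagged group.
kept <- df[df$name %in% names(has_nonzero)[has_nonzero], ]
print(kept)
```

The original row names are preserved, so the rows for Jane and Tom keep their positions from df.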

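Another alternative splits df by name, keeps only the groups whose dummy sum is nonzero, and binds them back together (note that rbind will rename the rows to Jane.1, Jane.2, and so on):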

> result <- split(df, df$name)[with(df, tapply(dummy, name, function(x) sum(x)!=0))]
> do.call(rbind, result)
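
A third option, sketched here as a possible variation rather than part of the original answer, is to use ave to compute a per-group value aligned with the rows and filter on it directly. I group by id instead of name on the assumption that id is the real key (two people could share a name), and the max > 0 test assumes dummy is 0 or 1 with no missing values. The names group_max and kept_by_id are illustrative.

```r
# Rebuild the sample data shown in the question.
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  name  = c(rep("Jane", 3), rep("Mike", 5), rep("Tom", 4)),
  dummy = c(1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0)
)

# ave() returns a vector as long as df$dummy, holding each row's group maximum.
group_max <- ave(df$dummy, df$id, FUN = max)

# A group maximum of 0 means every dummy in that group is 0, so drop those rows.
kept_by_id <- df[group_max > 0, ]
print(kept_by_id)
```

Because the filter is a plain logical index on df, row order and row names stay as they were, with no split and rbind round trip.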
