I want to find the lead() and lag() element in each group, but had some wrong results.
For example, data is like this:
library(dplyr)
df = data.frame(name=rep(c('Al','Jen'),3),
score=rep(c(100, 80, 60),2))
df
Data:
name score
1 Al 100
2 Jen 80
3 Al 60
4 Jen 100
5 Al 80
6 Jen 60
Now I try to find out lead() and lag() scores for each person. If I sort it using arrange(), I can get the correct answer:
df %>%
arrange(name) %>%
group_by(name) %>%
mutate(next.score = lead(score),
before.score = lag(score) )
OUTPUT1:
Source: local data frame [6 x 4]
Groups: name
name score next.score before.score
1 Al 100 60 NA
2 Al 60 80 100
3 Al 80 NA 60
4 Jen 80 100 NA
5 Jen 100 60 80
6 Jen 60 NA 100
Without arrange(), the result is wrong:
df %>%
group_by(name) %>%
mutate(next.score = lead(score),
before.score = lag(score) )
OUTPUT2:
Source: local data frame [6 x 4]
Groups: name
name score next.score before.score
1 Al 100 80 NA
2 Jen 80 60 NA
3 Al 60 100 80
4 Jen 100 80 60
5 Al 80 NA 100
6 Jen 60 NA 80
E.g., in 1st line, Al's next.score should be 60 (3rd line).
Anybody know why this happened? Why arrange() affects the result (the values, not just about the order)? Thanks~
See Question&Answers more detail:os