Problem
I am trying to use dplyr and tidyr to produce an output table (a contingency table, I think) that summarises this data as frequencies, i.e. counts of how many titles, descriptions and bodies have negative, neutral and positive scores. I have tried a number of different methods, and the closest example I can find is at Using Tidyr/Dplyr to summarise counts of groups of strings, but it doesn't quite fit.
Example Data
The data looks a little like this:
df <- data.frame( "story_title"=c(0.0,0.0,0.0,-1.0,1.0),
"story_description"=c(-0.3,-0.3,-0.3,0.5,0.3),
"story_body"=c(-0.3,0.2,0.4,0.2,0))
Desired Output
The output would hopefully look something like this, showing the frequencies for each story part:
                  Negative Neutral Positive
story_title              1       3        1
story_description        3       0        2
story_body               1       1        3
(edited totals for story_body - Thanks Akrun)
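The three columns amount to classifying each score by its sign. A minimal sketch, assuming "Neutral" means a score of exactly 0 (which is what the desired totals imply): `sign()` maps each score to -1, 0 or 1, and a factor with all three levels fixed keeps a category even when it is empty, such as Neutral for story_description. The helper name `classify_sentiment` is hypothetical.

```r
# Hypothetical helper: label each score by its sign.
# Assumes "Neutral" means a score of exactly 0.
classify_sentiment <- function(score) {
  factor(sign(score),
         levels = c(-1, 0, 1),
         labels = c("Negative", "Neutral", "Positive"))
}

# The fixed levels keep Neutral in the table even though its count is 0
desc_counts <- table(classify_sentiment(c(-0.3, -0.3, -0.3, 0.5, 0.3)))
desc_counts
```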
Attempted Approach
If I'm right, the first step is to reshape the data with gather, like so:
library(tidyr)  # provides gather() and re-exports the %>% pipe
df <- df %>% gather(type, score, starts_with("story"))
> df
type score
1 story_title 0.0
2 story_title 0.0
3 story_title 0.0
4 story_title -1.0
5 story_title 1.0
6 story_description -0.3
7 story_description -0.3
8 story_description -0.3
9 story_description 0.5
10 story_description 0.3
11 story_body -0.3
12 story_body 0.2
13 story_body 0.4
14 story_body 0.2
15 story_body 0.0
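`gather()` still works, but the tidyr documentation marks it as superseded in favour of `pivot_longer()` (tidyr 1.0.0 and later). A sketch of the same reshape with it, assuming a recent tidyr is installed; the result is a tibble whose rows are interleaved (title, description and body for each original row) rather than stacked as in the printout above, which does not affect any of the counts.

```r
library(tidyr)  # pivot_longer(); also re-exports %>% and starts_with()

df <- data.frame("story_title" = c(0.0, 0.0, 0.0, -1.0, 1.0),
                 "story_description" = c(-0.3, -0.3, -0.3, 0.5, 0.3),
                 "story_body" = c(-0.3, 0.2, 0.4, 0.2, 0))

# Long format: one row per (story part, score) pair
long <- df %>%
  pivot_longer(starts_with("story"), names_to = "type", values_to = "score")
long
```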
From here I think it's some combination of group_by and summarise. I've tried:
df %>% group_by(sentiment) %>%
summarise(Negative = count("sentiment_title"<0),
Neutral = count("sentiment_title"=0),
Positive = count("sentiment_title">0)
)
Obviously this hasn't worked.
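A sketch of how the attempt might be repaired, assuming "Neutral" means exactly 0. After the reshape the columns are `type` and `score` (there is no `sentiment` or `sentiment_title` column, and quoting a name makes it a string literal rather than a column reference); `count()` is a dplyr verb that tallies rows of a data frame, not TRUE values in a vector; and `=` inside a call names an argument, while `==` tests equality. Summing a logical vector counts its TRUE values, which gives the per-group frequencies.

```r
library(dplyr)
library(tidyr)

df <- data.frame("story_title" = c(0.0, 0.0, 0.0, -1.0, 1.0),
                 "story_description" = c(-0.3, -0.3, -0.3, 0.5, 0.3),
                 "story_body" = c(-0.3, 0.2, 0.4, 0.2, 0))
parts <- names(df)  # remember the original column order

counts <- df %>%
  gather(type, score, starts_with("story")) %>%
  # factor keeps rows in the original column order instead of
  # group_by()'s alphabetical order
  mutate(type = factor(type, levels = parts)) %>%
  group_by(type) %>%
  # sum() of a logical vector counts its TRUE values within each group
  summarise(Negative = sum(score < 0),
            Neutral  = sum(score == 0),
            Positive = sum(score > 0))
counts
```

The `mutate()` step is only there to match the row order of the desired output; without it, `group_by()` sorts the story parts alphabetically.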
Can anyone help with a dplyr/tidyr solution? (An answer using base R's table would also be useful as an example.)
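For the base R case, a sketch that avoids tidyr entirely: `unlist()` concatenates the columns of a data frame in order, so repeating each column name `nrow(df)` times labels every score, and `table()` then cross-tabulates story part against sentiment. Explicit factor levels keep empty cells (such as Neutral for story_description) as zeros and fix the row order; treating exactly 0 as Neutral is an assumption read off the desired output.

```r
df <- data.frame("story_title" = c(0.0, 0.0, 0.0, -1.0, 1.0),
                 "story_description" = c(-0.3, -0.3, -0.3, 0.5, 0.3),
                 "story_body" = c(-0.3, 0.2, 0.4, 0.2, 0))

# Long format without tidyr: unlist() stacks the columns in order,
# so each column name is repeated nrow(df) times to label the scores
part  <- factor(rep(names(df), each = nrow(df)), levels = names(df))
score <- unlist(df, use.names = FALSE)

# Classify by sign (assumes Neutral means exactly 0);
# explicit levels keep empty categories as zero counts
sentiment <- factor(sign(score), levels = c(-1, 0, 1),
                    labels = c("Negative", "Neutral", "Positive"))

tab <- table(part, sentiment)
tab
```

If a plain data frame is preferred to a table object, `as.data.frame.matrix(tab)` converts it, with the story parts as row names.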