Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
I have a DataFrame of time-series data, and I'd like to sum the rows that share the same name, e.g.:

using DataFrames
df = DataFrame(Name = ["A", "A", "B", "B"], Day_1 = [0, 2, 1, 4], Day_2 = [2, 3, 5, 7], Day_3 = [1, 3, 6, 2])

| Row # | Name | Day_1 | Day_2 | Day_3 |
|-------|------|-------|-------|-------|
| 1     | "A"  |   0   |   2   |   1   |
| 2     | "A"  |   2   |   3   |   3   |
| 3     | "B"  |   1   |   5   |   6   |
| 4     | "B"  |   4   |   7   |   2   |

I would like the output to be:

| Row # | Name | Day_1 | Day_2 | Day_3 |
|-------|------|-------|-------|-------|
| 1     | "A"  |   2   |   5   |   4   |
| 2     | "B"  |   5   |   12  |   8   |

I have a list of all the different names, but do not know how many rows each name corresponds to.

Thank you!



1 Answer

Here is how to do it:

julia> df
4×4 DataFrame
 Row │ Name    Day_1  Day_2  Day_3
     │ String  Int64  Int64  Int64
─────┼─────────────────────────────
   1 │ A           0      2      1
   2 │ A           2      3      3
   3 │ B           1      5      6
   4 │ B           4      7      2

julia> combine(groupby(df, :Name), names(df, Not(:Name)) .=> sum, renamecols=false)
2×4 DataFrame
 Row │ Name    Day_1  Day_2  Day_3
     │ String  Int64  Int64  Int64
─────┼─────────────────────────────
   1 │ A           2      5      4
   2 │ B           5     12      8
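To see what the second argument of `combine` expands to, here is a small plain-Julia sketch (no packages needed). It hard-codes a hypothetical `value_cols` vector as a stand-in for what `names(df, Not(:Name))` returns for the example `df`, then applies the same `.=> sum` broadcast:

```julia
# Hypothetical stand-in for names(df, Not(:Name)) on the example df:
# the names of every column except the grouping column.
value_cols = ["Day_1", "Day_2", "Day_3"]

# `.=>` broadcasts the Pair constructor, pairing each column name with `sum`.
specs = value_cols .=> sum

for p in specs
    println(first(p), " will be aggregated with ", last(p))
end
```

Each `column => sum` pair tells `combine` to apply `sum` to that column within every group produced by `groupby(df, :Name)`, so you never need to know in advance how many rows each name has. By default `combine` would name the results `Day_1_sum` and so on; `renamecols=false` keeps the original column names.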

Note: judging from the output you posted, you might be using an old version of DataFrames.jl. I recommend updating to the latest released version, which at the time of writing is 0.22.2.
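If you want to check the result independently of your DataFrames.jl version, the same split-apply-combine can be sketched in plain Julia. This is a minimal illustration, not how DataFrames.jl implements `groupby`: the helper `sum_by_key` and the column-by-column data layout are assumptions made for this example. Groups are returned in order of first appearance.

```julia
# Hypothetical helper (not part of DataFrames.jl): sums each value column over
# the rows that share a key, returning keys in order of first appearance.
function sum_by_key(keycol::AbstractVector{K},
                    cols::Vector{<:AbstractVector{T}}) where {K, T<:Number}
    order = K[]                      # distinct keys, first-seen order
    totals = Dict{K, Vector{T}}()    # key => running sum for each column
    for (i, k) in enumerate(keycol)
        if !haskey(totals, k)
            push!(order, k)
            totals[k] = zeros(T, length(cols))
        end
        for (j, col) in enumerate(cols)
            totals[k][j] += col[i]   # group sizes never need to be known upfront
        end
    end
    return order, [totals[k] for k in order]
end

# The example data from the question, stored column by column.
name_col = ["A", "A", "B", "B"]
day_cols = [[0, 2, 1, 4],   # Day_1
            [2, 3, 5, 7],   # Day_2
            [1, 3, 6, 2]]   # Day_3

groups, sums = sum_by_key(name_col, day_cols)
for (g, s) in zip(groups, sums)
    println(g, " => ", s)   # prints "A => [2, 5, 4]", then "B => [5, 12, 8]"
end
```

The single pass with a dictionary is what makes the row counts per name irrelevant: each row simply adds into the running totals for its key.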

