So I have a spark dataframe that looks like:
a | b | c
5 | 2 | 1
5 | 4 | 3
2 | 4 | 2
2 | 3 | 7
And I want to group by column a, create a list of values from column b, and forget about c. The output dataframe would be :
a | b_list
5 | (2,4)
2 | (4,3)
How would I go about doing this with a pyspark sql dataframe?
Thank you! :)
See Question&Answers more detail:os