I had a pd.DataFrame that I converted to a Dask DataFrame for faster computation. I need to find the total views of each channel.
In pandas this would be df.groupby(['ChannelTitle'])['VideoViewCount'].sum(), but in Dask the column dtypes are object, so groupby treats the values as strings rather than ints (see image 2).
To work around this, I added two columns that split each view count into its figure (e.g. 115) and its multiplier exponent (6 for M, 3 for K), hoping to compute something like ddf['new_views_f'] * (10**ddf['new_views_m']), but now I cannot find a mul operation for two columns in Dask.
Either I am missing something or I am overcomplicating the requirement.
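For reference, here is a minimal sketch of the kind of conversion I am after, done in one step instead of through two helper columns. It assumes the raw values look like '115M' or '3K'. The helper names parse_views and total_views_per_channel and the suffix table (including 'B') are made up for illustration, and missing values are not handled.

```python
# Assumed suffix-to-exponent table; only K and M appear in my data, B is a guess.
_SUFFIX_EXPONENT = {"K": 3, "M": 6, "B": 9}


def parse_views(text):
    """Convert a string such as '115M' or '1.5K' to an integer view count."""
    text = str(text).strip().upper().replace(",", "")
    if text and text[-1] in _SUFFIX_EXPONENT:
        figure = float(text[:-1])
        return int(round(figure * 10 ** _SUFFIX_EXPONENT[text[-1]]))
    # No recognised suffix: treat the whole string as a plain number.
    return int(float(text))


def total_views_per_channel(ddf):
    """Return a lazy Dask Series with the summed views of each channel.

    ddf is expected to be a dask.dataframe.DataFrame with the columns
    'ChannelTitle' and 'VideoViewCount' (object dtype, as in the question).
    """
    # meta tells Dask the name and dtype of the result without computing it.
    views = ddf["VideoViewCount"].map(
        parse_views, meta=("VideoViewCount", "int64")
    )
    ddf = ddf.assign(VideoViewCount=views)
    return ddf.groupby("ChannelTitle")["VideoViewCount"].sum()
```

Calling total_views_per_channel(ddf).compute() would then return a pandas Series indexed by ChannelTitle. As far as I can tell, Dask Series also support the ordinary arithmetic operators, so the two-column product should work once both helper columns have been cast to a numeric dtype with astype.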
question from:https://stackoverflow.com/questions/66053231/dask-dataframe-groupby-and-aggregate-for-column