Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have the following dataframe:

import pandas as pd
import numpy as np
np.random.seed(123)
n = 10
df = pd.DataFrame({"val": np.random.randint(1, 10, n), 
                   "cat": np.random.choice(["X", "Y", "Z"], n)})

   val cat
0    3   Z
1    3   X
2    7   Y
3    2   Z
4    4   Y
5    7   X
6    2   X
7    1   X
8    2   X
9    1   Y

I want to know what percentage of the total val column sum each category (X, Y, and Z) accounts for. I can aggregate df like this:

total_sum = df.val.sum()
# total_sum is 32
s = df.groupby("cat").val.sum().div(total_sum)*100

#this is the desired result in % of total val
cat
X    46.875  #15/32
Y    37.500  #12/32
Z    15.625  #5/32
Name: val, dtype: float64

However, I find it rather surprising that pandas seemingly has no percentage/frequency aggregation, something like df.groupby("cat").val.freq(), alongside df.groupby("cat").val.sum() or df.groupby("cat").val.mean(). I assumed this is a common operation, and Series.value_counts implements it with normalize=True, but I cannot find anything similar for groupby aggregation. Am I missing something here, or is there indeed no out-of-the-box function?
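For reference, the workaround above can be wrapped so that the total does not need a separate variable. This is only a minimal sketch: group_share is a hypothetical helper name, not a pandas function, and the DataFrame is typed out from the table above rather than regenerated from the random seed.

```python
import pandas as pd

# Same rows as the table in the question, written out explicitly.
df = pd.DataFrame({
    "val": [3, 3, 7, 2, 4, 7, 2, 1, 2, 1],
    "cat": ["Z", "X", "Y", "Z", "Y", "X", "X", "X", "X", "Y"],
})


def group_share(frame, by, col):
    """Hypothetical helper: per-group sum of `col` as a percentage of the overall sum."""
    sums = frame.groupby(by)[col].sum()
    # The group sums add up to the column total, so normalising by
    # sums.sum() is equivalent to dividing by frame[col].sum().
    return sums / sums.sum() * 100


share = group_share(df, "cat", "val")

# The same computation as one chained expression, using Series.pipe.
share_chained = df.groupby("cat")["val"].sum().pipe(lambda s: s / s.sum() * 100)

print(share)
```

Both variants only rearrange the sum-then-divide approach; neither is the normalize=True style shortcut the question is looking for.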



1 Answer

Waiting for an expert to answer.
