Question
I'm working with pandas and trying to summarise a DataFrame as a count of certain categories, along with the mean sentiment score for each category.
There is a table full of strings, each with its own sentiment score, and I want to group by text source, showing how many posts each source has as well as the average sentiment of those posts.
My (simplified) dataframe looks like this:
source   text          sent
--------------------------------
bar      some string    0.13
foo      alt string    -0.8
bar      another str    0.7
foo      some text     -0.2
foo      more text     -0.5
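To try the answers, the sample frame can be rebuilt in pandas roughly as follows. The values are copied from the table above; the variable name df is simply the one the answers use.

```python
import pandas as pd

# Sample data from the question; 'sent' is the sentiment score of each post.
df = pd.DataFrame({
    'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
    'text': ['some string', 'alt string', 'another str', 'some text', 'more text'],
    'sent': [0.13, -0.8, 0.7, -0.2, -0.5],
})
print(df)
```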
The output from this should be something like this:
source   count   mean_sent
-----------------------------
foo      3       -0.5
bar      2        0.415
The answer is somewhere along the lines of:
df['sent'].groupby(df['source']).mean()
Yet this only returns each source and its mean (as a Series), with no count and no column headers.
Thanks in advance!
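For reference, a sketch of what the attempt in the question actually returns, using the question's sample data: a Series indexed by source. Calling reset_index(name=...) gives it column headers, but the per-source count is still missing, which is what the answers address. The column name mean_sent is just the one from the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
    'text': ['some string', 'alt string', 'another str', 'some text', 'more text'],
    'sent': [0.13, -0.8, 0.7, -0.2, -0.5],
})

# The attempt from the question: one mean per source, returned as a Series.
means = df['sent'].groupby(df['source']).mean()

# reset_index(name=...) moves 'source' into a column and names the values,
# but there is still no per-source count.
means_df = means.reset_index(name='mean_sent')
print(means_df)
```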
Answer 1:
You can use groupby with aggregate:
df = df.groupby('source') \
.agg({'text':'size', 'sent':'mean'}) \
.rename(columns={'text':'count','sent':'mean_sent'}) \
.reset_index()
print (df)
source count mean_sent
0 bar 2 0.415
1 foo 3 -0.500
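One detail in the answer above: 'size' counts every row in a group, while 'count' skips missing values. A small sketch with a hypothetical missing text value (not in the original data) shows the difference:

```python
import pandas as pd

# Hypothetical variant of the question's data: one bar post has no text.
df = pd.DataFrame({
    'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
    'text': ['some string', 'alt string', None, 'some text', 'more text'],
    'sent': [0.13, -0.8, 0.7, -0.2, -0.5],
})

g = df.groupby('source')['text']
sizes = g.size()    # every row per group, missing text included
counts = g.count()  # only rows where text is not missing
print(pd.concat([sizes.rename('size'), counts.rename('count')], axis=1))
```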
Answer 2:
In newer versions of pandas (0.25 and later) you no longer need the rename; just use named aggregation:
df = df.groupby('source') \
.agg(count=('text', 'size'), mean_sent=('sent', 'mean')) \
.reset_index()
print (df)
source count mean_sent
0 bar 2 0.415
1 foo 3 -0.500
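The (column, function) tuples above are shorthand for pd.NamedAgg. A sketch using the explicit form, with an extra sort_values step (an addition, not part of the answer) so that foo comes first as in the desired output; by default groupby sorts the keys alphabetically, which puts bar first.

```python
import pandas as pd

df = pd.DataFrame({
    'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
    'text': ['some string', 'alt string', 'another str', 'some text', 'more text'],
    'sent': [0.13, -0.8, 0.7, -0.2, -0.5],
})

summary = (df.groupby('source')
             .agg(count=pd.NamedAgg(column='text', aggfunc='size'),
                  mean_sent=pd.NamedAgg(column='sent', aggfunc='mean'))
             .sort_values('count', ascending=False)  # most posts first
             .reset_index())
print(summary)
```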
Answer 3:
The following should also work, although the result has two-level column headers (sent/count and sent/mean):
df[['source','sent']].groupby('source').agg(['count','mean'])
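A sketch of two ways to get flat headers from that approach, assuming the names count and mean_sent from the question: overwrite the column names, or select 'sent' as a Series before calling agg. Note that 'count' here counts non-missing sent values rather than rows.

```python
import pandas as pd

df = pd.DataFrame({
    'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
    'text': ['some string', 'alt string', 'another str', 'some text', 'more text'],
    'sent': [0.13, -0.8, 0.7, -0.2, -0.5],
})

# As in the answer: columns are a MultiIndex, ('sent', 'count') and ('sent', 'mean').
nested = df[['source', 'sent']].groupby('source').agg(['count', 'mean'])

# Option 1: overwrite the two-level headers with flat names.
flat1 = nested.copy()
flat1.columns = ['count', 'mean_sent']
flat1 = flat1.reset_index()

# Option 2: aggregate the 'sent' Series, which yields flat columns directly.
flat2 = (df.groupby('source')['sent']
           .agg(['count', 'mean'])
           .rename(columns={'mean': 'mean_sent'})
           .reset_index())
print(flat2)
```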
Answer 4:
I think this should provide the output that you wanted:
import pandas as pd

result = df.groupby('source').size().to_frame('count')      # number of posts per source
result['mean_score'] = df.groupby('source')['sent'].mean()  # aligns on the 'source' index
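A variant of the same idea, sketched as one alternative: build the groupby object once and join the two per-source Series side by side with pd.concat. The column names count and mean_score are just labels chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
    'text': ['some string', 'alt string', 'another str', 'some text', 'more text'],
    'sent': [0.13, -0.8, 0.7, -0.2, -0.5],
})

g = df.groupby('source')  # reuse one groupby object for both statistics

result = pd.concat(
    [g.size().rename('count'), g['sent'].mean().rename('mean_score')],
    axis=1,  # align the two Series on the shared 'source' index
)
print(result)
```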
Source: https://stackoverflow.com/questions/41040132/pandas-groupby-count-and-mean-combined