Question
Assume I have a dataset like this:
userID productID rating
a i 5
b i 4
c i 4
a j 3
b j 5
How can I calculate the mean rating of each user? I saw this answer, but I didn't quite understand it. I would really appreciate some guidance.
Answer 1:
I work in an IPython Notebook.
Let's assume you have this file, user_ratings.csv:
userID productID rating
a i 5
b i 4
c i 4
a j 3
b j 5
The example in the link uses pandas, so import pandas first:
In [1]: import pandas as pd
Read your file into a dataframe:
In [2]: df = pd.read_csv('user_ratings.csv', sep=r'\s+')
df
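Group by the user and calculate the mean rating for each. Select the rating column before aggregating, because productID is not numeric and cannot be averaged:
In [3]: df.groupby('userID')['rating'].mean()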
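If you want to try the grouping step without creating a file first, here is a minimal sketch. It assumes the sample data is pasted into a string and read through io.StringIO; the names raw and user_means are only illustrative and are not part of the linked answer.

```python
import io

import pandas as pd

# Sample data from the question, kept inline so no file is needed.
raw = """userID productID rating
a i 5
b i 4
c i 4
a j 3
b j 5
"""

# sep=r"\s+" splits the columns on any run of whitespace.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# One mean per user: select the numeric column, then aggregate.
user_means = df.groupby("userID")["rating"].mean()
print(user_means)
```

The result has one entry per user: 4.0 for a, 4.5 for b and 4.0 for c.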
You can also create a new column in df named user_avg_rating and assign each user's mean rating to it:
In [4]: df['user_avg_rating'] = df.groupby('userID')['rating'].transform('mean')
df
The transform method takes the grouped object and returns a series with the same index as df, in which every row holds the mean of its own user's group:
In [5]: df.groupby('userID')['rating'].transform('mean')
0 4.0
1 4.5
2 4.0
3 4.0
4 4.5
dtype: float64
This series is assigned to the column user_avg_rating.
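To make the difference between aggregating and transforming concrete, here is a small sketch. It assumes the same five ratings built directly as a DataFrame; the names per_user, per_row and via_map are illustrative only. The map-based lookup at the end is an alternative route added for comparison, not something the linked answer uses.

```python
import pandas as pd

# Same five ratings as in the question, built directly as a DataFrame.
df = pd.DataFrame({
    "userID": ["a", "b", "c", "a", "b"],
    "productID": ["i", "i", "i", "j", "j"],
    "rating": [5, 4, 4, 3, 5],
})

grouped = df.groupby("userID")["rating"]

per_user = grouped.mean()            # one row per user (3 rows)
per_row = grouped.transform("mean")  # one row per original rating (5 rows)

# Because per_row shares df's index, it can be assigned as a column directly.
df["user_avg_rating"] = per_row

# Alternative route: look up each row's user in the per-user means.
via_map = df["userID"].map(per_user)
print(df)
```

mean() collapses each group to a single value, while transform('mean') broadcasts that value back to every row of the group, which is why it fits into df as a new column without a merge.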
Source: https://stackoverflow.com/questions/34409600/how-to-calculate-the-mean-of-ratings-of-each-user