Question
I have the following dataset:
# Import pandas library
import pandas as pd
import numpy as np
# initialize list of lists
data = [['tom', 10,1], ['tom', 15,5], ['tom', 14,1], ['tom', 15,4], ['tom', 18,1], ['tom', 15,6], ['tom', 17,3]
, ['tom', 14,7], ['tom',16 ,6], ['tom', 22,2],['matt', 10,1], ['matt', 15,5], ['matt', 14,1], ['matt', 15,4], ['matt', 18,1], ['matt', 15,6], ['matt', 17,3]
, ['matt', 14,7], ['matt',16 ,6], ['matt', 22,2]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Attempts','Score'])
print(df)
Name Attempts Score
0 tom 10 1
1 tom 15 5
2 tom 14 1
3 tom 15 4
4 tom 18 1
5 tom 15 6
6 tom 17 3
7 tom 14 7
8 tom 16 6
9 tom 22 2
10 matt 10 1
11 matt 15 5
12 matt 14 1
13 matt 15 4
14 matt 18 1
15 matt 15 6
16 matt 17 3
17 matt 14 7
18 matt 16 6
19 matt 22 2
Then I added some custom metrics: moving averages over the previous 3 and 5 values of the Attempts column, computed per Name:
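# Average of the previous 3/5 Attempts per Name (shift() excludes the current row)
# transform keeps the original row index, so the result aligns with df on assignment
df['Ave3Attempts'] = df.groupby('Name').Attempts.transform(lambda x: x.shift().rolling(3, min_periods=1).mean().fillna(x))
df['Ave5Attempts'] = df.groupby('Name').Attempts.transform(lambda x: x.shift().rolling(5, min_periods=1).mean().fillna(x))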
print(df.round(2))
Name Attempts Score Ave3Attempts Ave5Attempts
0 tom 10 1 10.00 10.0
1 tom 15 5 10.00 10.0
2 tom 14 1 12.50 12.5
3 tom 15 4 13.00 13.0
4 tom 18 1 14.67 13.5
5 tom 15 6 15.67 14.4
6 tom 17 3 16.00 15.4
7 tom 14 7 16.67 15.8
8 tom 16 6 15.33 15.8
9 tom 22 2 15.67 16.0
10 matt 10 1 10.00 10.0
11 matt 15 5 10.00 10.0
12 matt 14 1 12.50 12.5
13 matt 15 4 13.00 13.0
14 matt 18 1 14.67 13.5
15 matt 15 6 15.67 14.4
16 matt 17 3 16.00 15.4
17 matt 14 7 16.67 15.8
18 matt 16 6 15.33 15.8
19 matt 22 2 15.67 16.0
I then used this dataset to build a model that predicts Score through an sklearn train/test workflow, using the Ave3Attempts and Ave5Attempts columns.
Now that I have my model, I am trying to create a summary table of the most recent data for each person, which I can then look up to predict the score.
Essentially, I am trying to produce a new dataframe to use as the input for a new prediction:
Name Ave3Attempts Ave5Attempts
0 tom 17.33 16.8
1 matt 17.33 16.8
Any help on how to do this would be great! Thanks!
Answer 1:
You can use this code:
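# one row per person; the names become the index of df2
df2 = pd.DataFrame([], index=df['Name'].unique())
# looping through names
for name in df['Name'].unique():
    # all attempts of this person, in their original order
    attempts = df.loc[df['Name'] == name, 'Attempts']
    # mean of the 3 / 5 most recent attempts (current row included)
    df2.loc[name, "Ave3Attempts"] = attempts.tail(3).mean()
    df2.loc[name, "Ave5Attempts"] = attempts.tail(5).mean()
print(df2)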
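The same summary can probably be built without an explicit loop. The sketch below is only one possible variant, not part of the original answer: it rebuilds the Name and Attempts columns from the question so that it runs on its own, and the names `attempts`, `grouped` and `summary` are made up for illustration. It also returns Name as a regular column, as in the desired output, rather than as the index.

```python
import pandas as pd

# Rebuild the two relevant columns from the question so the sketch runs on its own.
attempts = [10, 15, 14, 15, 18, 15, 17, 14, 16, 22]
df = pd.DataFrame({
    "Name": ["tom"] * 10 + ["matt"] * 10,
    "Attempts": attempts * 2,
})

# sort=False keeps the names in order of first appearance (tom, then matt).
grouped = df.groupby("Name", sort=False)["Attempts"]

# Mean of the last 3 / 5 attempts per person, current row included.
summary = pd.DataFrame({
    "Ave3Attempts": grouped.apply(lambda s: s.tail(3).mean()),
    "Ave5Attempts": grouped.apply(lambda s: s.tail(5).mean()),
}).reset_index()

print(summary.round(2))
```

Note that these values (17.33 and 16.8 for both names) differ from the last row of the Ave3Attempts and Ave5Attempts columns in the question, because those columns use shift() and so leave out the current attempt, while the summary includes the latest attempt as the input for the next prediction.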
Source: https://stackoverflow.com/questions/62112717/how-to-create-a-summary-table-to-use-in-model-pandas-python