How to create a summary table to use in a model (pandas, Python)

Question

I have the following dataset:

# Import pandas library 
import pandas as pd
import numpy as np

# initialize list of lists 
data = [
    ['tom', 10, 1], ['tom', 15, 5], ['tom', 14, 1], ['tom', 15, 4], ['tom', 18, 1],
    ['tom', 15, 6], ['tom', 17, 3], ['tom', 14, 7], ['tom', 16, 6], ['tom', 22, 2],
    ['matt', 10, 1], ['matt', 15, 5], ['matt', 14, 1], ['matt', 15, 4], ['matt', 18, 1],
    ['matt', 15, 6], ['matt', 17, 3], ['matt', 14, 7], ['matt', 16, 6], ['matt', 22, 2],
]

# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['Name', 'Attempts','Score']) 
print(df)

   Name  Attempts  Score
0    tom        10      1
1    tom        15      5
2    tom        14      1
3    tom        15      4
4    tom        18      1
5    tom        15      6
6    tom        17      3
7    tom        14      7
8    tom        16      6
9    tom        22      2
10  matt        10      1
11  matt        15      5
12  matt        14      1
13  matt        15      4
14  matt        18      1
15  matt        15      6
16  matt        17      3
17  matt        14      7
18  matt        16      6
19  matt        22      2

I then added some custom metrics: moving averages of the previous 3 and previous 5 values of the Attempts column, computed per person:

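# Average of the previous 3/5 Attempts per person; shift() excludes the current row.
# transform keeps the original row index, so the result aligns with df on assignment
# (groupby.apply can return a MultiIndex in recent pandas versions and fail here).
df['Ave3Attempts'] = df.groupby('Name').Attempts.transform(lambda x: x.shift().rolling(3, min_periods=1).mean().fillna(x))
df['Ave5Attempts'] = df.groupby('Name').Attempts.transform(lambda x: x.shift().rolling(5, min_periods=1).mean().fillna(x))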
print(df.round(2))

    Name  Attempts  Score  Ave3Attempts  Ave5Attempts
0    tom        10      1         10.00          10.0
1    tom        15      5         10.00          10.0
2    tom        14      1         12.50          12.5
3    tom        15      4         13.00          13.0
4    tom        18      1         14.67          13.5
5    tom        15      6         15.67          14.4
6    tom        17      3         16.00          15.4
7    tom        14      7         16.67          15.8
8    tom        16      6         15.33          15.8
9    tom        22      2         15.67          16.0
10  matt        10      1         10.00          10.0
11  matt        15      5         10.00          10.0
12  matt        14      1         12.50          12.5
13  matt        15      4         13.00          13.0
14  matt        18      1         14.67          13.5
15  matt        15      6         15.67          14.4
16  matt        17      3         16.00          15.4
17  matt        14      7         16.67          15.8
18  matt        16      6         15.33          15.8
19  matt        22      2         15.67          16.0

I then used this dataset to build a model that predicts Score, using an sklearn train/test split with the Ave3Attempts and Ave5Attempts columns as features.

Now that I have my model, I am trying to create a summary table of the most recent data for each person, which I can then look up to predict the score.

Essentially, I want to produce a new dataframe like this one to use as input for a new prediction:

   Name  Ave3Attempts  Ave5Attempts
0   tom         17.33          16.8
1  matt         17.33          16.8

Any help on how to do this would be great! Thanks!

Answer 1:

You can loop over the names and, for each person, take the mean of their last 3 and last 5 Attempts:

df2 = pd.DataFrame([], index=df['Name'].unique())
# looping through names
for name in df['Name'].unique():
    df2.loc[name, "Ave3Attempts"] = df[ df['Name']==name ]['Attempts'].tail(3).mean()
    df2.loc[name, "Ave5Attempts"] = df[ df['Name']==name ]['Attempts'].tail(5).mean()
print(df2)
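
The loop above can probably also be written as a single groupby aggregation. The following is only a sketch, not part of the original answer: the name `summary` is made up for illustration, and the data is rebuilt from the question with only the Name and Attempts columns so that the snippet runs on its own. It assumes a pandas version that supports named aggregation with lambdas (roughly 1.0 or later).

```python
import pandas as pd

# Rebuild the question's data (Name and Attempts only) so the sketch is self-contained.
attempts = [10, 15, 14, 15, 18, 15, 17, 14, 16, 22]
df = pd.DataFrame({
    "Name": ["tom"] * 10 + ["matt"] * 10,
    "Attempts": attempts * 2,
})

# For each person, average the last 3 and the last 5 Attempts.
# sort=False keeps the names in order of first appearance;
# reset_index() turns Name back into a regular column.
summary = (
    df.groupby("Name", sort=False)["Attempts"]
      .agg(
          Ave3Attempts=lambda s: s.tail(3).mean(),
          Ave5Attempts=lambda s: s.tail(5).mean(),
      )
      .reset_index()
)
print(summary.round(2))
```

For this data, both tom and matt end up with 17.33 for Ave3Attempts and 16.8 for Ave5Attempts, which matches the table requested in the question. Assuming the fitted estimator from the question is stored in a variable such as `model` (a hypothetical name) and was trained on those two columns in that order, something like `model.predict(summary[["Ave3Attempts", "Ave5Attempts"]])` should then give one predicted Score per person.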

Source: https://stackoverflow.com/questions/62112717/how-to-create-a-summary-table-to-use-in-model-pandas-python

Tags

python

pandas