Average values in last n days pandas

Posted by 眉间皱痕 on 2019-12-11 14:38:35

Question


I've got a dataframe of golfers and their golf rounds in various tournaments (see the dictionary of the df head posted below). I need a fast way of computing, for each round a player plays, his average 'strokes gained' (SG) over the previous n days, where n is any value I choose. I could do this by converting the dataframe into a list of lists and iterating through it, but that would be very slow. Ideally I want an extra column in the Pandas df titled 'Player's average SG in last 100 days'.

This is what we're working with (dict of dataframe head):

{'Avg SG Player': {0: 0.4564491861877877,
  1: -0.170952417298073,
  2: 1.509033309098962,
  3: -1.7298114700775877,
  4: 1.7856746598995106},
 'Avg Score': {0: 69.53846153846153,
  1: 69.53846153846153,
  2: 69.53846153846153,
  3: 69.53846153846153,
  4: 69.53846153846153},
 'Date': {0: Timestamp('2003-01-23 00:00:00'),
  1: Timestamp('2003-01-23 00:00:00'),
  2: Timestamp('2003-01-23 00:00:00'),
  3: Timestamp('2003-01-23 00:00:00'),
  4: Timestamp('2003-01-23 00:00:00')},
 'Field Strength': {0: 0.08871540761770776,
  1: 0.08871540761770776,
  2: 0.08871540761770776,
  3: 0.08871540761770776,
  4: 0.08871540761770776},
 'Ind': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
 'Overall SG': {0: 7.627176946079241,
  1: 5.627176946079241,
  2: 5.627176946079241,
  3: 4.627176946079241,
  4: 4.627176946079241},
 'Player': {0: 'Harrison Frazar',
  1: 'John Huston',
  2: 'David Toms',
  3: 'James H. McLean',
  4: 'Luke Donald'},
 'Round': {0: 'R1', 1: 'R1', 2: 'R1', 3: 'R1', 4: 'R1'},
 'Rounds Played': {0: 270, 1: 209, 2: 228, 3: 28, 4: 221},
 'SG on Field': {0: 7.538461538461533,
  1: 5.538461538461533,
  2: 5.538461538461533,
  3: 4.538461538461533,
  4: 4.538461538461533},
 'Score': {0: 62, 1: 64, 2: 64, 3: 65, 4: 65},
 'Tourn-Round': {0: '2003 Phoenix OpenR1',
  1: '2003 Phoenix OpenR1',
  2: '2003 Phoenix OpenR1',
  3: '2003 Phoenix OpenR1',
  4: '2003 Phoenix OpenR1'},
 'Tournament': {0: '2003 Phoenix Open',
  1: '2003 Phoenix Open',
  2: '2003 Phoenix Open',
  3: '2003 Phoenix Open',
  4: '2003 Phoenix Open'}}

EDITED

Dataframe is essentially this:

Player-Date of Round-Strokes Gained (on that day)

T Woods - 01-01-2010 - 5.4

R McIlroy - 01-01-2010 - 3.8

T Woods - 02-01-2010 - 0.4

etc.

There are 350,000 rows. What I require is an extra column giving the average strokes gained for the player in question over the n (say 100) days prior to the date of his current round.

So if the next row was:

Player-Date-Strokes Gained (on that day)

T Woods - 20-01-2018 - 3.2

I would want the fourth (new) column, call it '100 Day Average', to be 2.9 ((5.4+0.4)/2) because that is the average of the two previous rounds by Tiger that are in the defined timespan.

Thanks,

Tom


Answer 1:


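This should give each player's mean over the last n days, counted back from today (one value per player):

import pandas as pd

n = 10000  # window length in days

# normalize() drops the time of day, so rounds dated exactly n days ago are kept
start_date = pd.Timestamp.today().normalize() - pd.Timedelta(days=n)

df[df['Date'] >= start_date].groupby('Player')['Avg SG Player'].mean()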
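
Because a window anchored on today's date shifts every day (and can be empty for historical data such as the 2003 rounds shown in the question), it can help to check the same filter against a fixed reference date. The sketch below does that on made-up data; the players 'Player A' and 'Player B', their values, and ref_date are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the question's dataframe.
df = pd.DataFrame({
    "Player": ["Player A", "Player A", "Player B", "Player B"],
    "Date": pd.to_datetime(
        ["2003-01-23", "2003-03-01", "2003-01-23", "2002-06-01"]
    ),
    "Avg SG Player": [1.0, 3.0, 0.5, 10.0],
})

n = 100  # window length in days

# Fixed reference date (an assumption) used instead of today's date,
# so that the result is reproducible.
ref_date = pd.Timestamp("2003-03-10")
start_date = ref_date - pd.Timedelta(days=n)

# Keep only rounds inside [start_date, ref_date], then average per player.
recent = df[(df["Date"] >= start_date) & (df["Date"] <= ref_date)]
means = recent.groupby("Player")["Avg SG Player"].mean()
print(means)
```

Here the 2002-06-01 round falls outside the window, so it does not contribute to Player B's mean.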

To use an explicit start date and end date instead:

start_date = pd.to_datetime('2005-12-01')
end_date = pd.to_datetime('2015-12-01')

df[(df['Date'] >= start_date) & (df['Date'] <= end_date)].groupby('Player')['Avg SG Player'].mean()
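
Both snippets above return a single mean per player for one fixed window. For the per-round column described in the question, one possible approach (a sketch, not necessarily the fastest option) is a time-based rolling window within each player group. The column name SG, the day-first reading of the dates, and the last round being dated 20-01-2010 so that it falls inside the 100-day window are assumptions made for this example.

```python
import pandas as pd

# Toy data mirroring the question's simplified example (dates read as day-first).
# The last date is an assumption: it is placed inside the 100-day window.
df = pd.DataFrame({
    "Player": ["T Woods", "R McIlroy", "T Woods", "T Woods"],
    "Date": pd.to_datetime(
        ["2010-01-01", "2010-01-01", "2010-01-02", "2010-01-20"]
    ),
    "SG": [5.4, 3.8, 0.4, 3.2],
})

n = 100  # window length in days

# A time-based rolling window needs dates sorted within each player.
df = df.sort_values(["Player", "Date"]).reset_index(drop=True)

# closed="left" makes each window cover [date - n days, date), so the current
# round (and any other round on the same date) is left out of its own average.
rolled = (
    df.set_index("Date")
      .groupby("Player")["SG"]
      .rolling(f"{n}D", closed="left")
      .mean()
)

# df is sorted by Player and Date, which is the order of the grouped result,
# so the values can be assigned positionally.
df[f"{n} Day Average"] = rolled.to_numpy()
print(df)
```

With this toy data the T Woods round on 2010-01-20 gets roughly 2.9, the mean of 5.4 and 0.4, and each player's first round gets NaN because no earlier rounds exist in the window.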


Source: https://stackoverflow.com/questions/48775924/average-values-in-last-n-days-pandas
