Rolling Mean on pandas on a specific column

Anonymous (unverified), submitted 2019-12-03 02:06:01

Question:

I have a data frame like this which is imported from a CSV.

              stock  pop
Date
2016-01-04  325.316   82
2016-01-11  320.036   83
2016-01-18  299.169   79
2016-01-25  296.579   84
2016-02-01  295.334   82
2016-02-08  309.777   81
2016-02-15  317.397   75
2016-02-22  328.005   80
2016-02-29  315.504   81
2016-03-07  328.802   81
2016-03-14  339.559   86
2016-03-21  352.160   82
2016-03-28  348.773   84
2016-04-04  346.482   83
2016-04-11  346.980   80
2016-04-18  357.140   75
2016-04-25  357.439   77
2016-05-02  356.443   78
2016-05-09  365.158   78
2016-05-16  352.160   72
2016-05-23  344.540   74
2016-05-30  354.998   81
2016-06-06  347.428   77
2016-06-13  341.053   78
2016-06-20  363.515   80
2016-06-27  349.669   80
2016-07-04  371.583   82
2016-07-11  358.335   81
2016-07-18  362.021   79
2016-07-25  368.844   77
...             ...  ...

I want to add a new column, MA, that holds the rolling mean of the column pop. I tried the following:

data['MA'] = data.rolling(5, on='pop').mean()

I get an error

ValueError: Wrong number of items passed 2, placement implies 1 

So I checked whether the call works on its own, without assigning the result to a column. I used

 data.rolling(5,on='pop').mean() 

I got the output

               stock  pop
Date
2016-01-04       NaN   82
2016-01-11       NaN   83
2016-01-18       NaN   79
2016-01-25       NaN   84
2016-02-01  307.2868   82
2016-02-08  304.1790   81
2016-02-15  303.6512   75
2016-02-22  309.4184   80
2016-02-29  313.2034   81
2016-03-07  319.8970   81
2016-03-14  325.8534   86
2016-03-21  332.8060   82
2016-03-28  336.9596   84
2016-04-04  343.1552   83
2016-04-11  346.7908   80
2016-04-18  350.3070   75
2016-04-25  351.3628   77
2016-05-02  352.8968   78
2016-05-09  356.6320   78
2016-05-16  357.6680   72
2016-05-23  355.1480   74
2016-05-30  354.6598   81
2016-06-06  352.8568   77
2016-06-13  348.0358   78
2016-06-20  350.3068   80
2016-06-27  351.3326   80
2016-07-04  354.6496   82
2016-07-11  356.8310   81
2016-07-18  361.0246   79
2016-07-25  362.0904   77
...              ...  ...

I can't seem to apply Rolling mean on the column pop. What am I doing wrong?

Answer 1:

To assign a column, you can create a rolling object based on your Series:

data['new_col'] = data['column'].rolling(5).mean()
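A minimal, self-contained sketch of why selecting the Series first works, using the first six rows from the question. The names whole_frame and one_column are mine, chosen for illustration. Rolling over the whole DataFrame gives back a DataFrame with more than one column, which cannot be stored in a single column; rolling over one Series gives back a Series, which can.

```python
import pandas as pd

# First six rows of the frame from the question, with Date as the index.
data = pd.DataFrame(
    {
        "stock": [325.316, 320.036, 299.169, 296.579, 295.334, 309.777],
        "pop": [82, 83, 79, 84, 82, 81],
    },
    index=pd.DatetimeIndex(
        ["2016-01-04", "2016-01-11", "2016-01-18",
         "2016-01-25", "2016-02-01", "2016-02-08"],
        name="Date",
    ),
)

# Rolling over the whole frame returns a DataFrame (stock and pop),
# so it does not fit into the single column data['MA'].
whole_frame = data.rolling(5, on="pop").mean()
print(type(whole_frame).__name__, whole_frame.shape)

# Rolling over one column returns a Series aligned with data's index.
one_column = data["pop"].rolling(5).mean()
print(type(one_column).__name__, one_column.shape)

# A Series can be assigned to a new column directly.
data["MA"] = one_column
print(data)
```

The first four values of MA are NaN because a window of 5 needs five observations before it can produce a mean.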

The answer posted by ac2001 is not the most performant way of doing this: it calculates a rolling mean on every column in the dataframe and only then picks out the "pop" column to assign to "MA". Selecting the column first, data['pop'].rolling(5).mean(), avoids that extra work and is much more efficient than data.rolling(5).mean()['pop'].

I would not recommend using the second method unless you also need to store the computed rolling means of all the other columns.
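The difference between the two methods compared in this answer can be checked with a rough timing sketch like the one below. The test frame is hypothetical (20 random float columns plus an integer pop column) and the helper names column_first and frame_first are mine. Actual timings depend on the machine, the pandas version, and the number of columns, so no figures are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: 20 unrelated float columns plus an integer 'pop' column.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.normal(size=(10_000, 20)),
    columns=[f"c{i}" for i in range(20)],
)
data["pop"] = rng.integers(70, 90, size=len(data))


def column_first():
    # Method 1: select the column, then roll over that single Series.
    return data["pop"].rolling(5).mean()


def frame_first():
    # Method 2: roll over every column, then keep only 'pop'.
    return data.rolling(5).mean()["pop"]


# Both methods produce the same values ...
pd.testing.assert_series_equal(column_first(), frame_first())

# ... but the second one computes a rolling mean for every column first.
t_column = timeit.timeit(column_first, number=20)
t_frame = timeit.timeit(frame_first, number=20)
print(f"column first: {t_column:.4f} s, frame first: {t_frame:.4f} s")
```

The wider the frame, the more wasted work the second method does, since its cost grows with the number of columns while the first method only ever touches one.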



回答2:

This solution worked for me.

data['MA'] = data.rolling(5).mean()['pop'] 

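I think the issue is that on='pop' does not choose which column gets averaged. It only changes what the rolling window is calculated on: the pop column instead of the index. That matches the output in the question, where stock is averaged and pop is left unchanged.

From the docstring: "For a DataFrame, column on which to calculate the rolling window, rather than the index"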
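To illustrate what on is meant for, here is a small sketch in which Date is an ordinary column rather than the index, and the window is a time span. The 14-day window and the column name MA_14D are assumptions made for this example, not something from the question.

```python
import pandas as pd

# Date is a regular column here, not the index.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2016-01-04", "2016-01-11", "2016-01-18", "2016-01-25"]
        ),
        "stock": [325.316, 320.036, 299.169, 296.579],
        "pop": [82, 83, 79, 84],
    }
)

# on="Date" tells pandas to measure the 14-day window along the Date column.
# The column to average is still chosen separately, with ["pop"].
df["MA_14D"] = df.rolling("14D", on="Date")["pop"].mean()
print(df)
```

With weekly rows, each 14-day window covers the current row and the one before it, so on controls where the window boundaries fall, not which values are averaged.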



Answer 3:

Edit: pd.rolling_mean has been deprecated since pandas 0.18 and has been removed in later releases. Instead, use the rolling method of the Series:

df['MA'] = df['pop'].rolling(window=5,center=False).mean() 

for a dataframe df:

          Date    stock  pop
0   2016-01-04  325.316   82
1   2016-01-11  320.036   83
2   2016-01-18  299.169   79
3   2016-01-25  296.579   84
4   2016-02-01  295.334   82
5   2016-02-08  309.777   81
6   2016-02-15  317.397   75
7   2016-02-22  328.005   80
8   2016-02-29  315.504   81
9   2016-03-07  328.802   81

To get:

          Date    stock  pop    MA
0   2016-01-04  325.316   82   NaN
1   2016-01-11  320.036   83   NaN
2   2016-01-18  299.169   79   NaN
3   2016-01-25  296.579   84   NaN
4   2016-02-01  295.334   82  82.0
5   2016-02-08  309.777   81  81.8
6   2016-02-15  317.397   75  80.2
7   2016-02-22  328.005   80  80.4
8   2016-02-29  315.504   81  79.8
9   2016-03-07  328.802   81  79.6

Documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html

Old: on pandas versions that still provide it, you can use the deprecated function:

df['MA']=pd.rolling_mean(df['pop'], window=5) 

to get:

          Date    stock  pop    MA
0   2016-01-04  325.316   82   NaN
1   2016-01-11  320.036   83   NaN
2   2016-01-18  299.169   79   NaN
3   2016-01-25  296.579   84   NaN
4   2016-02-01  295.334   82  82.0
5   2016-02-08  309.777   81  81.8
6   2016-02-15  317.397   75  80.2
7   2016-02-22  328.005   80  80.4
8   2016-02-29  315.504   81  79.8
9   2016-03-07  328.802   81  79.6

Documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.rolling_mean.html


