Question
I am working on the code below:
# Resample, interpolate and inspect ozone data here
data = data.resample('D').interpolate()
data.info()
# Create the rolling window
***rolling = data.rolling(360)['Ozone']
# Insert the rolling quantiles to the monthly returns
data['q10'] = rolling.quantile(.1)
data['q50'] = rolling.quantile(.5)
data['q90'] = rolling.quantile(.9)
# Plot the data
data.plot()
plt.show()
For the starred line (***), I was wondering, can I use the following instead?
data['Ozone'].rolling(360)
Why is the following expression False?
data.rolling(360)['Ozone']==data['Ozone'].rolling(360)
What are their differences?
Answer 1:
data.rolling(360)['Ozone'] and data['Ozone'].rolling(360) can be used interchangeably, but they should be compared after applying an aggregation method, such as .mean, and pandas.DataFrame.equals should be used to make the comparison.
- .rolling methods require a window, or number of observations used for the calculation. With an integer window, the first window - 1 results (rows 0 through 8 for the window of 10 in the example below) are NaN, because a full window of observations is not yet available.
  - pandas.DataFrame.rolling
  - pandas.Series.rolling
- df.rolling(10)['A'] and df['A'].rolling(10) are both pandas.core.window.rolling.Rolling type, which won't compare element-wise: == on two Rolling objects returns a single False.
  - See the documentation and How do pandas Rolling objects work? for more details about how .rolling works.
  - Pandas: Window - functions
import pandas as pd
import numpy as np
# test data and dataframe
np.random.seed(10)
df = pd.DataFrame(np.random.randint(20, size=(20, 1)), columns=['A'])
# this is pandas.DataFrame.rolling with a column selection
df.rolling(10)['A']
[out]:
Rolling [window=10,center=False,axis=0]
# this is pandas.Series.rolling
df['A'].rolling(10)
[out]:
Rolling [window=10,center=False,axis=0]
# see that the type is the same, pandas.core.window.rolling.Rolling
type(df.rolling(10)['A']) == type(df['A'].rolling(10))
[out]:
True
# the two implementations evaluate as False, when compared
df.rolling(10)['A'] == df['A'].rolling(10)
[out]:
False
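A likely reason for the single False above is that == falls back to Python's default identity-based comparison. The sketch below illustrates this under the assumption that the Rolling class does not override __eq__; the variable names r1, r2, same_object, eq_result and self_eq are only illustrative.

```python
import numpy as np
import pandas as pd

# same test data as in the answer
np.random.seed(10)
df = pd.DataFrame(np.random.randint(20, size=(20, 1)), columns=['A'])

r1 = df.rolling(10)['A']
r2 = df['A'].rolling(10)

# two separately constructed Rolling objects are distinct Python objects
same_object = r1 is r2

# == returns one bool for the whole object, not an element-wise result
eq_result = r1 == r2

# comparing an object with itself, for contrast with the line above
self_eq = r1 == r1

print(same_object, eq_result, self_eq)
```

No rolling values are computed at this point, so the comparison says nothing about whether the two expressions produce the same numbers.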
- The objects can be compared once an aggregation method is used.
  - Aggregating with .mean, we can see that the first window - 1 values are NaN.
- df.rolling(10)['A'].mean() and df['A'].rolling(10).mean() are both pandas.core.series.Series type, which can be compared.
df.rolling(10)['A'].mean()
[out]:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 12.3
10 12.2
11 12.1
12 12.3
13 11.1
14 12.1
15 12.3
16 12.3
17 12.0
18 11.5
19 11.9
Name: A, dtype: float64
df['A'].rolling(10).mean()
[out]:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 12.3
10 12.2
11 12.1
12 12.3
13 11.1
14 12.1
15 12.3
16 12.3
17 12.0
18 11.5
19 11.9
Name: A, dtype: float64
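The leading NaN rows come from the min_periods parameter of .rolling, which defaults to the window size for an integer window. As a minimal sketch (the names default_mean and partial_mean are illustrative, and min_periods=1 is only one possible choice), lowering it removes the leading NaNs without changing the rows that already had a full window:

```python
import numpy as np
import pandas as pd

# same test data as in the answer
np.random.seed(10)
df = pd.DataFrame(np.random.randint(20, size=(20, 1)), columns=['A'])

# default: a result needs a full window of 10 observations
default_mean = df['A'].rolling(10).mean()

# min_periods=1: a result is produced as soon as one observation is available
partial_mean = df['A'].rolling(10, min_periods=1).mean()

# count the NaN rows produced by each setting
print(default_mean.isna().sum())
print(partial_mean.isna().sum())
```

Whether partial windows are acceptable depends on the analysis; for the rolling quantiles in the question, a quantile over only a few days may not be meaningful.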
- They do not evaluate the same because np.nan == np.nan is False. Essentially, they are the same, but when comparing the two with ==, the rows with NaN evaluate as False.
- Using pandas.DataFrame.equals (also available on a Series, as used here), however, treats NaNs in the same location as equal.
# row by row evaluation
df.rolling(10)['A'].mean() == df['A'].rolling(10).mean()
[out]:
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 True
10 True
11 True
12 True
13 True
14 True
15 True
16 True
17 True
18 True
19 True
Name: A, dtype: bool
# overall comparison
all(df.rolling(10)['A'].mean() == df['A'].rolling(10).mean())
[out]:
False
# using pandas.DataFrame.equals
df.rolling(10)['A'].mean().equals(df['A'].rolling(10).mean())
[out]:
True
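Applied to the question's case, the same check can be sketched as follows. The ozone values here are synthetic random numbers standing in for the asker's dataset, which is not available, and the names idx, q_frame_first and q_series_first are illustrative.

```python
import numpy as np
import pandas as pd

# synthetic stand-in for the daily ozone data (hypothetical values)
rng = np.random.default_rng(0)
idx = pd.date_range('2000-01-01', periods=800, freq='D')
data = pd.DataFrame({'Ozone': rng.random(800)}, index=idx)

# the starred line from the question, followed by an aggregation
q_frame_first = data.rolling(360)['Ozone'].quantile(.1)

# the proposed alternative, followed by the same aggregation
q_series_first = data['Ozone'].rolling(360).quantile(.1)

# equals treats NaNs in the same location as equal
same = q_frame_first.equals(q_series_first)
print(same)

# raises AssertionError on a mismatch; aligned NaNs are also treated as equal
pd.testing.assert_series_equal(q_frame_first, q_series_first)
```

pd.testing.assert_series_equal is an alternative to .equals that reports where two Series differ, which can be more convenient when a comparison unexpectedly fails.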
Source: https://stackoverflow.com/questions/63508736/how-to-use-rolling-in-pandas