Rolling PCA on pandas dataframe

时光毁灭记忆、已成空白 提交于 2021-02-08 06:59:24

问题


I'm wondering if anyone knows of how to implement a rolling/moving window PCA on a pandas dataframe. I've looked around and found implementations in R and MATLAB but not Python. Any help would be appreciated!

This is not a duplicate - moving window PCA is not the same as PCA on the entire dataframe. Please see pandas.DataFrame.rolling() if you do not understand the difference


回答1:


Unfortunately, pandas.DataFrame.rolling() seems to flatten the df before rolling, so it cannot be used as one might expect to roll over the rows of the df and pass windows of rows to the PCA.

The following is a work-around for this based on rolling over indices instead of rows. It may not be very elegant but it works:

# Generate some data (1000 time points, 10 features)
data = np.random.random(size=(1000,10))
df = pd.DataFrame(data)

# Set the window size
window = 100

# Initialize an empty df of appropriate size for the output
df_pca = pd.DataFrame( np.zeros((data.shape[0] - window + 1, data.shape[1])) )

# Define PCA fit-transform function
# Note: Instead of attempting to return the result, 
#       it is written into the previously created output array.
def rolling_pca(window_data):
    pca = PCA()
    transf = pca.fit_transform(df.iloc[window_data])
    df_pca.iloc[int(window_data[0])] = transf[0,:]
    return True

# Create a df containing row indices for the workaround
df_idx = pd.DataFrame(np.arange(df.shape[0]))

# Use `rolling` to apply the PCA function
_ = df_idx.rolling(window).apply(rolling_pca)

# The results are now contained here:
print df_pca

A quick check reveals that the values produced by this are identical to control values computed by slicing appropriate windows manually and running PCA on them.



来源:https://stackoverflow.com/questions/45928761/rolling-pca-on-pandas-dataframe

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!