Pandas - Find and index rows that match row sequence pattern

半阙折子戏 2020-11-27 21:26

I would like to find a pattern in a dataframe in a categorical variable going down rows. I can see how to use Series.shift() to look up / down and using boolean logic to fi

5 Answers
  •  臣服心动
    2020-11-27 22:28

    You could use the Series.rolling() method and compare each window it yields with the array containing the pattern you are trying to match.

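    import numpy as np
    import pandas as pd

    pattern = np.asarray([1.0, 2.0, 2.0, 0.0])
    n_obs = len(pattern)
    df['rolling_match'] = (df['row_pat']
                           .rolling(window=n_obs, min_periods=n_obs)
                           # raw=True passes each window to the lambda as a NumPy array
                           .apply(lambda x: (x == pattern).all(), raw=True)
                           # All as bools (the leading NaNs become True but are shifted off below)
                           .astype(bool)
                           # Shift back to the first row of each match; pad the tail with False
                           .shift(-1 * (n_obs - 1), fill_value=False)
                           )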
    

    It is important to set min_periods equal to the window length: this ensures that only full windows are compared, so you only find exact matches and the equality check never sees arrays of different lengths. The function passed to apply compares each window with the pattern element by element, and .all() requires every position to match. The result is converted to bool and shifted back by n_obs - 1 rows, so the flag sits on the first row of each match (a 'forward looking' indicator) instead of on the last row, after the fact.
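    As a rough illustration of the steps described above, here is a minimal sketch on made-up data. The sample values and the helper name mark_pattern_starts are assumptions for the example, not part of the original answer; only the column name 'row_pat' and the pattern are taken from it. The sketch shifts before converting to bool, so the NaN padding from incomplete windows is never cast to True.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the pattern 1, 2, 2, 0 starts at rows 1 and 5.
df = pd.DataFrame({"row_pat": [0.0, 1.0, 2.0, 2.0, 0.0, 1.0, 2.0, 2.0, 0.0, 3.0]})
pattern = np.asarray([1.0, 2.0, 2.0, 0.0])


def mark_pattern_starts(series, pattern):
    """Return a boolean Series that is True on the first row of each match."""
    n = len(pattern)
    hits = (series
            .rolling(window=n, min_periods=n)
            # raw=True passes each full window as a NumPy array
            .apply(lambda w: float((w == pattern).all()), raw=True))
    # hits is 1.0 on the LAST row of a match and NaN for incomplete windows.
    # Shift it back to the first row, then treat the NaN padding as "no match".
    return hits.shift(-(n - 1)).fillna(0.0).astype(bool)


df["rolling_match"] = mark_pattern_starts(df["row_pat"], pattern)

# Index labels of the rows where a match begins
start_idx = df.index[df["rolling_match"]].tolist()
print(start_idx)
```

    With this sample frame, start_idx holds the labels of the rows where the pattern begins, and df.loc[start_idx] selects those rows.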

    Documentation for the rolling functionality is available here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html
