Pandas - Find and index rows that match row sequence pattern

半阙折子戏 2020-11-27 21:26

I would like to find a pattern in a dataframe in a categorical variable going down rows. I can see how to use Series.shift() to look up / down and using boolean logic to fi

5 Answers
  •  我在风中等你
    2020-11-27 22:19

    Expanding on Emmet02's answer: apply a rolling window within each group, then set the match column to True for every row that belongs to a matching pattern, not only the last row of each match:

    import numpy as np

    pattern = np.asarray([1, 2, 2, 0])

    # Create a match column in the main dataframe
    # (assign returns a new DataFrame, so reassign)
    df = df.assign(match=False)

    for group_var, group in df.groupby("group_var"):

        # Per group, do rolling window matching; the value at the last
        # row of each matching window is 1.0, elsewhere 0.0 or NaN
        match = (
            group['row_pat']
            .rolling(window=len(pattern), min_periods=len(pattern))
            .apply(lambda x: (x == pattern).all(), raw=True)
        )

        # Get positions (within the group) where a match ends
        idx = np.flatnonzero(match.to_numpy() == 1)

        # Include all positions of each matching window,
        # counting back from the last position in the pattern
        idx = idx.repeat(len(pattern)) - np.tile(np.arange(len(pattern)), len(idx))

        # Update matches using the original row labels
        df.loc[group.index[idx], 'match'] = True

    df[df.match]
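    The repeat/tile step is the least obvious part, so here is a small worked sketch of it in isolation. The end positions below are invented for illustration; they stand in for the positions where a rolling match ends.

```python
import numpy as np

pattern_len = 4
# Hypothetical positions where a matching window ends
ends = np.array([3, 9])

# Repeat each end position once per pattern element
repeated = ends.repeat(pattern_len)
# Offsets 0..pattern_len-1, tiled once per match
offsets = np.tile(np.arange(pattern_len), len(ends))
# Subtracting walks back from each end position over its whole window
expanded = repeated - offsets

print(repeated)
print(offsets)
print(expanded)
```

    Each end position expands to the `pattern_len` positions its window covers, which is why a match ending at position 3 marks positions 0 through 3.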
    

    Edit: the same idea without a for loop:

    # Do rolling matching within each group
    match = (
        df.groupby("group_var")
        .rolling(len(pattern))
        .row_pat.apply(lambda x: (x == pattern).all(), raw=True)
    )

    # Convert NaNs (incomplete windows) to False and cast to bool
    match = match.fillna(0).astype(bool)

    # Get positions where a match ends, in the grouped order of `match`
    idx = np.flatnonzero(match.to_numpy())
    # Include all positions of each matching window
    idx = idx.repeat(len(pattern)) - np.tile(np.arange(len(pattern)), len(idx))

    # Map positions back to the original row labels (last index level),
    # then mark those rows in the match column
    labels = match.index.get_level_values(-1)
    df = df.assign(match=df.index.isin(labels[idx]))
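    To try the loop-free version end to end, here is a minimal sketch on a toy dataframe. The data and the helper name `mark_pattern_rows` are made up for illustration, and it assumes the dataframe has a unique index.

```python
import numpy as np
import pandas as pd


def mark_pattern_rows(df, pattern, group_col="group_var", value_col="row_pat"):
    """Return a copy of df with a boolean 'match' column (hypothetical helper)."""
    pattern = np.asarray(pattern)
    n = len(pattern)

    # Rolling comparison inside each group; 1.0 marks the last row of a match
    hits = (
        df.groupby(group_col)[value_col]
        .rolling(n)
        .apply(lambda x: (x == pattern).all(), raw=True)
    )

    # Positions of match ends in the grouped order of `hits`
    ends = np.flatnonzero(hits.fillna(0).to_numpy() == 1)
    # Expand each end position to the n positions its window covers
    covered = ends.repeat(n) - np.tile(np.arange(n), len(ends))

    # Translate positions back to the original row labels
    labels = hits.index.get_level_values(-1)
    return df.assign(match=df.index.isin(labels[covered]))


# Toy data, invented for this example
toy = pd.DataFrame({
    "group_var": list("aaaaaabbbbb"),
    "row_pat": [0, 1, 2, 2, 0, 1, 1, 2, 2, 0, 2],
})
result = mark_pattern_rows(toy, [1, 2, 2, 0])
matched_labels = result.index[result["match"]].tolist()
print(matched_labels)
```

    Because the rolling window is computed per group, a pattern that would straddle the boundary between groups `a` and `b` is never counted as a match.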
    
