How to efficiently compare rows in a pandas DataFrame?

自闭症患者 2021-02-10 11:28

I have a pandas dataframe containing a record of lightning strikes with timestamps and global positions in the following format:

Index      Date      Time                


        
4 answers
  •  南旧 (original poster)
    2021-02-10 11:56

    Depending on your data, this may or may not be useful. Some strikes may be "isolated" in time, i.e. further from both the preceding strike and the following strike than the time threshold. You can use these strikes to split your data into groups, and then process each group using searchsorted along the lines suggested by ysearka. If your data ends up split into hundreds of groups, this might save time.

    Here is what the code would look like:

    import numpy as np
    import pandas as pd

    # first of all, combine Date and Time into a single timestamp
    # (the rest assumes df is already sorted by time)
    df['DateTime'] = pd.to_datetime(df['Date'].astype(str) + 'T' + df['Time'])

    # time gap to the nearer of the previous and the following strike
    # (NaT for the first and last rows, so they are never flagged as isolated)
    df['time_separation'] = np.minimum(df['DateTime'].diff().values,
                                       -df['DateTime'].diff(-1).values)
    # using a specific threshold (0.08 s) for illustration
    df['is_isolated'] = df['time_separation'] > pd.Timedelta("00:00:00.08")
    # start a new group wherever is_isolated changes value
    df['group'] = (df['is_isolated'] != df['is_isolated'].shift()).cumsum()
    # put isolated strikes into a separate group so they can be skipped
    df.loc[df['is_isolated'], 'group'] = -1
    

    Here is the output, with the specific threshold I used:

           Lat      Lon                      DateTime is_isolated  group
    0  -7.1961 -60.7604 2016-01-01 00:00:00.996269200       False      1
    1  -7.0518 -60.6911 2016-01-01 00:00:01.064620700       False      1
    2 -25.3913 -57.2922 2016-01-01 00:00:01.110206600       False      1
    3  -7.4842 -60.5129 2016-01-01 00:00:01.201857300        True     -1
    4  -7.3939 -60.4992 2016-01-01 00:00:01.294275000        True     -1
    5  -9.6386 -62.8448 2016-01-01 00:00:01.443149300       False      3
    6 -23.7089 -58.8888 2016-01-01 00:00:01.522615700       False      3
    7  -6.3513 -55.6545 2016-01-01 00:00:01.593241200       False      3
    8 -23.8019 -58.9382 2016-01-01 00:00:01.673635000       False      3
    9 -24.5724 -57.7229 2016-01-01 00:00:01.695785800       False      3
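
To show how the groups might then be processed, here is a minimal sketch of a per-group searchsorted pass. It is not ysearka's code: the helper names `count_neighbours_in_window` and `neighbours_per_group` are made up for illustration, and "count the other strikes within the time window" is only a stand-in for whatever comparison you actually need. It assumes the strikes are sorted by time and that the window is no larger than the isolation threshold, so that no pair within the window spans two groups. The timestamps and group labels are copied from the output above.

```python
import numpy as np
import pandas as pd


def count_neighbours_in_window(times: pd.Series, window: pd.Timedelta) -> np.ndarray:
    """Count, for each timestamp, the other timestamps within +/- window.

    Assumes `times` is sorted in ascending order.
    """
    values = times.to_numpy()
    delta = window.to_timedelta64()
    # index of the first timestamp >= t - window
    left = np.searchsorted(values, values - delta, side="left")
    # index just past the last timestamp <= t + window
    right = np.searchsorted(values, values + delta, side="right")
    return right - left - 1  # subtract 1 to exclude the strike itself


def neighbours_per_group(df: pd.DataFrame, window: pd.Timedelta) -> pd.Series:
    """Apply the search group by group, skipping isolated strikes (group -1)."""
    counts = pd.Series(0, index=df.index, dtype="int64")
    for group_id, chunk in df.groupby("group"):
        if group_id == -1:
            continue  # isolated strikes have no neighbour within the threshold
        counts.loc[chunk.index] = count_neighbours_in_window(chunk["DateTime"], window)
    return counts


# Sample data: timestamps and group labels taken from the output above.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2016-01-01 00:00:00.996269200",
        "2016-01-01 00:00:01.064620700",
        "2016-01-01 00:00:01.110206600",
        "2016-01-01 00:00:01.201857300",
        "2016-01-01 00:00:01.294275000",
        "2016-01-01 00:00:01.443149300",
        "2016-01-01 00:00:01.522615700",
        "2016-01-01 00:00:01.593241200",
        "2016-01-01 00:00:01.673635000",
        "2016-01-01 00:00:01.695785800",
    ]),
    "group": [1, 1, 1, -1, -1, 3, 3, 3, 3, 3],
})

window = pd.Timedelta("00:00:00.08")  # same value as the isolation threshold
per_group = neighbours_per_group(df, window)
whole = count_neighbours_in_window(df["DateTime"], window)  # ungrouped reference
print(per_group.tolist())
```

On this sample the grouped and ungrouped passes should agree. Any saving would come from each searchsorted call working on a shorter array, so it is worth timing on your real data before committing to the grouping.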
    
