How to find duplicate based upon multiple columns in a rolling window in pandas?

南旧 2020-12-22 07:23

Sample Data

{"transaction": {"merchant": "merchantA", "amount": 20, "time": "2019-02-13T10:00:00.000Z"}}
{"transaction": {"me


        
2 Answers
  •  误落风尘
    2020-12-22 08:14

    I got this working, but not with rolling windows, because pandas rolling operations do not support string columns. Support for this has already been reported and requested on the pandas repository.

    My solution snippet to the problem:

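        # Fragment from inside the loop over incoming records. Assumed context:
        # `import pandas as pd`; `data` is the current record with its time already
        # parsed to a Timestamp; `df1` is that record as a one-row DataFrame.
        if len(df.index) > 0:
            # Rows already kept that share the merchant and amount
            res = df.loc[(df.merchant == data['transaction']['merchant']) & (df.amount == data['transaction']['amount'])]
            # True where a kept row lies within 120 seconds of the current record
            timediff = (data['transaction']['time'] - res['time']).dt.total_seconds().abs() <= 120
            if timediff.any():
                continue  # near-duplicate: do not add it
        df = pd.concat([df, df1])  # DataFrame.append was removed in pandas 2.0
    print(df)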
    

    Sample data:

    {"transaction": {"merchant": "merchantA", "amount": 20, "time": "2019-02-13T10:00:00.000Z"}}
    {"transaction": {"merchant": "merchantB", "amount": 90, "time": "2019-02-13T11:00:01.000Z"}}
    {"transaction": {"merchant": "merchantC", "amount": 10, "time": "2019-02-13T11:00:10.000Z"}}
    {"transaction": {"merchant": "merchantD", "amount": 10, "time": "2019-02-13T11:00:20.000Z"}}
    {"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:01:30.000Z"}}
    {"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:03:00.000Z"}}
    {"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:02:00.000Z"}}
    {"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:02:20.000Z"}}
    {"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:02:30.000Z"}}
    {"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:05:20.000Z"}}
    {"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:00:30.000Z"}}
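
To try the snippet end to end, a minimal self-contained sketch of the surrounding loop might look like this. The function name `drop_near_duplicates`, the `window_seconds` parameter, and the `SAMPLE` string are placeholders introduced here, not part of the original code. It also assumes the timestamps are parsed with `pd.to_datetime` and made timezone-naive, and it uses `pd.concat` because `DataFrame.append` is not available in pandas 2.0 and later.

```python
import json

import pandas as pd

# Sample records copied from this answer, one JSON object per line.
SAMPLE = """\
{"transaction": {"merchant": "merchantA", "amount": 20, "time": "2019-02-13T10:00:00.000Z"}}
{"transaction": {"merchant": "merchantB", "amount": 90, "time": "2019-02-13T11:00:01.000Z"}}
{"transaction": {"merchant": "merchantC", "amount": 10, "time": "2019-02-13T11:00:10.000Z"}}
{"transaction": {"merchant": "merchantD", "amount": 10, "time": "2019-02-13T11:00:20.000Z"}}
{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:01:30.000Z"}}
{"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:03:00.000Z"}}
{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:02:00.000Z"}}
{"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:02:20.000Z"}}
{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:02:30.000Z"}}
{"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:05:20.000Z"}}
{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:00:30.000Z"}}
"""


def drop_near_duplicates(lines, window_seconds=120):
    """Keep a record unless an already kept record with the same merchant and
    amount lies within `window_seconds` of it. Records are checked in input order."""
    df = None  # records kept so far
    for line in lines:
        tx = json.loads(line)["transaction"]
        # Parse the ISO timestamp and drop the UTC marker to get naive datetimes.
        ts = pd.to_datetime(tx["time"]).tz_localize(None)
        row = pd.DataFrame(
            {"merchant": [tx["merchant"]], "amount": [tx["amount"]], "time": [ts]},
            index=[ts],
        )
        if df is None:
            df = row  # the first record has nothing to be compared with
            continue
        # Kept rows that share the merchant and amount with the current record
        same = df.loc[(df["merchant"] == tx["merchant"]) & (df["amount"] == tx["amount"])]
        close = (ts - same["time"]).dt.total_seconds().abs() <= window_seconds
        if close.any():
            continue  # near-duplicate: skip it
        df = pd.concat([df, row])
    return df


result = drop_near_duplicates(SAMPLE.splitlines())
print(result)
```

Each record is compared only with the rows kept so far, in arrival order, so on the sample data this should keep the same seven rows as the output listed for this answer.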
    

    Output:

                          merchant  amount                time
    2019-02-13 10:00:00  merchantA      20 2019-02-13 10:00:00
    2019-02-13 11:00:01  merchantB      90 2019-02-13 11:00:01
    2019-02-13 11:00:10  merchantC      10 2019-02-13 11:00:10
    2019-02-13 11:00:20  merchantD      10 2019-02-13 11:00:20
    2019-02-13 11:01:30  merchantE      10 2019-02-13 11:01:30
    2019-02-13 11:03:00  merchantF      10 2019-02-13 11:03:00
    2019-02-13 11:05:20  merchantF      10 2019-02-13 11:05:20
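
If arrival order does not matter, a vectorized check is possible without `rolling`: sort by time, group by the key columns, and compare each row with the previous row of its group. This is a different technique from the loop in this answer, not the answerer's method, and it can flag different rows, because it compares each record with the previous one in time order whether or not that one was itself flagged. The function name `flag_near_repeats` and its parameters are hypothetical.

```python
import pandas as pd


def flag_near_repeats(df, keys=("merchant", "amount"), window="120s"):
    """Return a boolean Series (aligned with df) that is True where the previous
    row with the same keys, in time order, is at most `window` earlier."""
    ordered = df.sort_values("time")
    # Time gap to the previous row within each (merchant, amount) group;
    # the first row of every group gets NaT, which compares as False.
    gap = ordered.groupby(list(keys))["time"].diff()
    return (gap <= pd.Timedelta(window)).reindex(df.index)


# Four of the sample records, in their original arrival order.
demo = pd.DataFrame(
    {
        "merchant": ["merchantE", "merchantF", "merchantE", "merchantF"],
        "amount": [10, 10, 10, 10],
        "time": pd.to_datetime(
            ["2019-02-13 11:01:30", "2019-02-13 11:03:00",
             "2019-02-13 11:02:00", "2019-02-13 11:05:20"]
        ),
    }
)
flags = flag_near_repeats(demo)
deduplicated = demo[~flags]
print(deduplicated)
```

Here only the second merchantE record is flagged (30 seconds after the first), while the two merchantF records are 140 seconds apart and both stay. The sketch assumes a unique index on the input frame so that the flags can be realigned with `reindex`.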
    
