How to find duplicates based on multiple columns in a rolling window in pandas?

南旧 2020-12-22 07:23

Sample Data

{\"transaction\": {\"merchant\": \"merchantA\", \"amount\": 20, \"time\": \"2019-02-13T10:00:00.000Z\"}}
{\"transaction\": {\"me         


        
2 Answers
  •  离开以前
    2020-12-22 08:10

    First, you could form rolling 120-second blocks of data. You could then evaluate each block with duplicated:

    df = df[df.duplicated(subset=['val1', 'val2', 'val3'], keep=False)]
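
    As a rough, untested sketch of that idea (the column names merchant, amount and time are taken from the question's sample JSON, and the 120-second window is approximated here with fixed, non-overlapping blocks via dt.floor rather than a true sliding window):

        import pandas as pd

        # Hypothetical frame built from the question's sample transactions.
        df = pd.DataFrame({
            "merchant": ["merchantA", "merchantA", "merchantB", "merchantA"],
            "amount": [20, 20, 10, 20],
            "time": pd.to_datetime([
                "2019-02-13T10:00:00.000Z",
                "2019-02-13T10:01:30.000Z",  # same merchant/amount within 120 s of row 0
                "2019-02-13T10:01:00.000Z",
                "2019-02-13T10:10:00.000Z",
            ]),
        })

        # Bucket each row into a 120-second block, then flag rows that repeat
        # the same (merchant, amount) combination inside their block.
        df["block"] = df["time"].dt.floor("120s")
        dupes = df[df.duplicated(subset=["block", "merchant", "amount"], keep=False)]
        print(dupes)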

    Or groupby: df.groupby(['val1', 'val2', 'val3']).count()
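
    Continuing the same hypothetical frame and block column from the snippet above, the groupby route could count rows per combination; any count above 1 marks a repeat:

        # Count rows per (block, merchant, amount); a count above 1 means the
        # combination repeats inside that 120-second block.
        counts = df.groupby(["block", "merchant", "amount"]).size().reset_index(name="count")
        print(counts[counts["count"] > 1])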

    Or even a SQL distinct. https://www.w3schools.com/sql/sql_distinct.asp
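
    If you stay in pandas, the closest analogue of SELECT DISTINCT is drop_duplicates; again reusing the hypothetical block column from the first snippet:

        # Keep the first row of each (block, merchant, amount) combination,
        # i.e. a pandas analogue of SELECT DISTINCT over those columns.
        deduped = df.drop_duplicates(subset=["block", "merchant", "amount"], keep="first")
        print(deduped)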

    Please post what you have tried. The above methods work for string, float, datetime, and integer data types.
