First, you could split the data into rolling 120-second blocks. You could then evaluate each block with one of the following:
duplicated (with keep=False every row that has a duplicate is kept): df = df[df.duplicated(subset=['val1', 'val2', 'val3'], keep=False)]
Or groupby: df.groupby(['val1', 'val2', 'val3']).count()
Or even a SQL DISTINCT: https://www.w3schools.com/sql/sql_distinct.asp
Please post what you have tried. These methods work with string, float, datetime and integer columns.
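One way to apply the rolling-block idea without rolling over string columns is to use the strings as groupby keys and roll only over a numeric helper column on a time index. This is a minimal sketch, assuming pandas is installed and the time column is already parsed to datetimes; the helper column n and the handful of records (taken from the sample data in this thread, timezone dropped) are illustrative. The window is trailing: each row counts the same-key records in the half-open interval (t - 120 s, t].

```python
import pandas as pd

# A few of the sample records from this thread, already parsed into columns.
df = pd.DataFrame({
    'merchant': ['merchantE', 'merchantE', 'merchantE', 'merchantF', 'merchantF'],
    'amount': [10, 10, 10, 10, 10],
    'time': pd.to_datetime(['2019-02-13 11:00:30', '2019-02-13 11:01:30',
                            '2019-02-13 11:02:00', '2019-02-13 11:03:00',
                            '2019-02-13 11:05:20']),
})

# Time-based windows need a sorted DatetimeIndex. The string columns only act as
# groupby keys, so rolling aggregates nothing but the numeric helper column 'n'.
counts = (
    df.sort_values('time')
      .set_index('time')
      .assign(n=1)
      .groupby(['merchant', 'amount'])['n']
      .rolling('120s')
      .sum()
)

# counts is indexed by (merchant, amount, time); a value > 1 means at least one
# other transaction with the same merchant and amount fell in the previous 120 s.
flagged = counts[counts > 1]
print(flagged)
```

With this subset, the merchantE rows at 11:01:30 and 11:02:00 are flagged, while merchantF at 11:05:20 is not, because its predecessor at 11:03:00 lies 140 seconds earlier.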
So I made it work, but not with rolling windows, because rolling doesn't support string columns. The feature has also been reported and requested on the pandas repository.
My solution snippet:
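import pandas as pd
# Runs inside the loop over incoming records: data is the parsed JSON record,
# df1 a one-row DataFrame built from it, and df['time'] parsed with pd.to_datetime.
tx = data['transaction']
if len(df.index) > 0:
    res = df.loc[(df.merchant == tx['merchant']) & (df.amount == tx['amount'])]
    # Same merchant and amount within 120 seconds of a kept record: skip this one.
    diff = (pd.to_datetime(tx['time']) - res['time']).dt.total_seconds().abs()
    if (diff <= 120).any():
        continue
df = pd.concat([df, df1])  # DataFrame.append was removed in pandas 2.0
print(df)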
Sample data:
{"transaction": {"merchant": "merchantA", "amount": 20, "time": "2019-02-13T10:00:00.000Z"}}
{"transaction": {"merchant": "merchantB", "amount": 90, "time": "2019-02-13T11:00:01.000Z"}}
{"transaction": {"merchant": "merchantC", "amount": 10, "time": "2019-02-13T11:00:10.000Z"}}
{"transaction": {"merchant": "merchantD", "amount": 10, "time": "2019-02-13T11:00:20.000Z"}}
{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:01:30.000Z"}}
{"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:03:00.000Z"}}
{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:02:00.000Z"}}
{"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:02:20.000Z"}}
{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:02:30.000Z"}}
{"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:05:20.000Z"}}
{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:00:30.000Z"}}
Output:
                     merchant   amount  time
2019-02-13 10:00:00  merchantA      20  2019-02-13 10:00:00
2019-02-13 11:00:01  merchantB      90  2019-02-13 11:00:01
2019-02-13 11:00:10  merchantC      10  2019-02-13 11:00:10
2019-02-13 11:00:20  merchantD      10  2019-02-13 11:00:20
2019-02-13 11:01:30  merchantE      10  2019-02-13 11:01:30
2019-02-13 11:03:00  merchantF      10  2019-02-13 11:03:00
2019-02-13 11:05:20  merchantF      10  2019-02-13 11:05:20
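To check the rule independently of pandas, here is a minimal stdlib-only sketch of the same logic as the snippet above: records are processed in arrival order, and a record is dropped when an already kept record with the same merchant and amount lies within 120 seconds of it in either direction. The function name dedupe and the constant WINDOW_SECONDS are hypothetical. Replaying the sample records should keep the same seven transactions as the output above.

```python
import json
from collections import defaultdict
from datetime import datetime

WINDOW_SECONDS = 120  # hypothetical name for the 120-second rule

def dedupe(lines, window=WINDOW_SECONDS):
    """Keep a record unless a previously kept record with the same merchant and
    amount is within `window` seconds of it, in either direction."""
    kept = []
    seen = defaultdict(list)  # (merchant, amount) -> times of kept records only
    for line in lines:
        tx = json.loads(line)['transaction']
        # datetime.fromisoformat() before Python 3.11 rejects a trailing 'Z'.
        t = datetime.fromisoformat(tx['time'].replace('Z', '+00:00'))
        key = (tx['merchant'], tx['amount'])
        if any(abs((t - prev).total_seconds()) <= window for prev in seen[key]):
            continue  # duplicate of a kept transaction
        seen[key].append(t)
        kept.append((tx['merchant'], tx['amount'], t))
    return kept

sample = [
    '{"transaction": {"merchant": "merchantA", "amount": 20, "time": "2019-02-13T10:00:00.000Z"}}',
    '{"transaction": {"merchant": "merchantB", "amount": 90, "time": "2019-02-13T11:00:01.000Z"}}',
    '{"transaction": {"merchant": "merchantC", "amount": 10, "time": "2019-02-13T11:00:10.000Z"}}',
    '{"transaction": {"merchant": "merchantD", "amount": 10, "time": "2019-02-13T11:00:20.000Z"}}',
    '{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:01:30.000Z"}}',
    '{"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:03:00.000Z"}}',
    '{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:02:00.000Z"}}',
    '{"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:02:20.000Z"}}',
    '{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:02:30.000Z"}}',
    '{"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:05:20.000Z"}}',
    '{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:00:30.000Z"}}',
]

for merchant, amount, t in dedupe(sample):
    print(merchant, amount, t)
```

Because each record is compared only with kept records, merchantE at 11:02:30 is dropped because of the kept 11:01:30 record, not the dropped 11:02:00 one; comparing against every record seen would be a different, stricter rule.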