We are collecting some analytics data for contacts and each page they visit. A lot of the analytics data comes from malicious attacks or bots, which hit something like 20+ pages a minute.
For data like:
n, d
John, 2020-01-01 00:00:10
John, 2020-01-01 00:00:30
John, 2020-01-01 00:00:50
John, 2020-01-01 00:01:10
John, 2020-01-01 00:01:30
John, 2020-01-01 00:01:50
You could group on the date truncated to minute precision; it might be sufficient:
SELECT n, DATEADD(minute, DATEDIFF(minute, 0, d), 0) AS mnt
FROM t
-- DATEADD/DATEDIFF from a fixed origin truncates d to the minute,
-- so the same minute-of-day on different days stays distinct
GROUP BY n, DATEADD(minute, DATEDIFF(minute, 0, d), 0)
HAVING COUNT(*) > 20
Of course, you might get someone who blats 20 requests across a minute boundary so that half fall in each minute. You could counter that by adding 30 seconds to all their times and unioning the two queries, as sketched below.
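A minimal sketch of that, reusing the minute-truncation expression from above (the 30-second offset is arbitrary; any half-minute shift would do):
SELECT n
FROM t
GROUP BY n, DATEADD(minute, DATEDIFF(minute, 0, d), 0)
HAVING COUNT(*) > 20
UNION
-- same query, but every timestamp is shifted 30 seconds before truncation,
-- so bursts straddling a minute boundary land in one bucket
SELECT n
FROM t
GROUP BY n, DATEADD(minute, DATEDIFF(minute, 0, DATEADD(second, 30, d)), 0)
HAVING COUNT(*) > 20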
There are other things you could do, such as a correlated subquery that looks back over the past minute to count how many rows fell within the same one-minute sliding window:
SELECT
  n,
  (SELECT COUNT(*)
   FROM t tI
   WHERE tI.n = tO.n
     AND tI.d BETWEEN DATEADD(minute, -1, tO.d) AND tO.d) AS ct
FROM t tO
This resultset could then be wrapped in an outer query that does GROUP BY n HAVING MAX(ct) > 20.
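Put together, that might look like the following (the derived-table alias windowed is just for illustration):
SELECT n
FROM (
  SELECT
    n,
    (SELECT COUNT(*)
     FROM t tI
     WHERE tI.n = tO.n
       AND tI.d BETWEEN DATEADD(minute, -1, tO.d) AND tO.d) AS ct
  FROM t tO
) windowed
GROUP BY n
-- any n that ever racked up more than 20 hits in a sliding minute
HAVING MAX(ct) > 20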
Footnote: it's a shame SQL Server doesn't support ranging over dates in its window functions the way Oracle does, e.g. COUNT(*) OVER (PARTITION BY n ORDER BY d RANGE BETWEEN INTERVAL '1' MINUTE PRECEDING AND CURRENT ROW). SQL Server understands RANGE, but only with UNBOUNDED PRECEDING/FOLLOWING and CURRENT ROW (where "current row" also takes in peer rows with the same ordering value), and I don't believe there's a way to adjust a continuously variable datetime so that this applies.
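For reference, a sketch of how that single-pass version might look in Oracle (same hypothetical table t):
SELECT n, d,
       -- count of this contact's hits in the minute ending at this row
       COUNT(*) OVER (PARTITION BY n ORDER BY d
                      RANGE BETWEEN INTERVAL '1' MINUTE PRECEDING AND CURRENT ROW) AS ct
FROM t
An outer query flagging any n with MAX(ct) > 20 would then replace the correlated subquery entirely.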