We are collecting some analytics data for contacts and each page they visit. A lot of the analytics data comes from malicious attacks or bots, which hit something like 20+ pages a minute.
For data like:
n, d
John, 2020-01-01 00:00:10
John, 2020-01-01 00:00:30
John, 2020-01-01 00:00:50
John, 2020-01-01 00:01:10
John, 2020-01-01 00:01:30
John, 2020-01-01 00:01:50
You could group on the date truncated to minute precision; it might be sufficient:
SELECT n, DATEADD(minute, DATEDIFF(minute, 0, d), 0) AS mnt
FROM t
-- DATEADD/DATEDIFF from a fixed origin truncates d to the minute,
-- so the same minute-of-day on different days stays distinct
GROUP BY n, DATEADD(minute, DATEDIFF(minute, 0, d), 0)
HAVING COUNT(*) > 20
Of course, you might get someone who blats 20 requests across a minute boundary so that half fall in each minute. You could counter that by adding 30 seconds to all their times and unioning the two queries, as sketched below.
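A minimal sketch of that, reusing the minute-truncation expression from above (the 30-second offset is arbitrary; any half-minute shift would do):
SELECT n
FROM t
GROUP BY n, DATEADD(minute, DATEDIFF(minute, 0, d), 0)
HAVING COUNT(*) > 20
UNION
-- same query, but every timestamp is shifted 30 seconds before truncation,
-- so bursts straddling a minute boundary land in one bucket
SELECT n
FROM t
GROUP BY n, DATEADD(minute, DATEDIFF(minute, 0, DATEADD(second, 30, d)), 0)
HAVING COUNT(*) > 20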
There are other things you could do, such as a correlated subquery that looks back over the past minute to count how many rows fell within the same one-minute sliding window:
SELECT
  n,
  (SELECT COUNT(*)
   FROM t tI
   WHERE tI.n = tO.n
     AND tI.d BETWEEN DATEADD(minute, -1, tO.d) AND tO.d) AS ct
FROM t tO
This resultset could then be wrapped in an outer query that does GROUP BY n HAVING MAX(ct) > 20.
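Put together, that might look like the following (the derived-table alias windowed is just for illustration):
SELECT n
FROM (
  SELECT
    n,
    (SELECT COUNT(*)
     FROM t tI
     WHERE tI.n = tO.n
       AND tI.d BETWEEN DATEADD(minute, -1, tO.d) AND tO.d) AS ct
  FROM t tO
) windowed
GROUP BY n
-- any n that ever racked up more than 20 hits in a sliding minute
HAVING MAX(ct) > 20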
Footnote: it's a shame SQL Server doesn't support ranging over dates in its window functions the way Oracle does, e.g. COUNT(*) OVER (PARTITION BY n ORDER BY d RANGE BETWEEN INTERVAL '1' MINUTE PRECEDING AND CURRENT ROW). SQL Server understands RANGE, but only with UNBOUNDED PRECEDING/FOLLOWING and CURRENT ROW (where "current row" also takes in peer rows with the same ordering value), and I don't believe there's a way to adjust a continuously variable datetime so that this applies.
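For reference, a sketch of how that single-pass version might look in Oracle (same hypothetical table t):
SELECT n, d,
       -- count of this contact's hits in the minute ending at this row
       COUNT(*) OVER (PARTITION BY n ORDER BY d
                      RANGE BETWEEN INTERVAL '1' MINUTE PRECEDING AND CURRENT ROW) AS ct
FROM t
An outer query flagging any n with MAX(ct) > 20 would then replace the correlated subquery entirely.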