Please identify this algorithm: probabilistic top-k elements in a data stream

你的背包 2021-02-02 01:47

I remember hearing about the following algorithm some years back, but can't find any reference to it online. It identifies the top k elements (or heavy hitters) in a dat

3 answers

忘了有多久 (original poster)

2021-02-02 02:41

You may be looking for the "Frequent" algorithm. It uses k - 1 counters to find all elements that exceed 1/k of the total, and was published in 1982 by Misra and Gries. It's a generalization of Boyer and Moore's (or Fischer-Salzberg's) "Majority" algorithm, where k is 2. These and related algorithms are introduced in a helpful article, "The Britney Spears Problem."
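For illustration, here is a minimal Python sketch of the one-pass "Frequent" idea as I understand it from the description above. The function name `frequent_candidates` and the dictionary-based counter table are my own choices, not taken from the Misra and Gries paper, so treat this as a rough outline rather than a reference implementation.

```python
def frequent_candidates(stream, k):
    """One pass over `stream`, keeping at most k - 1 counters.

    Any item occurring more than n / k times (n = stream length) is
    guaranteed to be among the returned keys. The returned values are
    lower bounds on the true frequencies, not exact counts.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: bump its counter.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A counter is free: start tracking this item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the
            # ones that reach zero. The new item itself is discarded.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

With `k = 2` this keeps a single counter and behaves like the "Majority" algorithm. Note that the surviving keys are only candidates: an item can remain in the table without actually exceeding n / k.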

I give a detailed explanation of the algorithm elsewhere on StackOverflow, which I won't repeat here. The important point is that, after one pass, the counter values don't precisely indicate the frequency of an item; they can under-count by as much as n / k, where n is the length of the stream and k - 1 is the number of counters. All of these algorithms (including Metwally's "SpaceSaving") require a second pass if you want an exact count rather than an estimate of frequency.
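The second pass can be sketched roughly as follows, assuming the stream can be replayed (for example, it is stored in a file or a list) and that a set of candidate items has already been produced by the first pass. The name `exact_heavy_hitters` and its parameters are hypothetical, chosen only for this example.

```python
def exact_heavy_hitters(stream, candidates, k):
    """Second pass: count the candidates exactly and keep only those
    that occur more than n / k times, where n is the stream length."""
    exact = dict.fromkeys(candidates, 0)
    n = 0
    for item in stream:
        n += 1
        if item in exact:
            exact[item] += 1
    # Compare count * k > n to avoid floating-point division.
    return {item: count for item, count in exact.items() if count * k > n}


# Example: "b" survived the first pass as a candidate but occurs
# exactly n / k = 3 times, so it does not exceed the threshold.
print(exact_heavy_hitters(list("aaaabbbcd"), {"a", "b"}, 3))  # {'a': 4}
```

Memory use stays bounded by the number of candidates, since only those items are counted on the second pass.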
