How to optimize retrieval of most occurring values (hundreds of millions of rows)

Submitted by 混江龙づ霸主 on 2019-12-13 14:40:22

Question


I'm trying to retrieve the most frequently occurring values from a SQLite table containing a few hundred million rows.

The query so far may look like this:

SELECT value, COUNT(value) AS count FROM table GROUP BY value ORDER BY count DESC LIMIT 10

There is an index on the value field.
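For reference, here is a minimal, runnable sketch of that query using Python's sqlite3 module. The table name `events` and the sample data are illustrative stand-ins for the real schema, which the question does not give:

```python
# Reproduce the top-N query from the question against an in-memory SQLite
# database. "events"/"value" are assumed names, not the asker's real schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (value TEXT)")
conn.execute("CREATE INDEX idx_events_value ON events (value)")

# A tiny, skewed sample data set: 'a' x5, 'b' x3, 'c' x1.
conn.executemany("INSERT INTO events (value) VALUES (?)",
                 [("a",)] * 5 + [("b",)] * 3 + [("c",)])

# The query from the question: group, count, sort by count, take the top 10.
top = conn.execute(
    "SELECT value, COUNT(value) AS count FROM events "
    "GROUP BY value ORDER BY count DESC LIMIT 10"
).fetchall()
print(top)  # [('a', 5), ('b', 3), ('c', 1)]
```

The index lets the GROUP BY walk values in order, but the ORDER BY on the computed count still requires aggregating every group before anything can be sorted, which is why the full scan cannot be avoided with this query shape.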

However, with the ORDER BY clause, the query takes so long that I have never seen it finish.

What can be done to drastically improve such queries on this much data?
I tried adding a HAVING clause (e.g. HAVING count > 100000) to reduce the number of rows to be sorted, without success.

Note that I don't care much about the time required for insertion (it still needs to be reasonable, but priority is given to selection), so I'm open to solutions that shift computation to insertion time ...

Thanks in advance,


Answer 1:


1) Create a new table in which you store one row per unique value together with its count, and put a descending index on the count column.
2) Add triggers to the original table that maintain this new table (insert and update) as necessary to increment/decrement the count.
3) Run your query against this new table; it will be fast because of the descending index on the count.
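The three steps above can be sketched as follows (via Python's sqlite3; all names here, such as `events`, `value_counts`, and the trigger names, are illustrative). The INSERT OR IGNORE followed by UPDATE pair is the classic portable way to upsert inside a SQLite trigger:

```python
# Sketch: a trigger-maintained counts table, so the top-N query reads a
# small pre-aggregated table instead of scanning hundreds of millions of rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (value TEXT);

-- Step 1: one row per unique value, with a descending index on count.
CREATE TABLE value_counts (value TEXT PRIMARY KEY, count INTEGER NOT NULL);
CREATE INDEX idx_counts_desc ON value_counts (count DESC);

-- Step 2: triggers keep value_counts in sync with the big table.
CREATE TRIGGER events_ai AFTER INSERT ON events BEGIN
    INSERT OR IGNORE INTO value_counts (value, count) VALUES (NEW.value, 0);
    UPDATE value_counts SET count = count + 1 WHERE value = NEW.value;
END;

CREATE TRIGGER events_ad AFTER DELETE ON events BEGIN
    UPDATE value_counts SET count = count - 1 WHERE value = OLD.value;
END;
""")

conn.executemany("INSERT INTO events (value) VALUES (?)",
                 [("a",)] * 5 + [("b",)] * 3 + [("c",)])

# Step 3: the top-10 query now only touches the tiny counts table.
top = conn.execute(
    "SELECT value, count FROM value_counts ORDER BY count DESC LIMIT 10"
).fetchall()
print(top)  # [('a', 5), ('b', 3), ('c', 1)]
```

An AFTER UPDATE trigger on the value column (decrement the old value, increment the new one) would be needed as well if the original table's rows can change; it is omitted here for brevity.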




Answer 2:


This query forces SQLite to look at every row in the table; that is what is taking the time.

I almost never recommend this, but in this case, you could maintain the count in a denormalized fashion in an external table.

Place the value and count into another table, maintained during insert, update, and delete via triggers.



Source: https://stackoverflow.com/questions/7334969/how-to-optimize-retrieval-of-most-occurring-values-hundreds-of-millions-of-rows
