How to optimize retrieval of most occurring values (hundreds of millions of rows)

Submitted by 混江龙づ霸主 on 2019-12-13 14:40:22

Question


I'm trying to retrieve the most frequently occurring values from a SQLite table containing a few hundred million rows.

The query so far may look like this:

SELECT value, COUNT(value) AS count FROM table GROUP BY value ORDER BY count DESC LIMIT 10

There is an index on the value field.
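For reference, here is a minimal, runnable sketch of that query using Python's sqlite3 module. The table name `events` and the sample data are illustrative stand-ins for the real schema, which the question does not give:

```python
# Reproduce the top-N query from the question against an in-memory SQLite
# database. "events"/"value" are assumed names, not the asker's real schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (value TEXT)")
conn.execute("CREATE INDEX idx_events_value ON events (value)")

# A tiny, skewed sample data set: 'a' x5, 'b' x3, 'c' x1.
conn.executemany("INSERT INTO events (value) VALUES (?)",
                 [("a",)] * 5 + [("b",)] * 3 + [("c",)])

# The query from the question: group, count, sort by count, take the top 10.
top = conn.execute(
    "SELECT value, COUNT(value) AS count FROM events "
    "GROUP BY value ORDER BY count DESC LIMIT 10"
).fetchall()
print(top)  # [('a', 5), ('b', 3), ('c', 1)]
```

The index lets the GROUP BY walk values in order, but the ORDER BY on the computed count still requires aggregating every group before anything can be sorted, which is why the full scan cannot be avoided with this query shape.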

However, with the ORDER BY clause, the query takes so long that I have never seen it finish.

What can be done to drastically improve such queries on this much data?
I tried adding a HAVING clause (e.g. HAVING count > 100000) to reduce the number of rows to be sorted, without success.

Note that I don't care much about the time required for insertion (it still needs to be reasonable, but priority is given to selection), so I'm open to solutions that shift computation to insertion time ...

Thanks in advance,


Answer 1:


1) Create a new table in which you store one row per unique value together with its count, and put a descending index on the count column.
2) Add triggers to the original table that maintain this new table (insert and update) as necessary to increment/decrement the count.
3) Run your query against this new table; it will be fast because of the descending index on the count.
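The three steps above can be sketched as follows (via Python's sqlite3; all names here, such as `events`, `value_counts`, and the trigger names, are illustrative). The INSERT OR IGNORE followed by UPDATE pair is the classic portable way to upsert inside a SQLite trigger:

```python
# Sketch: a trigger-maintained counts table, so the top-N query reads a
# small pre-aggregated table instead of scanning hundreds of millions of rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (value TEXT);

-- Step 1: one row per unique value, with a descending index on count.
CREATE TABLE value_counts (value TEXT PRIMARY KEY, count INTEGER NOT NULL);
CREATE INDEX idx_counts_desc ON value_counts (count DESC);

-- Step 2: triggers keep value_counts in sync with the big table.
CREATE TRIGGER events_ai AFTER INSERT ON events BEGIN
    INSERT OR IGNORE INTO value_counts (value, count) VALUES (NEW.value, 0);
    UPDATE value_counts SET count = count + 1 WHERE value = NEW.value;
END;

CREATE TRIGGER events_ad AFTER DELETE ON events BEGIN
    UPDATE value_counts SET count = count - 1 WHERE value = OLD.value;
END;
""")

conn.executemany("INSERT INTO events (value) VALUES (?)",
                 [("a",)] * 5 + [("b",)] * 3 + [("c",)])

# Step 3: the top-10 query now only touches the tiny counts table.
top = conn.execute(
    "SELECT value, count FROM value_counts ORDER BY count DESC LIMIT 10"
).fetchall()
print(top)  # [('a', 5), ('b', 3), ('c', 1)]
```

An AFTER UPDATE trigger on the value column (decrement the old value, increment the new one) would be needed as well if the original table's rows can change; it is omitted here for brevity.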




Answer 2:


This query forces SQLite to look at every row in the table; that is what is taking the time.

I almost never recommend this, but in this case, you could maintain the count in a denormalized fashion in an external table.

Place the value and count into another table, maintained during insert, update, and delete via triggers.



Source: https://stackoverflow.com/questions/7334969/how-to-optimize-retrieval-of-most-occurring-values-hundreds-of-millions-of-rows
