Best approach to select most viewed posts from last n hours

Submitted by 不打扰是莪最后的温柔 on 2019-12-25 07:34:17

Question


I'm using PHP and MySQL (InnoDB engine).

According to the MySQL reference manual, a query that filters on a comparison of one column and orders by another cannot use a single index for both the filter and the sort.

I have a table named News.

This table has at least 1 million records with two important columns: time_added and number_of_views.

I need to select the most viewed records from the last n hours. What is the best index for this? Or is it even possible to run this kind of query very fast on a table with millions of records?

I've already done this for "last day": I can select the most viewed records from the last day by adding a new column (date_added). But if I decide to select these records from the last week, I'm in trouble again.


Answer 1:


First, write the query:

select n.*
from news n
where n.time_added >= date_sub(now(), interval <n> hour)  -- <n> = number of hours
order by n.number_of_views desc
limit ??;  -- ?? = number of rows to return

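The best index is (time_added, number_of_views). Actually, number_of_views won't be used for this query: the range condition on time_added uses the first column of the index, and the rows still have to be sorted by number_of_views afterwards. I would include it anyway for other possible queries.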
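
As a rough sketch of how that index could be created and checked: the index name idx_news_time_views and the literal values 6 (hours) and 10 (rows) are placeholders chosen for illustration and are not part of the original answer.

```sql
-- Composite index with time_added first, so the range condition can use it.
-- The index name is a hypothetical choice.
CREATE INDEX idx_news_time_views ON news (time_added, number_of_views);

-- EXPLAIN reports which index (if any) the optimizer picks
-- and whether a separate sort step is needed.
EXPLAIN
SELECT n.*
FROM news n
WHERE n.time_added >= DATE_SUB(NOW(), INTERVAL 6 HOUR)
ORDER BY n.number_of_views DESC
LIMIT 10;
```

If the plan shows a range scan on the index together with "Using filesort" in the Extra column, that would match the caveat above: the index narrows the rows by time, and the ordering by views is done as a separate step.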




Answer 2:


First, add the following lines to my.cnf (in the [mysqld] section):

query_cache_size = 32M   # or more
query_cache_limit = 32M  # or more

query_cache_size sets the size of the cache.

The other option to pay attention to is query_cache_limit: it sets the maximum size of a single query result that can be stored in the cache. Note that the query cache is deprecated as of MySQL 5.7.20 and was removed in MySQL 8.0. To check the status of the cache, run:

show global status like 'Qcache%';

http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). For more information, see http://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html
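
To make the leftmost-prefix rule concrete, here is a small sketch. The table t and the index name idx_c1_c2_c3 are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table, used only to illustrate the leftmost-prefix rule.
CREATE TABLE t (
    col1 INT NOT NULL,
    col2 INT NOT NULL,
    col3 INT NOT NULL,
    INDEX idx_c1_c2_c3 (col1, col2, col3)
) ENGINE=InnoDB;

-- These conditions each cover a leftmost prefix of the index,
-- so the index can be used to look up rows.
SELECT * FROM t WHERE col1 = 1;
SELECT * FROM t WHERE col1 = 1 AND col2 = 2;
SELECT * FROM t WHERE col1 = 1 AND col2 = 2 AND col3 = 3;

-- These conditions skip col1, so they do not form a leftmost prefix
-- and the index is not usable for lookups under this rule.
SELECT * FROM t WHERE col2 = 2;
SELECT * FROM t WHERE col2 = 2 AND col3 = 3;
```

Applied to the question, an index on (time_added, number_of_views) supports the time_added filter, because time_added is its leftmost column.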




Answer 3:


You need a summary table. Since 'hour' is your granularity, something like this might work:

CREATE TABLE HourlyViews (
    the_hour DATETIME NOT NULL,
    ct SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY(the_hour)
) ENGINE=InnoDB;

It might need another column (and add it to the PK) if there is some breakdown of the items you are counting. And you might want some other things SUM'd or COUNT'd in this table.

Build and maintain this table incrementally. That is, every hour, add another row to the table. (Or you could keep it updated with INSERT .. ON DUPLICATE KEY UPDATE ...)
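
A minimal sketch of the INSERT .. ON DUPLICATE KEY UPDATE variant, assuming one statement is run for each recorded view. That trigger point is an assumption; the answer does not say when the table is updated.

```sql
-- Truncate the current time to the start of the hour and use it as the key.
-- The first view in an hour inserts a row with ct = 1;
-- later views in the same hour hit the primary key and increment ct.
INSERT INTO HourlyViews (the_hour, ct)
VALUES (DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00'), 1)
ON DUPLICATE KEY UPDATE ct = ct + 1;
```

This keeps the summary current without a scheduled job, at the cost of one extra write per view.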

More on Summary Tables

Then change the query to use that table; it will be a lot faster.
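
For the original question (most viewed posts), the summary table would need a per-post breakdown. The following is only a sketch under that assumption: the table HourlyNewsViews, its news_id column, and the literal values 24 (hours) and 10 (rows) are hypothetical and do not appear in the answer.

```sql
-- Hypothetical per-post variant of the summary table:
-- one row per (hour, post), with the post id added to the primary key.
CREATE TABLE HourlyNewsViews (
    the_hour DATETIME NOT NULL,
    news_id INT UNSIGNED NOT NULL,
    ct SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY (the_hour, news_id)
) ENGINE=InnoDB;

-- Most viewed posts over the last 24 hours, summed from the hourly rows.
-- Buckets are hour-aligned, so the window edge is approximate.
SELECT news_id, SUM(ct) AS views
FROM HourlyNewsViews
WHERE the_hour >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
GROUP BY news_id
ORDER BY views DESC
LIMIT 10;
```

Because the_hour is the leftmost column of the primary key, the time filter can use it, and the query reads only the hourly rows in the window instead of the full News table.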



Source: https://stackoverflow.com/questions/38163777/best-approach-to-select-most-viewed-posts-from-last-n-hours
