How to index and query a very large DB with 60M rows and 50 columns

南笙酒味 提交于 2020-02-06 10:05:06

问题


I have a big table with 60M rows and 50 columns (columns include "company_idx" and "timestamp"). Thus, when I do my simple SQL Query such as:

SELECT * FROM companies_Scores.Scores 
WHERE `company_idx`=11 
  AND `timestamp` BETWEEN  '"+start_date+" 00:00:00' AND '"+end_date+" 00:00:00'

It takes basically 4 minutes to run (which is way too long). Thus, I thought about indexing my table, so I've done:

CREATE INDEX idx_time ON companies_Scores.Scores(company_idx, timestamp) USING BTREE;

However, when I now do the following, it takes also 4 minutes to run.

SELECT * FROM companies_Scores.Scores 
USE INDEX(idx_time) 
WHERE `company_idx`=11 
  AND `timestamp` BETWEEN  '"+start_date+" 00:00:00' AND '"+end_date+" 00:00:00'

I'm really a beginner with SQL and indexes. So I'm not really sure how to use indexes in a query. I guess the one I've done above is correct? Why does it take so much time? How can I improve it? I'd like my queries for each company_idx to be as quick as possible.

When I run EXPLAIN, I get:

[{'Cardinality': 115751,
  'Collation': 'A',
  'Column_name': 'company_idx',
  'Comment': '',
  'Index_comment': '',
  'Index_type': 'BTREE',
  'Key_name': 'idx_time',
  'Non_unique': 1,
  'Null': 'YES',
  'Packed': None,
  'Seq_in_index': 1,
  'Sub_part': None,
  'Table': 'Scores'},
 {'Cardinality': 45831976,
  'Collation': 'A',
  'Column_name': 'timestamp',
  'Comment': '',
  'Index_comment': '',
  'Index_type': 'BTREE',
  'Key_name': 'idx_time',
  'Non_unique': 1,
  'Null': 'YES',
  'Packed': None,
  'Seq_in_index': 2,
  'Sub_part': None,
  'Table': 'Scores'}]

回答1:


Your index looks correct for the query. You are forcing index usage, so we can assume the index is being used, if possible.

One issue may be that the index cannot be used. That would occur if you have type problems with the columns. For instance, the comparison value 11 is a number. If customer_idx is a string, you have a problem. The comparison should be a string -- '11'.

Another issue is simply that there might be a lot of data. If even a few percent of the rows match the conditions, the index is not going to help. One major use of indexes is for "needle-in-the-haystack" queries. They help finding a small subset. They don't help if you need all or much of the haystack.



来源:https://stackoverflow.com/questions/58224677/how-to-index-and-query-a-very-large-db-with-60m-rows-and-50-columns

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!