Is my large MySQL table destined for failure?

Submitted by 戏子无情 on 2019-12-13 19:17:09

Question


I have built a MySQL table on my local computer to store stock market data. The table is named minute_data, and the structure is simple enough:
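A minimal sketch of the schema, assuming typical OHLCV price columns (only symbol, date, and the key column are actually described in the question; the other column names and types are illustrative):

create table minute_data (
    symbol  varchar(16)   not null,
    date    datetime      not null,
    open    decimal(10,2),
    high    decimal(10,2),
    low     decimal(10,2),
    close   decimal(10,2),
    volume  int unsigned,
    key_col varchar(48)   not null,  -- concat(date, symbol), computed at insert time
    primary key (key_col)
) engine=InnoDB;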

You can see that I made the key column a combination of date and symbol -> concat(date,symbol). This way I can do an insert ignore ... query to add data to the table without duplicating a date/symbol combination.
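For example, an insert of this form (the values here are purely illustrative) is silently skipped when a row with the same key already exists:

insert ignore into minute_data (symbol, date, open, high, low, close, volume, key_col)
values ('CSCO', '2013-03-25 09:30:00', 20.77, 20.79, 20.75, 20.78, 12345,
        concat('2013-03-25 09:30:00', 'CSCO'));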

With this table, data retrieval is very simple. Say I wanted to get all the data for the symbol CSCO, then I could simply do this query:

select * from minute_data where symbol = "CSCO" order by date;

Everything has been "working". The table now has data from over 1000 symbols, with over 22 million rows already. I am thinking that it is not even half full for all 1000 symbols yet, so I expect the table to keep growing.

I am starting to see serious performance problems when querying this table. For example, the following query (which I often run to see the latest date for a particular symbol) takes well over 1 minute to complete, and it only returns 1 row!

select * from minute_data where symbol = "CSCO" order by date desc limit 1;  

This query (which is also very important) is also taking over 1 minute on average:

select count(*), symbol from minute_data group by symbol;

The performance problems are making it unrealistic to keep working with the data in this way. These are the questions that I would like to ask the community:

Is it futile to continue building my data set into this table?

Is MySQL a bad choice altogether for a data set like this?

What can I do to this table to improve performance?

What kind of data structure should I use for this purpose (instead of a MySQL table)?

Thank You!

UPDATE

I am providing the EXPLAIN output, which is the same for both of the following queries:

explain select count(*), symbol from minute_data group by symbol;
explain select * from minute_data where symbol = "CSCO" order by date desc limit 1;

UPDATE 2

Pretty simple fix. I ran this query to drop the primary key on the useless key_col defined above and create a composite primary key on two columns, date and symbol:

alter table minute_data drop primary key, add primary key (date,symbol);

Now I tried the following query, and it finished in less than 1 second:

select * from minute_data where symbol = "CSCO" order by date desc limit 1;

This query still takes a long time to complete (72 seconds). I guess that's because the query still has to scan all 22 million rows?

select count(*), symbol from minute_data group by symbol;

Answer 1:


Your key_col is completely useless. Did you know that you can have a primary key over multiple columns? I'd recommend that you drop that column and create a new primary key on (date, symbol), in this order, since your date column has the higher cardinality. Additionally, you can then (if there's a need for it) create another unique index on (symbol, date). Post EXPLAINs of your most important queries. And what's the cardinality of symbol?
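A sketch of that suggestion (assuming nothing else depends on key_col; the index name is hypothetical):

alter table minute_data
    drop primary key,
    drop column key_col,
    add primary key (date, symbol);

-- optional, only if symbol-first lookups are also needed:
alter table minute_data add unique index idx_symbol_date (symbol, date);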

UPDATE:

What you can see in the EXPLAIN is that there is no usable index, so the query scans the whole 22.5 million rows. Please give the suggestions above a try. If you don't want to drop key_col right now, you should at least add an index on the symbol column.
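For instance (idx_symbol is a hypothetical name):

alter table minute_data add index idx_symbol (symbol);

With such an index, a query like select count(*), symbol from minute_data group by symbol can be answered by scanning the much smaller index instead of every row of the table.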



Source: https://stackoverflow.com/questions/15612361/is-my-large-mysql-table-destined-for-failure
