optimize mysql count query

忘了有多久 2020-12-30 05:54

Is there a way to optimize this further, or should I just be satisfied that it takes 9 seconds to count 11M rows?

devuser@xcmst > mysql --user=user --pass         


        
10 Answers
  • 2020-12-30 06:18

    You should add an index on the 'date_updated' column.

    Another option, if you don't mind changing the structure of the table, is to store the date as an integer Unix timestamp ('int') instead of 'datetime'. That might be even faster. If you decide to do so, the query becomes:

    select count(date_updated) from record_updates where date_updated > 1291911807
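
As a rough, runnable illustration of both suggestions (an index on 'date_updated' plus integer timestamps), here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the DDL is SQLite's, not MySQL's. The helper `to_unix`, the index name `idx_date_updated`, the sample rows, and the assumption that stored datetimes are UTC are all made up for the example.

```python
import sqlite3
from datetime import datetime, timezone


def to_unix(dt_text):
    """Convert 'YYYY-MM-DD HH:MM:SS' (assumed to be UTC) to an integer Unix timestamp."""
    dt = datetime.strptime(dt_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


# In-memory SQLite database standing in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record_updates ("
    " record_id INTEGER PRIMARY KEY,"
    " date_updated INTEGER NOT NULL)"  # integer timestamp instead of datetime
)
# The index lets the range condition be answered from the index alone.
conn.execute("CREATE INDEX idx_date_updated ON record_updates (date_updated)")

# Hypothetical sample data.
sample = [
    "2009-10-11 15:33:21",
    "2009-10-11 15:33:22",
    "2009-10-11 15:33:23",
    "2010-12-09 16:23:27",
]
conn.executemany(
    "INSERT INTO record_updates (date_updated) VALUES (?)",
    [(to_unix(s),) for s in sample],
)

cutoff = to_unix("2009-10-11 15:33:22")
(newer,) = conn.execute(
    "SELECT COUNT(date_updated) FROM record_updates WHERE date_updated > ?",
    (cutoff,),
).fetchone()
print(newer)
```

Whether the integer column actually beats an indexed 'datetime' column in MySQL depends on the workload, so it is worth measuring both before changing the schema.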
    
  • 2020-12-30 06:19

    MySQL doesn't "optimize" count(*) queries in InnoDB because of versioning. Every item in the index has to be iterated over and checked to make sure that the version is correct for display (e.g., not an open commit). Since any of your data can be modified across the database, ranged selects and caching won't work. However, you possibly can get by using triggers. There are two methods to this madness.

    The first method risks slowing down your transactions, since none of them can truly run in parallel: use after-insert and after-delete triggers to increment and decrement a row in a counter table. The second method: use those insert and delete triggers to call a stored procedure that feeds an external program, which adjusts the values up and down in the same way, or that acts on a non-transactional table. Beware that with the second method a rollback will leave the numbers inaccurate, because the counter update is not undone along with the transaction.
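
The first method (a counter table kept in sync by triggers) can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module, so the trigger syntax is SQLite's and would need adapting for MySQL. The table `row_counts`, the trigger names, and the helper `cached_count` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record_updates (
    record_id    INTEGER PRIMARY KEY,
    date_updated TEXT NOT NULL
);

-- One row per counted table; n is the cached row count.
CREATE TABLE row_counts (
    table_name TEXT PRIMARY KEY,
    n          INTEGER NOT NULL
);
INSERT INTO row_counts VALUES ('record_updates', 0);

CREATE TRIGGER record_updates_ai AFTER INSERT ON record_updates
BEGIN
    UPDATE row_counts SET n = n + 1 WHERE table_name = 'record_updates';
END;

CREATE TRIGGER record_updates_ad AFTER DELETE ON record_updates
BEGIN
    UPDATE row_counts SET n = n - 1 WHERE table_name = 'record_updates';
END;
""")


def cached_count(conn):
    """Read the trigger-maintained count instead of scanning the table."""
    (n,) = conn.execute(
        "SELECT n FROM row_counts WHERE table_name = 'record_updates'"
    ).fetchone()
    return n


# Three inserts and one delete, committed.
conn.executemany(
    "INSERT INTO record_updates (date_updated) VALUES (?)",
    [("2009-10-11 15:33:22",), ("2009-10-12 09:00:00",), ("2009-10-13 10:30:00",)],
)
conn.execute("DELETE FROM record_updates WHERE record_id = 1")
conn.commit()

# An insert that is rolled back: the counter lives in the same transaction,
# so its increment is rolled back too and the count stays consistent.
conn.execute("INSERT INTO record_updates (date_updated) VALUES ('2009-10-14 00:00:00')")
conn.rollback()

print(cached_count(conn))
```

Because every writer updates the same counter row, concurrent transactions end up waiting on each other, which is the slowdown described above. The counter here tracks the whole table, so it does not by itself answer a count with a date range.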

    If you don't need exact numbers, check out this query:

    select table_rows from information_schema.tables
    where table_name = 'foo';
    

    Example difference: count(*): 1876668, table_rows: 1899004. The table_rows value is an estimate, and you'll get a different number every time even if your database doesn't change.

    For my own curiosity: do you need exact numbers that are updated every second? If so, why?

  • 2020-12-30 06:23

    Since the condition >'2009-10-11 15:33:22' matches most of the records,
    I would suggest counting the complement instead, i.e. <='2009-10-11 15:33:22', and subtracting it from the total (MySQL does less work because fewer rows are involved). Note that TABLE_ROWS is only an estimate for InnoDB tables, so the result is approximate.

    select 
      TABLE_ROWS -
      (select count(*) from record_updates where add_date<="2009-10-11 15:33:22") 
    from information_schema.tables 
    where table_schema = "marctoxctransformation" and table_name="record_updates"
    

    You can combine this with a programming language (such as a bash script)
    to make the calculation a bit smarter,
    for example by checking the execution plan first to see which comparison touches fewer rows.

    In my testing (around 10M records), the normal comparison takes around 3s,
    and this approach cuts it down to around 0.25s.
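
The complement idea can be checked on a small data set. The sketch below uses Python's built-in sqlite3 module rather than MySQL, and it takes the total from an exact `COUNT(*)`, where the query above uses the TABLE_ROWS estimate instead. The function name `count_newer_via_complement` and the sample rows are made up for the example.

```python
import sqlite3


def count_newer_via_complement(conn, cutoff):
    """Count rows with add_date > cutoff as total minus rows with add_date <= cutoff.

    Assumes add_date is NOT NULL; NULL values match neither comparison and
    would make the complement overcount.
    """
    # Exact total here; in MySQL this could come from TABLE_ROWS (an estimate)
    # or from a separately maintained counter.
    (total,) = conn.execute("SELECT COUNT(*) FROM record_updates").fetchone()
    (older_or_equal,) = conn.execute(
        "SELECT COUNT(*) FROM record_updates WHERE add_date <= ?", (cutoff,)
    ).fetchone()
    return total - older_or_equal


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record_updates ("
    " record_id INTEGER PRIMARY KEY,"
    " add_date TEXT NOT NULL)"  # ISO-formatted text sorts chronologically
)
conn.execute("CREATE INDEX idx_add_date ON record_updates (add_date)")

# Hypothetical data: 3 rows at or before the cutoff, 7 rows after it.
dates = ["2009-10-10 08:00:00", "2009-10-11 15:33:21", "2009-10-11 15:33:22"]
dates += ["2010-01-%02d 12:00:00" % day for day in range(1, 8)]
conn.executemany(
    "INSERT INTO record_updates (add_date) VALUES (?)", [(d,) for d in dates]
)

cutoff = "2009-10-11 15:33:22"
(direct,) = conn.execute(
    "SELECT COUNT(*) FROM record_updates WHERE add_date > ?", (cutoff,)
).fetchone()
via_complement = count_newer_via_complement(conn, cutoff)
print(direct, via_complement)
```

The trick only pays off when the total is cheap to obtain and the complement range is the smaller one, which is why inspecting the execution plan first can help pick the direction.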

  • 2020-12-30 06:26

    There are a few details I'd like you to clarify (I would put these in comments on the question, but it is easier to remove them from here once you update your question).

    1. What is the intended usage of the data: insert once and read the counts many times, or are your inserts and selects roughly on par?
    2. Do you care about insert/update performance?
    3. What engine is used for the table? (You can check with SHOW CREATE TABLE ...)
    4. Do you need the counts to be exact, or approximately exact (say, correct to within 0.1%)?
    5. Can you use triggers, summary tables, change the schema, change the RDBMS, etc., or can you only add/remove indexes?
    6. Maybe you should also explain what this table is supposed to be. You have record_id with a cardinality that matches the number of rows, so is it a PK, an FK, or what is it? Also, the cardinality of date_updated suggests (though not necessarily correctly) that it has the same value for ~5,000 records on average, so what is that? It is OK to ask a SQL tuning question with no context, but it is also nice to have some context, especially if redesigning is an option.

    In the meantime, I suggest you get this tuning script and check the recommendations it gives you (it's just a general tuning script, but it will inspect your data and stats).
