MySQL Hash Indexes for Optimization

和自甴很熟 提交于 2019-12-04 05:12:04

Firstly, to deal with the specific questions that you raise:

  1. I currently have indexes on both ID, Lookup sperately, and as one index, and it is a B-Tree index. Based on my WHERE Clause, does a hash index fit for as an optimization technique??

    As documented under CREATE INDEX Syntax:

    +----------------+--------------------------------+
    | Storage Engine |    Permissible Index Types     |
    +----------------+--------------------------------+
    | MyISAM         | BTREE                          |
    | InnoDB         | BTREE                          |
    | MEMORY/HEAP    | HASH, BTREE                    |
    | NDB            | BTREE, HASH (see note in text) |
    +----------------+--------------------------------+
    

    Therefore, before even considering HASH indexing, one should be aware that it is only available in the MEMORY and NDB storage engines: so may not even be an option to you.

    Furthermore, be aware that indexes on combinations of ID and Lookup alone may not be optimal, as your WHERE predicate also filters on tablea.Elg_IDpart1 and tableb.IDpart1—you may benefit from indexing on those columns too.

  2. Can I have a single hash index, and the rest of the indexes b B-tree index?

    Provided that the desired index types are supported by the storage engine, you can mix them as you see fit.

  3. instead of using the ID, Lookup, I would like to force it to use both, or at least turn it into a different kind of index and see if that helps?

    You could use an index hint to force MySQL to use different indexes to those that the optimiser would otherwise have selected.

  4. Now I know MySQL is smart enough to determine which index is most appropriate, so is that what it's doing?

    It is usually smart enough, but not always. In this case, however, it has probably determined that the cardinality of the indexes is such that it is better to use those that it has chosen.


Now, depending on the version of MySQL that you are using, tables derived from subqueries may not have any indexes upon them that can be used for further processing: consequently the join with b may require a full scan of that derived table (there's insufficient information in your question to determine exactly how much of a problem this might be, but schema1.tableb having 1.5 million records suggests it could be a significant factor).

See Subquery Optimization for more information.

One should therefore try to avoid using derived tables if at all possible. In this case, there does not appear to be any purpose to your derived table as one could simply join schema1.tablea and schema1.tableb directly:

UPDATE   schema1.tablea a
    JOIN schema1.tableb b USING (ID, Lookup)
SET      a.Elg_IDpart1 = b.IDpart1, 
         a.Elg_IDpart2 = b.IDpart2
WHERE    a.Elg_IDpart1 IS     NULL
     AND a.ID          IS NOT NULL
     AND b.IDpart1     IS NOT NULL
     AND b.Lookup      IS NOT NULL
ORDER BY ID, Lookup

The only thing that has been lost is the filter for DISTINCT records, but duplicate records will simply (attempt to) overwrite updated values with those same values again—which will have no effect, but may have proved very costly (especially with so many records in that table).

The use of ORDER BY in the derived table was pointless as it could not be relied upon to achieve any particular order to the UPDATE, whereas in this revised version it will ensure that any updates that overwrite previous ones take place in the specified order: but is that necessary? Perhaps it can be removed and save on any sorting operation.

One should check the predicates in the WHERE clause: are they all necessary (the NOT NULL checks on a.ID and b.Lookup, for example, are superfluous given that any such NULL records will be eliminated by the JOIN predicate)?

Altogether, this leaves us with:

UPDATE   schema1.tablea a
    JOIN schema1.tableb b USING (ID, Lookup)
SET      a.Elg_IDpart1 = b.IDpart1, 
         a.Elg_IDpart2 = b.IDpart2
WHERE    a.Elg_IDpart1 IS     NULL
     AND b.IDpart1     IS NOT NULL

Only if performance is still unsatisfactory should one look further at the indexing. Are relevant columns (i.e. those used in the JOIN and WHERE predicates) indexed? Are the indexes being selected for use by MySQL (bear in mind that it can only use one index per table for lookups: for testing both the JOIN predicate and the filter predicates: perhaps you need an appropriate composite index)? Check the query execution plan by using EXPLAIN to investigate such issues further.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!