MySQL join on YEAR(): index it with an added column or a generated column?

Submitted by 一世执手 on 2020-01-23 12:08:10

Question


Based on the answer https://stackoverflow.com/a/1601812/4050261

I am using the following SQL query:

FROM workdone
LEFT JOIN staffcost ON YEAR(workdone.date) = staffcost.costyear

The query above does not use the index I have on the workdone.date column, so it is very slow. I presume I have two options:

Option 1

Add another column, workdone.year, that is filled in whenever a row is inserted or updated (for example by triggers on the table), and use this column in the query.
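A minimal sketch of Option 1 in MySQL/MariaDB, assuming the `workdone` table from the question. The column name `workyear` (used instead of `year` to avoid confusion with the YEAR() function) and the trigger names are hypothetical. Single-statement triggers like these need no DELIMITER change.

```sql
-- Hypothetical column holding a copy of YEAR(`date`) so that it can be indexed.
ALTER TABLE workdone
  ADD COLUMN workyear SMALLINT UNSIGNED NULL,
  ADD INDEX workyear (workyear);

-- Backfill the rows that already exist.
UPDATE workdone SET workyear = YEAR(`date`);

-- Keep the copy in sync on every insert and update (trigger names are hypothetical).
CREATE TRIGGER workdone_year_bi BEFORE INSERT ON workdone
  FOR EACH ROW SET NEW.workyear = YEAR(NEW.`date`);

CREATE TRIGGER workdone_year_bu BEFORE UPDATE ON workdone
  FOR EACH ROW SET NEW.workyear = YEAR(NEW.`date`);

-- The join now compares plain columns instead of a function result.
SELECT *
FROM workdone
LEFT JOIN staffcost ON workdone.workyear = staffcost.costyear;
```

The triggers are the data-duplication cost the question asks about: the copy is only correct as long as every write goes through them. Whether the optimizer can use the `workyear` index for this particular LEFT JOIN is a separate matter; Answer 1 below argues that only an index on staffcost(costyear) helps there.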

Option 2

Add a generated (virtual or persistent) column, workdone.year, and use this column in the query.

My questions:

  1. Which option is better, from both a performance and a data-duplication standpoint?
  2. Should I use a virtual or a persistent column?
  3. Is there a better alternative?

Update 1.1

I implemented the solution suggested by OJones, but EXPLAIN shows that the index is not used. Am I reading the EXPLAIN output in my screenshot incorrectly?

[Screenshot of the EXPLAIN output not included]


Answer 1:


Your query is fine as it is. A LEFT JOIN, however, can only use an index on the right-hand table (staffcost): every row of the left-hand table (workdone) has to be read anyway, so no index on workdone can support the join. All you need is an index on staffcost(costyear).

You can test it with the following script:

DROP TABLE IF EXISTS `staffcost`;
CREATE TABLE IF NOT EXISTS `staffcost` (
  `id` int(10) unsigned NOT NULL,
  `costyear` year(4) NOT NULL,
  `data` text COLLATE utf8_unicode_ci,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO `staffcost` (`id`, `costyear`, `data`) VALUES
    (1, '2018', '0.6555866465490187'),
    (2, '2019', '0.12234661925802624'),
    (3, '2020', '0.64497318737672'),
    (4, '2021', '0.8578261098431667'),
    (5, '2022', '0.354211017819318'),
    (6, '2023', '0.19757679030073508'),
    (7, '2024', '0.9252509287793663'),
    (8, '2025', '0.03352430372827156'),
    (9, '2026', '0.3918687630369037'),
    (10, '2027', '0.8587709347333489');

DROP TABLE IF EXISTS `workdone`;
CREATE TABLE IF NOT EXISTS `workdone` (
  `id` int(10) unsigned NOT NULL,
  `date` date NOT NULL,
  `data` text COLLATE utf8_unicode_ci,
  PRIMARY KEY (`id`),
  KEY `date` (`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO `workdone` (`id`, `date`, `data`) VALUES
    (1, '2017-12-31', '0.40540353712197724'),
    (2, '2018-01-01', '0.8716141803857071'),
    (3, '2018-01-02', '0.1418603212962489'),
    (4, '2018-01-03', '0.09445909605776807'),
    (5, '2018-01-04', '0.04671454713373868'),
    (6, '2018-01-05', '0.9501954782290342'),
    (7, '2018-01-06', '0.6108337804776'),
    (8, '2018-01-07', '0.2035824984345422'),
    (9, '2018-01-08', '0.18541118147355615'),
    (10, '2018-01-09', '0.31630844279779907');

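-- 1) Plan without an index on staffcost(costyear)
EXPLAIN
SELECT * FROM workdone
LEFT JOIN staffcost ON YEAR(workdone.date) = staffcost.costyear;

-- 2) Add an index on the right-hand table of the LEFT JOIN
ALTER TABLE `staffcost` ADD INDEX `costyear` (`costyear`);

-- 3) Plan with the index in place
EXPLAIN
SELECT * FROM workdone
LEFT JOIN staffcost ON YEAR(workdone.date) = staffcost.costyear;

-- Server version that produced the results below
SELECT VERSION();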

Results:

Before adding the index:

id|select_type|table    |type|possible_keys|key|key_len|ref|rows|Extra
 1|SIMPLE     |workdone |ALL |             |   |       |   |  10|
 1|SIMPLE     |staffcost|ALL |             |   |       |   |  10|Using where; Using join buffer (flat, BNL join)

After adding the index:

id|select_type|table    |type|possible_keys|key     |key_len|ref |rows|Extra
 1|SIMPLE     |workdone |ALL |             |        |       |    |  10|
 1|SIMPLE     |staffcost|ref |costyear     |costyear|1      |func|   1|Using where

VERSION()
10.1.26-MariaDB

Online demo: http://rextester.com/JIAL51740




Answer 2:


You could try this:

FROM workdone
LEFT JOIN staffcost ON workdone.date >= MAKEDATE(staffcost.costyear, 1)
                   AND workdone.date <  MAKEDATE(staffcost.costyear+1, 1)

This allows an index on workdone.date to be used to search for dates from the first day of costyear up to, but not including, the first day of costyear + 1.

In general, a range search like this can use an index, whereas a condition that wraps the indexed column in a function (such as YEAR(datestamp)) cannot.
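A sketch of the same range condition against the test tables from Answer 1, with one caveat that follows from Answer 1's reasoning: in `workdone LEFT JOIN staffcost`, workdone is read first in full, so its `date` index cannot drive the join. The range form can only pay off when workdone is on the inner side, for example in an INNER JOIN where the optimizer may choose to read staffcost first. That choice is the optimizer's, and on tables this small it may still prefer full scans, so check the `key` column of EXPLAIN rather than assuming.

```sql
-- Same range condition written as an INNER JOIN, so the optimizer is free to
-- read staffcost first and probe workdone's `date` index once per cost year.
-- Note: unlike the LEFT JOIN, this drops workdone rows with no matching costyear.
EXPLAIN
SELECT *
FROM staffcost
JOIN workdone
  ON workdone.`date` >= MAKEDATE(staffcost.costyear, 1)
 AND workdone.`date` <  MAKEDATE(staffcost.costyear + 1, 1);
```

If every workdone row must be kept, the LEFT JOIN from the answer is still the right query; it then relies on the staffcost(costyear) index from Answer 1 rather than on workdone.date.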




Answer 3:


I think the generated column is by far the better option. Whether or not you persist it makes little difference; whether you index it does.

MySQL (with InnoDB) supports indexes on virtual generated columns, so you can index a virtual column, or you can persist the column and index that.

That said, I don't think it will make much difference for this query. Selecting by year is not very restrictive, and you are comparing against another table rather than a constant. An index on staffcost(costyear) seems more important.
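A sketch of the generated-column approach on the `workdone` table from Answer 1, assuming MySQL 5.7 or later or MariaDB 10.2 or later, which accept the `GENERATED ALWAYS AS (...) VIRTUAL | STORED` spelling. Older MariaDB, such as the 10.1 used in Answer 1, writes `AS (...) PERSISTENT` instead, and its documentation should be checked before indexing a VIRTUAL column there. The column and index name `workyear` is hypothetical; run either the VIRTUAL or the STORED variant, not both.

```sql
-- Variant A: virtual column. Nothing extra is stored in the row; when the column
-- is indexed, InnoDB materializes its values in the secondary index.
ALTER TABLE workdone
  ADD COLUMN workyear SMALLINT UNSIGNED
    GENERATED ALWAYS AS (YEAR(`date`)) VIRTUAL;

-- Variant B (instead of A): stored/persistent column, written on every insert/update.
-- ALTER TABLE workdone
--   ADD COLUMN workyear SMALLINT UNSIGNED
--     GENERATED ALWAYS AS (YEAR(`date`)) STORED;

-- Index the generated column (works for either variant on InnoDB).
CREATE INDEX workyear ON workdone (workyear);

-- The join compares plain columns; EXPLAIN shows which indexes are actually chosen.
EXPLAIN
SELECT *
FROM workdone
LEFT JOIN staffcost ON workdone.workyear = staffcost.costyear;
```

Unlike the trigger-based column from Option 1, the server keeps `workyear` consistent with `date` by itself, so there is no duplicated data to maintain by hand. In line with the answer above, the index that matters most for this LEFT JOIN is still the one on staffcost(costyear); an index on `workyear` is more likely to help filters such as `WHERE workdone.workyear = 2018`.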



Source: https://stackoverflow.com/questions/48590120/mysql-join-based-on-year-indexing-column-add-or-generated-column
