How to avoid “Using temporary” in many-to-many queries?

悲&欢浪女 2020-12-02 01:03

This query is very simple: all I want to do is get all the articles in a given category, ordered by the last_updated field:

SELECT
    `articles`.*
FR         


        
4 Answers
  •  南方客 (original poster)
    2020-12-02 01:58

    Here's a simplified example I put together some time ago for a similar performance-related question. It takes advantage of InnoDB clustered primary key indexes (obviously only available with InnoDB!).

    • http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
    • http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/

    You have 3 tables: category, product and product_category as follows:

    drop table if exists product;
    create table product
    (
    prod_id int unsigned not null auto_increment primary key,
    name varchar(255) not null unique
    )
    engine = innodb; 
    
    drop table if exists category;
    create table category
    (
    cat_id mediumint unsigned not null auto_increment primary key,
    name varchar(255) not null unique
    )
    engine = innodb; 
    
    drop table if exists product_category;
    create table product_category
    (
    cat_id mediumint unsigned not null,
    prod_id int unsigned not null,
    primary key (cat_id, prod_id) -- **note the clustered composite index** !!
    )
    engine = innodb;
    

    The most important thing is the column order of the product_category clustered composite primary key, because typical queries for this scenario always lead with cat_id = x or cat_id in (x,y,z...).
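
    As a rough sketch of what "lead with cat_id" means against the product_category table above, the statements below can be run through EXPLAIN to compare access paths. The ids 4105 and 4106 are made up for illustration; 4104 and 993561 come from the sample output further down.

    ```sql
    -- Filter leads with cat_id, the first column of primary key (cat_id, prod_id),
    -- so the clustered index can be used to seek straight to the matching rows.
    explain
    select pc.prod_id
    from product_category pc
    where pc.cat_id = 4104;

    -- Same idea for a list of categories (4105 and 4106 are hypothetical ids).
    explain
    select pc.cat_id, pc.prod_id
    from product_category pc
    where pc.cat_id in (4104, 4105, 4106);

    -- Filter on prod_id alone: prod_id is the second key column and this table has
    -- no other index, so the key cannot be used to seek to the matching rows.
    explain
    select pc.cat_id
    from product_category pc
    where pc.prod_id = 993561;
    ```

    If the last shape is common in your workload, that is the case for adding a secondary key on prod_id or for reversing the primary key column order.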

    We have 500K categories, 1 million products and 125 million product categories.

    select count(*) from category;
    +----------+
    | count(*) |
    +----------+
    |   500000 |
    +----------+
    
    select count(*) from product;
    +----------+
    | count(*) |
    +----------+
    |  1000000 |
    +----------+
    
    select count(*) from product_category;
    +-----------+
    | count(*)  |
    +-----------+
    | 125611877 |
    +-----------+
    

    So let's see how this schema performs for a query similar to yours. All queries are run cold (after a MySQL restart) with empty buffers and no query caching.

    select
     p.*
    from
     product p
    inner join product_category pc on 
        pc.cat_id = 4104 and pc.prod_id = p.prod_id
    order by
     p.prod_id desc -- sorry, I don't have a date field in this sample table; it won't make any difference though
    limit 20;
    
    +---------+----------------+
    | prod_id | name           |
    +---------+----------------+
    |  993561 | Product 993561 |
    |  991215 | Product 991215 |
    |  989222 | Product 989222 |
    |  986589 | Product 986589 |
    |  983593 | Product 983593 |
    |  982507 | Product 982507 |
    |  981505 | Product 981505 |
    |  981320 | Product 981320 |
    |  978576 | Product 978576 |
    |  973428 | Product 973428 |
    |  959384 | Product 959384 |
    |  954829 | Product 954829 |
    |  953369 | Product 953369 |
    |  951891 | Product 951891 |
    |  949413 | Product 949413 |
    |  947855 | Product 947855 |
    |  947080 | Product 947080 |
    |  945115 | Product 945115 |
    |  943833 | Product 943833 |
    |  942309 | Product 942309 |
    +---------+----------------+
    20 rows in set (0.70 sec) 
    
    explain
    select
     p.*
    from
     product p
    inner join product_category pc on 
        pc.cat_id = 4104 and pc.prod_id = p.prod_id
    order by
     p.prod_id desc -- sorry, I don't have a date field in this sample table; it won't make any difference though
    limit 20;
    
    +----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
    | id | select_type | table | type   | possible_keys | key     | key_len | ref              | rows | Extra                                        |
    +----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
    |  1 | SIMPLE      | pc    | ref    | PRIMARY       | PRIMARY | 3       | const            |  499 | Using index; Using temporary; Using filesort |
    |  1 | SIMPLE      | p     | eq_ref | PRIMARY       | PRIMARY | 4       | vl_db.pc.prod_id |    1 |                                              |
    +----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
    2 rows in set (0.00 sec)
    

    So that's 0.70 seconds cold - ouch.

    Hope this helps :)

    EDIT

    Having just read your reply to my comment above it seems you have one of two choices to make:

    create table articles_to_categories
    (
    article_id int unsigned not null,
    category_id mediumint unsigned not null,
    primary key(article_id, category_id), -- good for queries that lead with article_id = x
    key (category_id)
    )
    engine=innodb;
    

    or:

    create table categories_to_articles
    (
    article_id int unsigned not null,
    category_id mediumint unsigned not null,
    primary key(category_id, article_id), -- good for queries that lead with category_id = x
    key (article_id)
    )
    engine=innodb;
    

    How you define your clustered PK depends on your typical queries.
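
    To make the trade-off a bit more concrete, here is a rough sketch of the two query shapes against the second definition (categories_to_articles). It assumes an articles table whose primary key is article_id and which carries the last_updated column from the question; the id values are made up.

    ```sql
    -- Shape 1: "articles in a category". Leads with category_id, the first
    -- column of the clustered primary key (category_id, article_id).
    select
     a.*
    from
     categories_to_articles ca
    inner join articles a on a.article_id = ca.article_id  -- assumed PK of articles
    where
     ca.category_id = 4104                                  -- hypothetical id
    order by
     a.last_updated desc
    limit 20;

    -- Shape 2: "categories of an article". Leads with article_id, so it relies
    -- on the secondary key (article_id) instead of the clustered primary key.
    select
     ca.category_id
    from
     categories_to_articles ca
    where
     ca.article_id = 42;                                    -- hypothetical id
    ```

    With the first definition (articles_to_categories) the roles swap: shape 2 seeks on the clustered key and shape 1 goes through the secondary key. Either way, the sort on last_updated may still show up as a filesort in EXPLAIN, since that column lives in articles rather than in the junction table.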
