Table with 80 million records and adding an index takes more than 18 hours (or forever)! Now what?

心在旅途 2020-12-07 15:56

A short recap of what happened. I am working with 71 million records (not much compared to billions of records processed by others). On a different thread, someone suggested

5 answers
  •  天命终不由人
    2020-12-07 16:38

    OK, it turns out this was more than a simple "create a table, index it and forget it" problem :) Here's what I did, in case someone else faces the same problem (I use IP addresses as the example, but it works for other data types too):

    Problem: Your table has millions of entries and you need to add an index really fast

    Use case: Consider storing millions of IP addresses in a lookup table. Inserting the IP addresses should not be a big problem, but creating an index on them takes more than 14 hours.

    Solution: Partition the table using MySQL's partitioning support

    Case #1: When the table you want is not yet created

    CREATE TABLE IPADDRESSES(
      id INT UNSIGNED NOT NULL AUTO_INCREMENT,
      ipaddress BIGINT UNSIGNED,
      PRIMARY KEY(id, ipaddress)
    ) ENGINE=MYISAM
    PARTITION BY HASH(ipaddress)
    PARTITIONS 20;
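
    As a rough illustration of what this definition does: with PARTITION BY HASH, MySQL places each row in partition MOD(ipaddress, 20). The sample address below is made up, and INET_ATON() is used on the assumption that the addresses are IPv4 strings stored as integers; the post does not say how they were converted.

    ```sql
    -- Hypothetical sample address; INET_ATON() turns dotted IPv4 into an integer.
    -- MOD(..., 20) mirrors how HASH partitioning picks one of the 20 partitions.
    SELECT INET_ATON('192.168.1.10')          AS ip_as_int,
           MOD(INET_ATON('192.168.1.10'), 20) AS partition_number;

    -- Insert the address in the integer form the ipaddress column stores.
    INSERT INTO IPADDRESSES (ipaddress) VALUES (INET_ATON('192.168.1.10'));
    ```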
    

    Case #2: When the table you want already exists. There seems to be a way to do this with ALTER TABLE, but I have not yet figured out a proper solution. Instead, here is a slightly inefficient workaround:

    CREATE TABLE IPADDRESSES_TEMP(
      id INT UNSIGNED NOT NULL AUTO_INCREMENT,
      ipaddress BIGINT UNSIGNED,
      PRIMARY KEY(id)
    ) ENGINE=MYISAM;
    

    Insert your IP addresses into this table, then create the actual table with partitions:

    CREATE TABLE IPADDRESSES(
      id INT UNSIGNED NOT NULL AUTO_INCREMENT,
      ipaddress BIGINT UNSIGNED,
      PRIMARY KEY(id, ipaddress)
    ) ENGINE=MYISAM
    PARTITION BY HASH(ipaddress)
    PARTITIONS 20;
    

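    And then finally:

    -- Copy the rows into the partitioned table (id is regenerated by AUTO_INCREMENT)
    INSERT INTO IPADDRESSES(ipaddress) SELECT ipaddress FROM IPADDRESSES_TEMP;
    -- The staging table is no longer needed
    DROP TABLE IPADDRESSES_TEMP;
    -- Build the index on the partitioned table
    ALTER TABLE IPADDRESSES ADD INDEX(ipaddress);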
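
    To check the result, something like the following could be run afterwards. This is only a sketch: it assumes the table lives in the currently selected database and is named exactly IPADDRESSES.

    ```sql
    -- Row count per partition; a reasonably even spread suggests the hash is working.
    SELECT PARTITION_NAME, TABLE_ROWS
    FROM INFORMATION_SCHEMA.PARTITIONS
    WHERE TABLE_SCHEMA = DATABASE()
      AND TABLE_NAME = 'IPADDRESSES';

    -- List the indexes to confirm the one on ipaddress was created.
    SHOW INDEX FROM IPADDRESSES;
    ```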
    

    And there you go... indexing on the new table took me about 2 hours on a 3.2GHz machine with 1GB RAM :) Hope this helps.
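
    For Case #2, the ALTER TABLE route mentioned above might look roughly like the sketch below. It is untested on a table of this size and is not what I actually ran. It assumes the existing table is IPADDRESSES_TEMP as defined earlier and that ipaddress contains no NULLs. MySQL requires the partitioning column to be part of every unique key, so the primary key has to be widened first. Each ALTER rebuilds the table, so it may not be any faster than the copy.

    ```sql
    -- Step 1: widen the primary key so it includes the partitioning column.
    ALTER TABLE IPADDRESSES_TEMP
      DROP PRIMARY KEY,
      ADD PRIMARY KEY (id, ipaddress);

    -- Step 2: repartition the existing table (this rewrites all rows).
    ALTER TABLE IPADDRESSES_TEMP
      PARTITION BY HASH(ipaddress)
      PARTITIONS 20;

    -- Step 3 (optional): give the table its final name.
    RENAME TABLE IPADDRESSES_TEMP TO IPADDRESSES;
    ```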
