loading enwiki-latest-categorylinks.sql into mysql

后端 未结 1 997
猫巷女王i
猫巷女王i 2020-12-21 17:41

I want to load the wikipedia category-links table into the MySql database. I used the following command in the terminal:

mysql -u root -proot categorylinks &         


        
相关标签:
1条回答
  • 2020-12-21 18:31

    1) Separate the SQL dump file to two parts: creation and data

    1. grep -n "Dumping data" enwiki-latest-redirect.sql

    This will return the line number separating the creation and data of the table parts of the file. Let it be 46.

    1. tail -n +46 enwiki-latest-redirect.sql > redirect -data.sql

    The number we get in the last command is used as the input to tail to get the data part of the file.

    1. head -46 enwiki-latest-redirect.sql > redirect-creation.sql

    The number we get in the first command is used as the input to head to get the table creation part of the file.

    2) Remove indexes from the redirect-creation.sql

    The following is the original CREATE TABLE statement in the dump:

    CREATE TABLE `redirect` ( 
     `rd_from` int(8) unsigned NOT NULL DEFAULT '0', 
     `rd_namespace` int(11) NOT NULL DEFAULT '0', 
     `rd_title` varbinary(255) NOT NULL DEFAULT '', 
     `rd_interwiki` varbinary(32) DEFAULT NULL, 
     `rd_fragment` varbinary(255) DEFAULT NULL,   
      PRIMARY KEY (`rd_from`),   
      KEY `rd_ns_title` (`rd_namespace`,`rd_title`,`rd_from`)
    ) ENGINE=InnoDB DEFAULT CHARSET=binary; 
    

    The following is the desired version after removing the indexes:

    CREATE TABLE `redirect` ( 
     `rd_from` int(8) unsigned NOT NULL DEFAULT '0', 
     `rd_namespace` int(11) NOT NULL DEFAULT '0', 
     `rd_title` varbinary(255) NOT NULL DEFAULT '', 
     `rd_interwiki` varbinary(32) DEFAULT NULL, 
     `rd_fragment` varbinary(255) DEFAULT NULL, 
    ) ENGINE=InnoDB DEFAULT CHARSET=binary; 
    

    Edit the page-creation.sql to remove indexes as shown above.

    3) Load the created files

    mysql -u root -p wikipedia < redirect-creation.sql
    mysql -u root -p wikipedia < redirect-data.sql
    

    4) Create indexes on fields you want after loading the data

    create index idx_redirect_rd_from on redirect(rd_from)
    

    I am able to load the categorylinks table in under an hour on my Macbook Pro by following the steps above.

    0 讨论(0)
提交回复
热议问题