I want to load the wikipedia category-links table into the MySql database. I used the following command in the terminal:
mysql -u root -proot categorylinks &
1) Separate the SQL dump file to two parts: creation and data
grep -n "Dumping data" enwiki-latest-redirect.sqlThis will return the line number separating the creation and data of the table parts of the file. Let it be 46.
tail -n +46 enwiki-latest-redirect.sql > redirect -data.sqlThe number we get in the last command is used as the input to tail to get the data part of the file.
head -46 enwiki-latest-redirect.sql > redirect-creation.sqlThe number we get in the first command is used as the input to head to get the table creation part of the file.
2) Remove indexes from the redirect-creation.sql
The following is the original CREATE TABLE statement in the dump:
CREATE TABLE `redirect` (
`rd_from` int(8) unsigned NOT NULL DEFAULT '0',
`rd_namespace` int(11) NOT NULL DEFAULT '0',
`rd_title` varbinary(255) NOT NULL DEFAULT '',
`rd_interwiki` varbinary(32) DEFAULT NULL,
`rd_fragment` varbinary(255) DEFAULT NULL,
PRIMARY KEY (`rd_from`),
KEY `rd_ns_title` (`rd_namespace`,`rd_title`,`rd_from`)
) ENGINE=InnoDB DEFAULT CHARSET=binary;
The following is the desired version after removing the indexes:
CREATE TABLE `redirect` (
`rd_from` int(8) unsigned NOT NULL DEFAULT '0',
`rd_namespace` int(11) NOT NULL DEFAULT '0',
`rd_title` varbinary(255) NOT NULL DEFAULT '',
`rd_interwiki` varbinary(32) DEFAULT NULL,
`rd_fragment` varbinary(255) DEFAULT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=binary;
Edit the page-creation.sql to remove indexes as shown above.
3) Load the created files
mysql -u root -p wikipedia < redirect-creation.sql
mysql -u root -p wikipedia < redirect-data.sql
4) Create indexes on fields you want after loading the data
create index idx_redirect_rd_from on redirect(rd_from)
I am able to load the categorylinks table in under an hour on my Macbook Pro by following the steps above.