I have developed a user bulk upload module. There are two situations: when I do a bulk upload of 20,000 records while the database has zero records, it takes about 5 hours. But w
For my work, I have to load one CSV with 524 columns and 10k records every day. When I tried to parse it and add the records with PHP, it was horrible.
So I suggest you look at the documentation for LOAD DATA LOCAL INFILE.
I've copied my own code below as an example; adapt it to your needs.
// Build and run the LOAD DATA statement; $this->csvTable is the target table.
$dataload = 'LOAD DATA LOCAL INFILE "'.$filename.'"
    REPLACE
    INTO TABLE '.$this->csvTable.' CHARACTER SET "utf8"
    FIELDS TERMINATED BY "\t"
    IGNORE 1 LINES';
$result = (bool)$this->db->query($dataload);
Where $filename is the local path to your CSV (you can use dirname(__FILE__) to build it).
This SQL command is very fast (just one or two seconds to add/update the whole CSV).
EDIT: read the docs, but of course you need a unique index on your user table for REPLACE to work. With that in place you don't need to check whether the user already exists, and you don't need to parse the CSV file with PHP at all.
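For example, a unique key that lets REPLACE act as an insert-or-update might look like this (a sketch only; the key name and columns are assumptions about your user table, pick whatever genuinely identifies a user):
ALTER TABLE User
    ADD UNIQUE KEY uniq_user_identity (firstName, lastName, dateOfBirth);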
You appear to have the possibility (probability?) of 3 queries for every single record. Those 3 queries require 3 trips to the database (and if you are using Yii and storing the records in Yii objects, that might slow things down even more).
Can you add a unique key on first name / last name / DOB and one on email address?
If so, then you can just do INSERT ... ON DUPLICATE KEY UPDATE. This reduces it to a single query for each record, greatly speeding things up.
But the big advantage of this syntax is that you can insert / update many records at once (I normally stick to about 250), so even fewer trips to the database.
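A hedged sketch of that syntax with a couple of rows per statement (the column names are assumptions based on the queries shown elsewhere in this question):
INSERT INTO User (firstName, lastName, dateOfBirth, email)
VALUES
    ('John', 'Smith', '1980-01-01', 'john@example.com'),
    ('Jane', 'Doe', '1975-06-15', 'jane@example.com')
ON DUPLICATE KEY UPDATE
    email = VALUES(email);
One such statement can carry a few hundred rows, so 20,000 records need well under a hundred round trips.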
You can knock up a class that you just pass records to, and which does the insert when the number of queued records hits your chosen batch size. Also add a call in the destructor to flush any remaining records, as in the sketch below.
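A minimal sketch of such a class, assuming Yii 1's CDbConnection / CDbCommand API and the column names and unique key used above (all names are illustrative, not your actual code):
class BatchUpserter
{
    private $db;
    private $rows = array();
    private $batchSize;

    public function __construct($db, $batchSize = 250)
    {
        $this->db = $db;
        $this->batchSize = $batchSize;
    }

    // Queue one record; flush automatically when the batch is full.
    public function add(array $row)
    {
        $this->rows[] = $row;
        if (count($this->rows) >= $this->batchSize) {
            $this->flush();
        }
    }

    // Send all queued records in a single multi-row upsert.
    public function flush()
    {
        if (empty($this->rows)) {
            return;
        }
        $placeholders = array();
        $params = array();
        foreach ($this->rows as $row) {
            $placeholders[] = '(?, ?, ?, ?)';
            $params[] = $row['firstName'];
            $params[] = $row['lastName'];
            $params[] = $row['dateOfBirth'];
            $params[] = $row['email'];
        }
        $sql = 'INSERT INTO User (firstName, lastName, dateOfBirth, email) VALUES '
             . implode(', ', $placeholders)
             . ' ON DUPLICATE KEY UPDATE email = VALUES(email)';
        $this->db->createCommand($sql)->execute($params);
        $this->rows = array();
    }

    // Catch any final partial batch.
    public function __destruct()
    {
        $this->flush();
    }
}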
Another option is to read everything into a temp table and then use that as a source to join against your user table for the updates and inserts. This requires a bit of effort with the indexes, but a bulk load into a temp table is quick, updates from it with useful indexes will be fast, and using it as a source for the inserts should also be fast (if you exclude the records already updated).
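A rough outline of that approach (the staging table name, file path and column names are placeholders, and it assumes the name / date-of-birth columns are indexed on both tables):
CREATE TEMPORARY TABLE UserStaging LIKE User;

LOAD DATA LOCAL INFILE '/path/to/users.csv'
INTO TABLE UserStaging
FIELDS TERMINATED BY ','
IGNORE 1 LINES;

-- Update the users that already exist.
UPDATE User u
JOIN UserStaging s
    ON  u.firstName   = s.firstName
    AND u.lastName    = s.lastName
    AND u.dateOfBirth = s.dateOfBirth
SET u.email = s.email;

-- Insert the rows that are not there yet.
INSERT INTO User (firstName, lastName, dateOfBirth, email)
SELECT s.firstName, s.lastName, s.dateOfBirth, s.email
FROM UserStaging s
LEFT JOIN User u
    ON  u.firstName   = s.firstName
    AND u.lastName    = s.lastName
    AND u.dateOfBirth = s.dateOfBirth
WHERE u.id IS NULL;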
The other issue appears to be your following query, but I'm not sure where you execute it. It appears to only need to be executed once, in which case it might not matter too much. You haven't given the structure of the CustomType table, but it is joined to CustomField and the field customTypeId has no index, so that join will be slow. The same applies to the CustomValue and CustomFieldSubArea joins, which join on customFieldId; neither table has an index on this field (hopefully a unique index, because if those fields are not unique you will get a LOT of records returned - one row for every possible combination).
SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId,
cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM `CustomField` `cf`
INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId
and relatedId = 0
LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
WHERE ((relatedTable = 'people' and enabled = '1')
AND (onCreate = '1'))
AND (cfsa.subarea='peoplebulkinsert')
ORDER BY cf.sortOrder, cf.label
Always do bulk importing within a transaction:
$transaction = Yii::app()->db->beginTransaction();
$curRow = 0;
try
{
    while (($peopleData = fgetcsv($handle, 10240, ",")) !== FALSE) {
        $curRow++;
        //process $peopleData
        //insert row
        //best to use INSERT ... ON DUPLICATE KEY UPDATE
        // a = 1
        // b = 2;
        if ($curRow % 5000 == 0) {
            // Commit the batch and start a new transaction for the next one.
            $transaction->commit();
            $transaction = Yii::app()->db->beginTransaction();
        }
    }
    //don't forget the remainder.
    $transaction->commit();
}
catch (Exception $ex)
{
    $transaction->rollBack();
    $result = $ex->getMessage();
}
I have seen import routines sped up 500% simply by using this technique. I have also seen an import process that did 600 queries (a mixture of select, insert, update and show table structure) for each row; this technique sped that process up by 30%.
Indexes are your friend.
UPDATE User ... WHERE id = ...
-- desperately needs an index on id, probably the PRIMARY KEY. Similarly for renameSource.
SELECT *
FROM `User` `t`
WHERE `t`.`firstName`='Franck'
AND `t`.`lastName`='ALLEGAERT '
AND `t`.`dateOfBirth`='1971-07-29'
AND (userType NOT IN ("1"))
LIMIT 1;
Needs INDEX(firstName, lastName, dateOfBirth); the fields can be in any order (in this case).
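For instance (a sketch; double-check the exact column names and existing indexes before running it):
ALTER TABLE User
    ADD INDEX idx_name_dob (firstName, lastName, dateOfBirth);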
Look at each query to see what it needs, then add that INDEX
to the table. Read my Cookbook on building indexes.
If I understand correctly, for every row returned by SELECT * FROM AdvanceBulkInsert ... you run a SELECT cf.* query, and for every row of that SELECT cf.* you run the SELECT * FROM User query.
I think the issue is that you send far too many queries to the database. You should merge all your SELECT queries into one big query.
To do that:
replace the SELECT * FROM AdvanceBulkInsert with an EXISTS (SELECT * FROM AdvanceBulkInsert WHERE ...) or a JOIN;
replace the SELECT * FROM User with a NOT EXISTS (SELECT * FROM User WHERE ...).
Then run the update on all the rows of the merged select, as in the sketch below.
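As a sketch of what one of the merged statements could look like (the AdvanceBulkInsert columns are pure assumptions, since that table's structure is not shown in the question):
-- Insert only the CSV rows that have no matching user yet.
INSERT INTO User (firstName, lastName, dateOfBirth, email)
SELECT abi.firstName, abi.lastName, abi.dateOfBirth, abi.email
FROM AdvanceBulkInsert abi
WHERE NOT EXISTS (
    SELECT 1
    FROM User u
    WHERE u.firstName   = abi.firstName
      AND u.lastName    = abi.lastName
      AND u.dateOfBirth = abi.dateOfBirth
);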
You should also time your queries one by one to find which of them takes the most time, and use EXPLAIN (or ANALYSE) to see which part of the query takes the time.
Edit:
Now that I have seen your code, some leads:
do you have indexes on cf.customTypeId, cfv.customFieldId, cfsa.customFieldId, user.dateOfBirth, user.firstName and user.lastName?
you don't need a LEFT JOIN CustomFieldSubArea when the WHERE clause filters on CustomFieldSubArea; a plain JOIN CustomFieldSubArea is enough.
you will run query 2 many times with relatedId = 0; maybe you can cache the result in a variable?
if you don't need sorted data, remove the ORDER BY cf.sortOrder, cf.label; otherwise, add an index on (cf.sortOrder, cf.label).
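A hedged sketch of the index DDL those leads point to (index names invented; skip any index that already exists):
ALTER TABLE CustomField        ADD INDEX idx_customTypeId  (customTypeId);
ALTER TABLE CustomValue        ADD INDEX idx_customFieldId (customFieldId);
ALTER TABLE CustomFieldSubArea ADD INDEX idx_customFieldId (customFieldId);
ALTER TABLE User               ADD INDEX idx_name_dob      (firstName, lastName, dateOfBirth);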
Try these things to increase your query performance:
Don't quote numeric values: instead of User.id = '51394', write User.id = 51394.
If your tables use ENGINE=MyISAM then you cannot define foreign keys between your database tables; change the engine to ENGINE=InnoDB, and create some indexes such as foreign keys and full-text indexes.
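For example (converting the engine rewrites the whole table, so do it once, outside the import; the foreign key below assumes CustomValue.customFieldId references CustomField.id, as the joins above suggest, and both tables must be InnoDB):
ALTER TABLE User ENGINE=InnoDB;

ALTER TABLE CustomValue
    ADD CONSTRAINT fk_customValue_customField
    FOREIGN KEY (customFieldId) REFERENCES CustomField (id);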