问题
I'm hoping to insert the contents of a CSV into my table using MySQL loadfile, however, everytime I do so with the following command a number of rows are dropped
LOAD DATA INFILE 'new.csv' INTO TABLE Example
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
The CSV data I am hoping to insert is the Free Company Product Data supplied by Companies House (http://download.companieshouse.gov.uk/en_output.html)
I'd greatly appreciate any help. Thanks!
回答1:
Given this table definition, from the provided data specification here:
create table companieshouse
(
CompanyName varchar(160),
CompanyNumber varchar(8),
RegAddressCareOf varchar(100),
RegAddressPOBox varchar(10),
RegAddressAddressLine1 varchar(300),
RegAddressAddressLine2 varchar(300),
RegAddressPostTown varchar(50),
RegAddressCounty varchar(60),
RegAddressCountry varchar(50),
RegAddressPostCode varchar(20),
CompanyCategory varchar(100),
CompanyStatus varchar(70),
CountryOfOrigin varchar(50),
DissolutionDate date,
IncorporationDate date,
AccountsAccountRefDay integer,
AccountsAccountRefMonth integer,
AccountsNextDueDate date,
AccountsLastMadeUpDate date,
AccountsAccountCategory varchar(30),
ReturnsNextDueDate date,
ReturnsLastMadeUpDate date,
MortgagesNumMortCharges integer,
MortgagesNumMortOutstanding integer,
MortgagesNumMortPartSatisfied integer,
MortgagesNumMortSatisfied integer,
SICCodeSicText_1 varchar(170),
SICCodeSicText_2 varchar(170),
SICCodeSicText_3 varchar(170),
SICCodeSicText_4 varchar(170),
LimitedPartnershipsNumGenPartners integer,
LimitedPartnershipsNumLimPartners integer,
URI varchar(47),
PreviousName1CONDATE date,
PreviousName1CompanyName varchar(160),
PreviousName2CONDATE date,
PreviousName2CompanyName varchar(160),
PreviousName3CONDATE date,
PreviousName3CompanyName varchar(160),
PreviousName4CONDATE date,
PreviousName4CompanyName varchar(160),
PreviousName5CONDATE date,
PreviousName5CompanyName varchar(160),
PreviousName6CONDATE date,
PreviousName6CompanyName varchar(160),
PreviousName7CONDATE date,
PreviousName7CompanyName varchar(160),
PreviousName8CONDATE date,
PreviousName8CompanyName varchar(160),
PreviousName9CONDATE date,
PreviousName9CompanyName varchar(160),
PreviousName10CONDATE date,
PreviousName10CompanyName varchar(160),
ConfStmtNextDueDate date,
ConfStmtLastMadeUpDate date
);
This will load data from the provided .csv files into the table;
LOAD DATA INFILE '/var/lib/mysql-files/BasicCompanyData-2017-03-06-part1_5.csv'
INTO TABLE companieshouse
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY ""
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(
CompanyName,
CompanyNumber,
RegAddressCareOf,
RegAddressPOBox,
RegAddressAddressLine1,
RegAddressAddressLine2,
RegAddressPostTown,
RegAddressCounty,
RegAddressCountry,
RegAddressPostCode,
CompanyCategory,
CompanyStatus,
CountryOfOrigin,
@DissolutionDate,
@IncorporationDate,
@AccountsAccountRefDay,
@AccountsAccountRefMonth,
@AccountsNextDueDate,
@AccountsLastMadeUpDate,
AccountsAccountCategory,
@ReturnsNextDueDate,
@ReturnsLastMadeUpDate,
@MortgagesNumMortCharges,
@MortgagesNumMortOutstanding,
@MortgagesNumMortPartSatisfied,
@MortgagesNumMortSatisfied,
SICCodeSicText_1,
SICCodeSicText_2,
SICCodeSicText_3,
SICCodeSicText_4,
@LimitedPartnershipsNumGenPartners,
@LimitedPartnershipsNumLimPartners,
URI,
@PreviousName1CONDATE,
PreviousName1CompanyName,
@PreviousName2CONDATE,
PreviousName2CompanyName,
@PreviousName3CONDATE,
PreviousName3CompanyName,
@PreviousName4CONDATE,
PreviousName4CompanyName,
@PreviousName5CONDATE,
PreviousName5CompanyName,
@PreviousName6CONDATE,
PreviousName6CompanyName,
@PreviousName7CONDATE,
PreviousName7CompanyName,
@PreviousName8CONDATE,
PreviousName8CompanyName,
@PreviousName9CONDATE,
PreviousName9CompanyName,
@PreviousName10CONDATE,
PreviousName10CompanyName,
@ConfStmtNextDueDate,
@ConfStmtLastMadeUpDate)
SET DissolutionDate = IF(@DissolutionDate = '', NULL, STR_TO_DATE(@DissolutionDate, '%d/%m/%Y')),
IncorporationDate = IF(@IncorporationDate = '', NULL, STR_TO_DATE(@IncorporationDate, '%d/%m/%Y')),
AccountsNextDueDate = IF(@AccountsNextDueDate = '', NULL, STR_TO_DATE(@AccountsNextDueDate, '%d/%m/%Y')),
AccountsLastMadeUpDate = IF(@AccountsLastMadeUpDate = '', NULL, STR_TO_DATE(@AccountsLastMadeUpDate, '%d/%m/%Y')),
ReturnsNextDueDate = IF(@ReturnsNextDueDate = '', NULL, STR_TO_DATE(@ReturnsNextDueDate, '%d/%m/%Y')),
ReturnsLastMadeUpDate = IF(@ReturnsLastMadeUpDate = '', NULL, STR_TO_DATE(@ReturnsLastMadeUpDate, '%d/%m/%Y')),
PreviousName1CONDATE = IF(@PreviousName1CONDATE = '', NULL, STR_TO_DATE(@PreviousName1CONDATE, '%d/%m/%Y')),
PreviousName2CONDATE = IF(@PreviousName2CONDATE = '', NULL, STR_TO_DATE(@PreviousName2CONDATE, '%d/%m/%Y')),
PreviousName3CONDATE = IF(@PreviousName3CONDATE = '', NULL, STR_TO_DATE(@PreviousName3CONDATE, '%d/%m/%Y')),
PreviousName4CONDATE = IF(@PreviousName4CONDATE = '', NULL, STR_TO_DATE(@PreviousName4CONDATE, '%d/%m/%Y')),
PreviousName5CONDATE = IF(@PreviousName5CONDATE = '', NULL, STR_TO_DATE(@PreviousName5CONDATE, '%d/%m/%Y')),
PreviousName6CONDATE = IF(@PreviousName6CONDATE = '', NULL, STR_TO_DATE(@PreviousName6CONDATE, '%d/%m/%Y')),
PreviousName7CONDATE = IF(@PreviousName7CONDATE = '', NULL, STR_TO_DATE(@PreviousName7CONDATE, '%d/%m/%Y')),
PreviousName8CONDATE = IF(@PreviousName8CONDATE = '', NULL, STR_TO_DATE(@PreviousName8CONDATE, '%d/%m/%Y')),
PreviousName9CONDATE = IF(@PreviousName9CONDATE = '', NULL, STR_TO_DATE(@PreviousName9CONDATE, '%d/%m/%Y')),
PreviousName10CONDATE = IF(@PreviousName10CONDATE = '', NULL, STR_TO_DATE(@PreviousName10CONDATE, '%d/%m/%Y')),
AccountsAccountRefDay = NULLIF(@AccountsAccountRefDay, ''),
AccountsAccountRefMonth = NULLIF(@AccountsAccountRefMonth, '') ,
MortgagesNumMortCharges = NULLIF(@MortgagesNumMortCharges, ''),
MortgagesNumMortOutstanding = NULLIF(@MortgagesNumMortOutstanding, ''),
MortgagesNumMortPartSatisfied = NULLIF(@MortgagesNumMortPartSatisfied, ''),
MortgagesNumMortSatisfied = NULLIF(@MortgagesNumMortSatisfied, ''),
LimitedPartnershipsNumGenPartners = NULLIF(@LimitedPartnershipsNumGenPartners, ''),
LimitedPartnershipsNumLimPartners = NULLIF(@LimitedPartnershipsNumLimPartners, '')
;
... loaded the data without any errors or warnings:
Query OK, 849999 rows affected (19.43 sec)
Records: 849999 Deleted: 0 Skipped: 0 Warnings: 0
回答2:
From one side - LOAD DATA INFILE, fasted method, but real life always require some checks, transformations and other business logic.
And normal practice - use ETL tools rather than direct import
or multi stages processes - clean the data (check, log errors, transform, add calculated columns and etc on 1st step), than import final result.
Now there are many excellent OpenSource tools for this:
- Talend - http://www.talend.com
- StreamSets - http://www.streamsets.com
- Apache NiFi - https://nifi.apache.org, https://hortonworks.com/apache/nifi/
transforming Your import/export logic You can realise all steps, such as:
- download files
- parse them
- make any changes and lookups
- and after use direct loading from tools or call bulk load if it prefer
- good example recommendations from comments for Your other question - https://dba.stackexchange.com/questions/168194/mysql-select-and-sort-performance-v-large-table - easy add date column from wrong formatted files
来源:https://stackoverflow.com/questions/43017082/csv-into-mysql-using-load-data