CSV into MySQL using load data

£可爱£侵袭症+ 提交于 2019-12-24 07:44:02

问题


I'm hoping to insert the contents of a CSV into my table using MySQL loadfile, however, everytime I do so with the following command a number of rows are dropped

LOAD DATA INFILE 'new.csv' INTO TABLE Example
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

The CSV data I am hoping to insert is the Free Company Product Data supplied by Companies House (http://download.companieshouse.gov.uk/en_output.html)

I'd greatly appreciate any help. Thanks!


回答1:


Given this table definition, from the provided data specification here:

create table companieshouse
(
CompanyName varchar(160),
CompanyNumber varchar(8),
RegAddressCareOf varchar(100),
RegAddressPOBox varchar(10),
RegAddressAddressLine1 varchar(300),
RegAddressAddressLine2 varchar(300),
RegAddressPostTown varchar(50),
RegAddressCounty varchar(60),
RegAddressCountry varchar(50),
RegAddressPostCode varchar(20),
CompanyCategory varchar(100),
CompanyStatus varchar(70),
CountryOfOrigin varchar(50),
DissolutionDate date,
IncorporationDate date,
AccountsAccountRefDay integer,
AccountsAccountRefMonth integer,
AccountsNextDueDate date,
AccountsLastMadeUpDate date,
AccountsAccountCategory varchar(30),
ReturnsNextDueDate date,
ReturnsLastMadeUpDate date,
MortgagesNumMortCharges integer,
MortgagesNumMortOutstanding integer,
MortgagesNumMortPartSatisfied integer,
MortgagesNumMortSatisfied integer,
SICCodeSicText_1 varchar(170),
SICCodeSicText_2 varchar(170),
SICCodeSicText_3 varchar(170),
SICCodeSicText_4 varchar(170),
LimitedPartnershipsNumGenPartners integer,
LimitedPartnershipsNumLimPartners integer,
URI varchar(47),
PreviousName1CONDATE date,
PreviousName1CompanyName varchar(160),
PreviousName2CONDATE date,
PreviousName2CompanyName varchar(160),
PreviousName3CONDATE date,
PreviousName3CompanyName varchar(160),
PreviousName4CONDATE date,
PreviousName4CompanyName varchar(160),
PreviousName5CONDATE date,
PreviousName5CompanyName varchar(160),
PreviousName6CONDATE date, 
PreviousName6CompanyName varchar(160),
PreviousName7CONDATE date,
PreviousName7CompanyName varchar(160),
PreviousName8CONDATE date,
PreviousName8CompanyName varchar(160),
PreviousName9CONDATE date, 
PreviousName9CompanyName varchar(160),
PreviousName10CONDATE date, 
PreviousName10CompanyName varchar(160),
ConfStmtNextDueDate date, 
ConfStmtLastMadeUpDate date
);

This will load data from the provided .csv files into the table;

LOAD DATA INFILE '/var/lib/mysql-files/BasicCompanyData-2017-03-06-part1_5.csv' 
INTO TABLE companieshouse 
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"'
ESCAPED BY ""
LINES TERMINATED BY '\n'
 IGNORE 1 LINES
(
CompanyName,
CompanyNumber,
RegAddressCareOf,
RegAddressPOBox,
RegAddressAddressLine1,
RegAddressAddressLine2,
RegAddressPostTown,
RegAddressCounty,
RegAddressCountry,
RegAddressPostCode,
CompanyCategory,
CompanyStatus,
CountryOfOrigin,
@DissolutionDate,
@IncorporationDate,
@AccountsAccountRefDay,
@AccountsAccountRefMonth,
@AccountsNextDueDate,
@AccountsLastMadeUpDate,
AccountsAccountCategory,
@ReturnsNextDueDate,
@ReturnsLastMadeUpDate,
@MortgagesNumMortCharges,
@MortgagesNumMortOutstanding,
@MortgagesNumMortPartSatisfied,
@MortgagesNumMortSatisfied,
SICCodeSicText_1,
SICCodeSicText_2,
SICCodeSicText_3,
SICCodeSicText_4,
@LimitedPartnershipsNumGenPartners,
@LimitedPartnershipsNumLimPartners,
URI,
@PreviousName1CONDATE,
PreviousName1CompanyName,
@PreviousName2CONDATE,
PreviousName2CompanyName,
@PreviousName3CONDATE,
PreviousName3CompanyName,
@PreviousName4CONDATE,
PreviousName4CompanyName,
@PreviousName5CONDATE,
PreviousName5CompanyName,
@PreviousName6CONDATE, 
PreviousName6CompanyName,
@PreviousName7CONDATE,
PreviousName7CompanyName,
@PreviousName8CONDATE,
PreviousName8CompanyName,
@PreviousName9CONDATE, 
PreviousName9CompanyName,
@PreviousName10CONDATE, 
PreviousName10CompanyName,
@ConfStmtNextDueDate, 
@ConfStmtLastMadeUpDate)
SET DissolutionDate = IF(@DissolutionDate = '', NULL, STR_TO_DATE(@DissolutionDate, '%d/%m/%Y')),
IncorporationDate = IF(@IncorporationDate = '', NULL, STR_TO_DATE(@IncorporationDate, '%d/%m/%Y')),
AccountsNextDueDate = IF(@AccountsNextDueDate = '', NULL, STR_TO_DATE(@AccountsNextDueDate, '%d/%m/%Y')),
AccountsLastMadeUpDate = IF(@AccountsLastMadeUpDate = '', NULL, STR_TO_DATE(@AccountsLastMadeUpDate, '%d/%m/%Y')),
ReturnsNextDueDate = IF(@ReturnsNextDueDate = '', NULL,  STR_TO_DATE(@ReturnsNextDueDate, '%d/%m/%Y')),
ReturnsLastMadeUpDate = IF(@ReturnsLastMadeUpDate = '', NULL, STR_TO_DATE(@ReturnsLastMadeUpDate, '%d/%m/%Y')),
PreviousName1CONDATE = IF(@PreviousName1CONDATE = '', NULL, STR_TO_DATE(@PreviousName1CONDATE, '%d/%m/%Y')),
PreviousName2CONDATE = IF(@PreviousName2CONDATE = '', NULL, STR_TO_DATE(@PreviousName2CONDATE, '%d/%m/%Y')),
PreviousName3CONDATE = IF(@PreviousName3CONDATE = '', NULL, STR_TO_DATE(@PreviousName3CONDATE, '%d/%m/%Y')),
PreviousName4CONDATE = IF(@PreviousName4CONDATE = '', NULL, STR_TO_DATE(@PreviousName4CONDATE, '%d/%m/%Y')),
PreviousName5CONDATE = IF(@PreviousName5CONDATE = '', NULL, STR_TO_DATE(@PreviousName5CONDATE, '%d/%m/%Y')),
PreviousName6CONDATE = IF(@PreviousName6CONDATE = '', NULL, STR_TO_DATE(@PreviousName6CONDATE, '%d/%m/%Y')),
PreviousName7CONDATE = IF(@PreviousName7CONDATE = '', NULL, STR_TO_DATE(@PreviousName7CONDATE, '%d/%m/%Y')),
PreviousName8CONDATE = IF(@PreviousName8CONDATE = '', NULL, STR_TO_DATE(@PreviousName8CONDATE, '%d/%m/%Y')),
PreviousName9CONDATE = IF(@PreviousName9CONDATE = '', NULL, STR_TO_DATE(@PreviousName9CONDATE, '%d/%m/%Y')),
PreviousName10CONDATE = IF(@PreviousName10CONDATE = '', NULL, STR_TO_DATE(@PreviousName10CONDATE, '%d/%m/%Y')),
AccountsAccountRefDay = NULLIF(@AccountsAccountRefDay, ''),
AccountsAccountRefMonth = NULLIF(@AccountsAccountRefMonth, '') ,
MortgagesNumMortCharges = NULLIF(@MortgagesNumMortCharges, ''),
MortgagesNumMortOutstanding = NULLIF(@MortgagesNumMortOutstanding, ''),
MortgagesNumMortPartSatisfied = NULLIF(@MortgagesNumMortPartSatisfied, ''),
MortgagesNumMortSatisfied = NULLIF(@MortgagesNumMortSatisfied, ''),
LimitedPartnershipsNumGenPartners = NULLIF(@LimitedPartnershipsNumGenPartners, ''),
LimitedPartnershipsNumLimPartners = NULLIF(@LimitedPartnershipsNumLimPartners, '')
;

... loaded the data without any errors or warnings:

Query OK, 849999 rows affected (19.43 sec)
Records: 849999  Deleted: 0  Skipped: 0  Warnings: 0



回答2:


From one side - LOAD DATA INFILE, fasted method, but real life always require some checks, transformations and other business logic.

And normal practice - use ETL tools rather than direct import

or multi stages processes - clean the data (check, log errors, transform, add calculated columns and etc on 1st step), than import final result.

Now there are many excellent OpenSource tools for this:

  • Talend - http://www.talend.com
  • StreamSets - http://www.streamsets.com
  • Apache NiFi - https://nifi.apache.org, https://hortonworks.com/apache/nifi/

transforming Your import/export logic You can realise all steps, such as:

  • download files
  • parse them
  • make any changes and lookups
  • and after use direct loading from tools or call bulk load if it prefer
  • good example recommendations from comments for Your other question - https://dba.stackexchange.com/questions/168194/mysql-select-and-sort-performance-v-large-table - easy add date column from wrong formatted files


来源:https://stackoverflow.com/questions/43017082/csv-into-mysql-using-load-data

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!