What does the “More Columns than Column Names” error mean?

春和景丽 2021-01-07 21:09

I'm trying to read in a .csv file from the IRS and it doesn't appear to be formatted in any weird way.

I'm using the read.table() function, which I h

5 Answers
  •  情书的邮戳
    2021-01-07 21:55

    The file uses commas as separators, so you can either set sep="," in read.table or just use read.csv:

    x <- read.csv(file="http://www.irs.gov/file_source/pub/irs-soi/countyinflow1011.csv")
    dim(x)
    ## [1] 113593      9
    

    The error is caused by spaces in some of the values, and by unmatched quotes. By default, read.table splits fields on whitespace (sep=""). There are no spaces in the header, so read.table thinks that there is only one column. It then sees multiple columns in some of the rows, because values such as "Total Mig - US & For" contain spaces. For example, here are the first two lines (header and first row):

    State_Code_Dest,County_Code_Dest,State_Code_Origin,County_Code_Origin,State_Abbrv,County_Name,Return_Num,Exmpt_Num,Aggr_AGI
    00,000,96,000,US,Total Mig - US & For,6973489,12948316,303495582
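
The whitespace-splitting behaviour described above can be checked without downloading the file. The following is a minimal sketch that feeds only the two sample lines to R through the `text` argument; the helper `count_fields_in()` and the variable names are hypothetical and introduced purely for illustration.

```r
# The header and first data row, copied from the sample above.
sample_lines <- c(
  "State_Code_Dest,County_Code_Dest,State_Code_Origin,County_Code_Origin,State_Abbrv,County_Name,Return_Num,Exmpt_Num,Aggr_AGI",
  "00,000,96,000,US,Total Mig - US & For,6973489,12948316,303495582"
)

# Hypothetical helper: count the fields on each line for a given separator.
count_fields_in <- function(lines, sep) {
  con <- textConnection(lines)
  on.exit(close(con))
  count.fields(con, sep = sep)
}

# sep = "" is read.table's default and means "split on whitespace":
# the header has no spaces, but the data row does.
fields_ws <- count_fields_in(sample_lines, sep = "")

# With sep = "," both lines split into the same number of fields.
fields_comma <- count_fields_in(sample_lines, sep = ",")

# read.table with the default separator fails on this input;
# tryCatch captures the error message instead of stopping the script.
res <- tryCatch(
  read.table(text = sample_lines, header = TRUE),
  error = function(e) conditionMessage(e)
)

# Setting sep = "," reads the same two lines cleanly.
x <- read.table(text = sample_lines, header = TRUE, sep = ",")
dim(x)
```

Comparing `fields_ws` with `fields_comma` shows the mismatch directly: the per-line field counts differ with the whitespace separator and agree with the comma separator.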
    

    Unmatched quotes are the second problem. For example, line 1336 (row 1335) contains an apostrophe, which confuses read.table with its default quote argument (but not read.csv, which only treats double quotes as quote characters):

    01,089,24,033,MD,Prince George's County,13,30,1040
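
To handle the apostrophe with read.table itself, one option is to restrict the quote characters. This is a minimal sketch using only the header and the row quoted above, passed in through the `text` argument; the variable names are illustrative, and `stringsAsFactors = FALSE` is set explicitly as an assumption so the result is the same across R versions.

```r
# Header plus the row containing the apostrophe, copied from the samples above.
quote_lines <- c(
  "State_Code_Dest,County_Code_Dest,State_Code_Origin,County_Code_Origin,State_Abbrv,County_Name,Return_Num,Exmpt_Num,Aggr_AGI",
  "01,089,24,033,MD,Prince George's County,13,30,1040"
)

# read.table's default is quote = "\"'", so a lone apostrophe would be
# treated as the start of a quoted string. Limiting quote to the double
# quote makes the apostrophe an ordinary character.
a <- read.table(text = quote_lines, header = TRUE, sep = ",",
                quote = "\"", stringsAsFactors = FALSE)

# read.csv already uses sep = "," and quote = "\"" by default.
b <- read.csv(text = quote_lines, stringsAsFactors = FALSE)

a$County_Name
```

Both calls should keep the county name intact as a single field. If the data contained no quoted fields at all, `quote = ""` would disable quote handling entirely.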
    
