Issue when importing dataset: `Error in scan(…): line 1 did not have 145 elements`

我在风中等你 2020-11-28 22:14

I'm trying to import my dataset in R using read.table():

Dataset.df <- read.table("C:\\dataset.txt", header=TRUE)

But

11 answers
  •  渐次进展
    2020-11-28 22:20

    I encountered this issue while importing some of the files from the Add Health data into R (see: http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/21600?archive=ICPSR&q=21600 ). For example, the following command, which reads the DS12 data file in tab-separated .tsv format, produces this error:

    ds12 <- read.table("21600-0012-Data.tsv", sep="\t", comment.char="", 
    quote = "\"", header=TRUE)
    
    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, 
    na.strings,  : line 2390 did not have 1851 elements
    

    It appears that a slight formatting issue in some of the files causes R to reject them. At least part of the problem seems to be the occasional use of a double quote where an apostrophe was intended, which leaves an odd number of double-quote characters on a line. R then treats the unmatched quote as the start of a quoted string, and the field count for that line comes out wrong.
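One way to check this diagnosis before changing anything is to count the double quotes on each line. The following is a minimal sketch in base R, run on a small made-up file; the file contents and the variable names are illustrative assumptions, not taken from the Add Health data.

```r
# Hypothetical three-column, tab-separated sample standing in for the real file.
# The second data row contains a single, unbalanced double quote.
path <- tempfile(fileext = ".tsv")
writeLines(c("id\tcomment\tscore",
             "1\tfine\t10",
             "2\tabout 5\" tall\t20",
             "3\t\"quoted reply\"\t30"), path)

lines <- readLines(path)

# Count the double-quote characters on each line; an odd count means unbalanced quotes.
n_quotes <- lengths(regmatches(lines, gregexpr("\"", lines, fixed = TRUE)))
odd_quote_lines <- which(n_quotes %% 2 == 1)

# Count fields per line with quote processing switched off, so only tabs matter.
n_fields <- count.fields(path, sep = "\t", quote = "", comment.char = "")

print(odd_quote_lines)   # line numbers in the file that deserve a closer look
print(table(n_fields))   # a single value here means the tab structure itself is intact
```

On a real file, point `path` at the .tsv file instead of the temporary sample. If every line has the same field count once quoting is off, the quotes rather than the tabs are the likely culprit.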

    After fiddling, I've identified three possible solutions:

    1. Open the file in a text editor and search/replace every double quote character " with nothing. In other words, delete all double quotes. For this tab-delimited data, the only effect was that some verbatim excerpts of subjects' comments were no longer in quotes, which was a non-issue for my data analysis.

    2. For data stored on ICPSR (see link above) or in other archives, another solution is to download the data in a different format. A good option in this case is to download the Stata version of DS12 and open it with the read.dta function from the foreign package (which reads Stata files up to version 12) as follows:

      library(foreign)
      ds12 <- read.dta("21600-0012-Data.dta")
      
    3. A related solution/hack is to open the .tsv file in Excel and re-save it as a tab-delimited text file. This seems to clean up whatever formatting issue makes R unhappy.

    None of these is ideal, in that they don't quite solve the problem in R with the original .tsv file, but data wrangling often requires the use of multiple programs and formats.
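A further option that stays within R, offered as a hedged sketch rather than something tested on the Add Health files: read.table() accepts quote = "", which turns off quote processing so that stray double quotes are read as ordinary characters. The sample file and column names below are made up for illustration.

```r
# Hypothetical tab-separated sample with one unbalanced double quote.
path <- tempfile(fileext = ".tsv")
writeLines(c("id\tcomment\tscore",
             "1\tfine\t10",
             "2\tabout 5\" tall\t20",
             "3\tok\t30"), path)

# quote = "" disables quote processing, so every double quote is kept as literal text.
df <- read.table(path, sep = "\t", quote = "", comment.char = "",
                 header = TRUE, stringsAsFactors = FALSE)

# Optional: strip the literal double quotes afterwards, mirroring solution 1.
df$comment <- gsub("\"", "", df$comment, fixed = TRUE)

print(df)
```

The trade-off is that fields which were legitimately wrapped in quotes keep their quote characters until they are stripped, so this only suits files where no field contains a tab or newline inside quotes.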
