R read.csv “More columns than column names” error

半阙折子戏 2020-12-29 17:40

I have a problem when importing .csv file into R. With my code:

t <- read.csv("C:\\N0_07312014.CSV", na.string=c("","null","NaN","

5 answers
  •  心在旅途
    2020-12-29 17:56

    That's one wonky CSV file. It has multiple header lines scattered through it (try pasting it into CSV Fingerprint to see what I mean).
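
To check the mismatch yourself, something like the following might help. It is only a sketch: the sample lines and the temporary file are made up to mimic the layout described here (the real file is not reproduced), and `path`, `n_fields` and `header_rows` are illustrative names. It uses `count.fields` to show that the data rows carry more fields than the header has names, which is the situation `read.csv` complains about, and `grep` to locate the repeated header lines.

```r
# Hypothetical stand-in for the real file: the header repeats mid-file
# and the data rows have more fields than the header has names.
lines <- c(
  "Node Name,RTC_date,RTC_time",
  "Node 0,07/31/2014,08:58:18,,0",
  "Node 0,07/31/2014,08:59:22,,0",
  "Node Name,RTC_date,RTC_time",
  "Node 0,07/31/2014,08:59:37,,0"
)
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# Number of comma-separated fields on each line of the file
n_fields <- count.fields(path, sep = ",", quote = "\"",
                         blank.lines.skip = FALSE)
print(table(n_fields))

# Line numbers where the header text appears
header_rows <- grep("Node Name,RTC_date", readLines(path), fixed = TRUE)
print(header_rows)
```

If the table shows more than one field count, or the header turns up on more than one line, the file needs cleaning before `read.csv` can take it as is.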

    Since I don't know the data, I can't be sure the following produces accurate results for you, but it uses readLines and other R functions to pre-process the text:

    # use readLines to get the data
    dat <- readLines("N0_07312014.CSV")
    
    # I had to set the C locale to get rid of grep errors
    Sys.setlocale('LC_ALL','C')
    
    # filter out the repeated, wonky header lines
    dat_2 <- grep("Node Name,RTC_date", dat, invert=TRUE, value=TRUE)
    
    # turn that vector into a text connection for read.csv
    dat_3 <- read.csv(textConnection(paste0(dat_2, collapse="\n")),
                      header=FALSE, stringsAsFactors=FALSE)
    
    str(dat_3)
    ## 'data.frame':    308 obs. of  37 variables:
    ##  $ V1 : chr  "Node 0" "Node 0" "Node 0" "Node 0" ...
    ##  $ V2 : chr  "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
    ##  $ V3 : chr  "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
    ##  $ V4 : chr  "" "" "" "" ...
    ## .. more
    ##  $ V36: chr  "" "" "" "" ...
    ##  $ V37: chr  "0" "0" "0" "0" ...
    
    # grab the headers
    headers <- strsplit(dat[1], ",")[[1]]
    
    # how many of them are there?
    length(headers)
    ## [1] 32
    
    # keep only the first 32 columns, which matches the number of headers
    dat_4 <- dat_3[,1:32]
    
    # and add the headers
    colnames(dat_4) <- headers
    
    str(dat_4)
    ## 'data.frame':    308 obs. of  32 variables:
    ##  $ Node Name         : chr  "Node 0" "Node 0" "Node 0" "Node 0" ...
    ##  $ RTC_date          : chr  "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
    ##  $ RTC_time          : chr  "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
    ##  $ N1 Bat (VDC)      : chr  "" "" "" "" ...
    ##  $ N1 Shinyei (ug/m3): chr  "" "" "0.23" "null" ...
    ##  $ N1 CC (ppb)       : chr  "" "" "null" "null" ...
    ##  $ N1 Aeroq (ppm)    : chr  "" "" "null" "null" ...
    ## ... continues
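
The character columns above still hold "" and "null" as plain strings. A compact variant of the same steps, passing the `na.strings` values visible in the question to `read.csv`, could look like the sketch below. The sample data is hypothetical (four named columns plus one unnamed extra), and `raw`, `body`, `parsed` and `clean` are illustrative names, so treat it as a starting point under those assumptions and not as a verified fix for the real file.

```r
# Hypothetical sample standing in for N0_07312014.CSV
raw <- c(
  "Node Name,RTC_date,RTC_time,N1 Bat (VDC)",
  "Node 0,07/31/2014,08:58:18,,0",
  "Node 0,07/31/2014,08:59:22,null,0",
  "Node Name,RTC_date,RTC_time,N1 Bat (VDC)",
  "Node 0,07/31/2014,08:59:37,3.7,0"
)

# Column names come from the first header line
headers <- strsplit(raw[1], ",", fixed = TRUE)[[1]]

# Drop every header line, including the first one
body <- grep("Node Name,RTC_date", raw, fixed = TRUE,
             invert = TRUE, value = TRUE)

# Parse the remaining rows; "", "null" and "NaN" become NA
parsed <- read.csv(text = paste(body, collapse = "\n"),
                   header = FALSE, stringsAsFactors = FALSE,
                   na.strings = c("", "null", "NaN"))

# Keep only the columns that have a name, then attach the names
clean <- parsed[, seq_along(headers), drop = FALSE]
colnames(clean) <- headers
str(clean)
```

With the missing-value markers converted on read, a column such as N1 Bat (VDC) can come out numeric instead of character, which saves a conversion step later.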
    
