R Programming: read.csv() skips lines unexpectedly

心在旅途 2020-12-10 22:02

I am trying to read a CSV file in R (under Linux) using read.csv(). After the function completes, I find that the number of lines read in R is less than the number of li

1 Answer
  • 2020-12-10 22:36

    Here's an example of using count.fields to determine where to look and perhaps apply fixes. You have a modest number of lines that are 23 'fields' in width:

    > table(count.fields("~/Downloads/bugs.csv", quote="", sep=","))
         2     23     30 
       502     10 136532 
    > table(count.fields("~/Downloads/bugs.csv", sep=","))
    # Just wanted to see if removing quote-recognition would help.... It didn't.
         2      4     10     12     20     22     23     25     28     30 
     11308     24     20     33    642    251     10      2    170 124584 
    > which(count.fields("~/Downloads/bugs.csv", quote="", sep=",") == 23)
     [1] 104843 125158 127876 129734 130988 131456 132515 133048 136764
    [10] 136765
    

    I looked at the ten 23-field lines with:

    txt <- readLines("~/Downloads/bugs.csv")[
                     which(count.fields("~/Downloads/bugs.csv", quote="", sep=",") == 23)]
    

    And they had octothorpes ("#", hash-signs) which are comment characters in R data parlance.
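
    To see the effect in isolation, here is a minimal sketch on a made-up three-line file (the file contents and the names tmp, default_counts and fixed_counts are my own illustration, not taken from bugs.csv). It compares count.fields with its default comment.char="#" against comment.char="":

    ```r
    # Hypothetical sample file; the second line contains a '#'.
    tmp <- tempfile(fileext = ".csv")
    writeLines(c("id,summary,status",
                 "1,fix bug #42 in parser,open",
                 "2,update docs,closed"), tmp)

    # Default comment.char = "#": the rest of line 2 after '#' is ignored,
    # so that line reports fewer fields than the others.
    default_counts <- count.fields(tmp, sep = ",", quote = "")

    # comment.char = "" switches comment handling off entirely.
    fixed_counts <- count.fields(tmp, sep = ",", quote = "", comment.char = "")

    print(default_counts)
    print(fixed_counts)
    ```

    If the two results differ for a line, a '#' inside the data is a likely cause.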

    > table(count.fields("~/Downloads/bugs.csv", quote="", sep=",", comment.char=""))
        30 
    137044 
    

    So use those settings (sep=",", quote="", comment.char="") in read.table and you should be "good to go".
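
    A rough sketch of what that call might look like, again on a made-up file rather than the real bugs.csv. The header=TRUE and stringsAsFactors=FALSE arguments are assumptions on my part; adjust them to match your file:

    ```r
    # Hypothetical sample file standing in for bugs.csv.
    tmp <- tempfile(fileext = ".csv")
    writeLines(c("id,summary,status",
                 "1,fix bug #42 in parser,open",
                 "2,it's a \"quoted\" word,closed"), tmp)

    # quote = "" and comment.char = "" stop R from treating quote marks
    # and '#' as special, so every line is split on commas only.
    bugs <- read.table(tmp, header = TRUE, sep = ",", quote = "",
                       comment.char = "", stringsAsFactors = FALSE)

    # Sanity check: rows read should equal lines in the file minus the header.
    n_lines <- length(readLines(tmp))
    stopifnot(nrow(bugs) == n_lines - 1)
    ```

    Comparing nrow() against the line count of the file is a quick way to confirm that no lines were skipped or merged.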
