Reading in a text file with a SUB (0x1A, Control-Z) character in R on Windows

予麋鹿 2020-12-14 23:12

Following on from my query last week, "reading badly formed csv in R - mismatched quotes", these same CSV files also have embedded control characters such as the ASCII Substitu

2 Answers
  • 2020-12-14 23:48

    I think I've figured out a solution - because there appears to be a problem reading a Control-Z in the middle of a file on Windows, we need to read the file in binary / raw mode.

    fnam <- 'h3.txt'
    tmp.bin <- readBin(fnam, raw(), size=1, n=max(2*file.info(fnam)$size, 100))
    tmp.char <- rawToChar(tmp.bin)
    txt <- unlist(strsplit(tmp.char, '\r\n', fixed=TRUE))
    txt
    
    [1] "1,34,44.4,\" HIJK\032A \",99"
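
    The raw-mode approach above can be wrapped in a small helper. This is only a sketch: the name `read_lines_raw` and the test file contents are made up for illustration, and it assumes the file contains no NUL bytes (which `rawToChar` rejects) and fits comfortably in memory.

    ```r
    # Hypothetical helper: read a whole file as raw bytes and split it into lines.
    read_lines_raw <- function(path) {
      size  <- file.info(path)$size                 # exact byte count of the file
      bytes <- readBin(path, what = "raw", n = size)
      text  <- rawToChar(bytes)                     # errors if the file contains NUL bytes
      strsplit(text, "\r?\n")[[1]]                  # split on CRLF or LF line endings
    }

    # Illustrative file with a SUB (0x1A) byte inside a quoted field.
    tmp <- tempfile(fileext = ".txt")
    writeBin(c(charToRaw('1,34,44.4," HIJK'), as.raw(0x1a),
               charToRaw('A ",99\r\n2,35,45.5,"LMNO",100\r\n')), tmp)

    txt <- read_lines_raw(tmp)
    ```

    Splitting on the regular expression `\r?\n` rather than the fixed string `'\r\n'` is a choice made here so that files with Unix line endings are also split correctly.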
    

    Update: The following better answer was posted by Duncan Murdoch to R-devel. Converting it into a function, I get:

    sReadLines <- function(fnam) {
        f <- file(fnam, "rb")
        res <- readLines(f)
        close(f)
        res
    }
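
    A possible way to use this kind of function is to read the lines in binary mode and then hand them to `read.csv` through its `text` argument. The sketch below is an illustration under assumptions: the helper name `read_lines_binary`, the temporary file, and its contents are invented here, and `on.exit` is used so the connection is closed even if `readLines` fails.

    ```r
    # Hypothetical variant of the function above that always closes the connection.
    read_lines_binary <- function(path) {
      con <- file(path, "rb")        # binary mode: Control-Z is not treated as end of file
      on.exit(close(con))            # close the connection even if readLines() errors
      readLines(con)                 # LF, CRLF and CR are all accepted as line endings
    }

    # Illustrative file with a SUB (0x1A) byte in the middle of the first record.
    tmp <- tempfile(fileext = ".txt")
    writeBin(c(charToRaw('1,34,44.4," HIJK'), as.raw(0x1a),
               charToRaw('A ",99\r\n2,35,45.5,"LMNO",100\r\n')), tmp)

    lines <- read_lines_binary(tmp)

    # Parse the in-memory lines as CSV instead of re-reading the file in text mode.
    df <- read.csv(text = lines, header = FALSE, stringsAsFactors = FALSE)
    ```

    Because the parsing step works on the character vector already in memory, the file itself is only ever opened in binary mode.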
    
  • 2020-12-14 23:48

    I also ran into this problem when I used read.csv on a CSV file that contained the SUB (Ctrl-Z) character in the middle of the file.

    I solved it with the readr package (if your file is comma-separated):

    library(readr)
    read_csv("h3.txt")
    

    If you have a ; as a separator, then use:

    library(readr)
    read_csv2("h3.txt")
    