Reading in a text file with a SUB (0x1A, Control-Z) character in R on Windows

予麋鹿 2020-12-14 23:12

Following on from my query last week, "reading badly formed csv in R - mismatched quotes", these same CSV files also have embedded control characters such as the ASCII Substitu

2 Answers
  •  执念已碎
    2020-12-14 23:48

    I think I've figured out a solution - because there appears to be a problem reading a Control-Z in the middle of a file on Windows, we need to read the file in binary / raw mode.

    fnam <- 'h3.txt'
    tmp.bin <- readBin(fnam, raw(), size=1, n=max(2*file.info(fnam)$size, 100))
    tmp.char <- rawToChar(tmp.bin)
    txt <- unlist(strsplit(tmp.char, '\r\n', fixed=TRUE))
    txt
    
    [1] "1,34,44.4,\" HIJK\032A \",99"
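
A variant of the raw-mode approach, as a minimal sketch: it assumes the SUB byte carries no meaning and can simply be dropped before the text is parsed as CSV. The temporary file and its contents are made up for illustration, and `file.size()` stands in for the `file.info()` call above.

```r
# Hypothetical demo file; the name and contents are made up for illustration.
fnam <- tempfile(fileext = ".txt")
writeBin(c(charToRaw('1,34,44.4," HIJK'), as.raw(0x1a),
           charToRaw('A ",99\r\n')), fnam)

# Read every byte of the file; file.size() gives the exact length.
bytes <- readBin(fnam, what = "raw", n = file.size(fnam))

# Drop the SUB bytes (assumption: the control character is unwanted noise).
clean <- bytes[bytes != as.raw(0x1a)]

# Split on CRLF or bare LF, then parse the lines as header-less CSV.
txt <- strsplit(rawToChar(clean), "\r?\n")[[1]]
dat <- read.csv(text = txt, header = FALSE)
```

Filtering at the byte level keeps the cleanup independent of the text encoding, at the cost of holding the whole file in memory.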
    

    Update: A better answer was posted by Duncan Murdoch to R-Devel. Converting it into a function, I get:

    sReadLines <- function(fnam) {
        f <- file(fnam, "rb")
        res <- readLines(f)
        close(f)
        res
    }
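
A quick way to check the function, sketched under the assumption of a small two-line test file (the file name and the second row are hypothetical). The function is repeated so the snippet runs on its own.

```r
# Same function as above: open in binary mode so Control-Z is not
# treated as end of file on Windows.
sReadLines <- function(fnam) {
    f <- file(fnam, "rb")
    res <- readLines(f)
    close(f)
    res
}

# Hypothetical test file: two CRLF-terminated lines, the first with an
# embedded SUB (0x1A) byte.
fnam <- tempfile(fileext = ".txt")
writeBin(c(charToRaw('1,34,44.4," HIJK'), as.raw(0x1a),
           charToRaw('A ",99\r\n'),
           charToRaw('2,35,45.5," LMNO ",100\r\n')), fnam)

txt <- sReadLines(fnam)

# Both lines should come back, with the SUB character still in the first.
length(txt)
grepl("\032", txt[1], fixed = TRUE)
```

`readLines` accepts LF, CRLF, or CR as a line ending whatever mode the connection is opened in, so no separate step is needed to strip the carriage returns.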
    
