How to detect the right encoding for read.csv?

遥遥无期 2020-11-27 11:02

I have this file (http://b7hq6v.alterupload.com/en/) that I want to read in R with read.csv. But I am not able to detect the correct encoding. It seems to be a

6 answers
  •  夕颜 (original poster)
    2020-11-27 11:52

    My tidy update to @marek's solution, since I'm running into the same problem in 2020:

    #Libraries
    library(magrittr)
    library(purrr)
    
    #Make a vector of all the encodings supported by R
    encodings <- set_names(iconvlist(), iconvlist())
    #Make a simple reader function
    reader <- function(encoding, file) {
      read.csv(file, fileEncoding = encoding, nrows = 3, header = TRUE)
    }
    #Create a "safe" version so we only get warnings, but errors don't stop it
    # (May not always be necessary)
    safe_reader <- safely(reader)
    
    #Use the safe function with the encodings and the file being interrogated
    # ("path/to/file.csv" is a placeholder; replace it with your own file path)
    map(encodings, safe_reader, "path/to/file.csv") %>%
      #Return just the results
      map("result") %>%
      #Keep only results that are dataframes
      keep(is.data.frame) %>%
      #Keep only results with more than one column
        #This predicate will need to change with the data
        #I knew this would work, because I could open in a text editor
      keep(~ ncol(.x) > 1) %>%
      #Return the names of the encodings
      names()
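
For anyone who wants to try the idea without purrr, here is a rough base R sketch of the same approach. It is not the code above: it swaps `safely()` for `tryCatch()`, and it also discards encodings that raise warnings, which is stricter. The demo file, the helper name `try_encoding`, and the short `candidates` vector are all made up for illustration; in practice you would probably pass `iconvlist()` and your own file.

```r
# Hypothetical demo file: a small CSV written as Latin-1 bytes
demo_text <- "name,city\nJos\u00e9,M\u00e1laga\nZo\u00eb,K\u00f6ln\n"
tmp <- tempfile(fileext = ".csv")
writeBin(charToRaw(iconv(demo_text, from = "UTF-8", to = "latin1")), tmp)

# Try one encoding; return NULL if reading errors or warns
# (try_encoding is an illustrative helper, not part of any package)
try_encoding <- function(encoding, file) {
  tryCatch(
    read.csv(file, fileEncoding = encoding, nrows = 3, header = TRUE),
    error = function(e) NULL,
    warning = function(w) NULL
  )
}

# Assumed short list of candidates; use iconvlist() for the full search
candidates <- c("UTF-8", "latin1", "UTF-16LE")
results <- lapply(candidates, try_encoding, file = tmp)
names(results) <- candidates

# Keep encodings that produced a data frame with more than one column
# (as above, this predicate depends on what you know about the data)
ok <- Filter(function(df) is.data.frame(df) && ncol(df) > 1, results)
names(ok)
```

More than one encoding can pass this filter, so it is still worth printing the surviving data frames and checking that the accented characters look right before reading the whole file.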
    
