How to load a data frame with thousands separators in R as numeric class?

猫巷女王i 2020-12-22 04:26

I have a UTF-16 Unicode Text (.txt) file that I downloaded; when I saved it on a Mac drive, it defaulted to comma-separated values (.csv). This file contains numeric data that has 1000 s
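To make the problem concrete, here is a minimal sketch of why such columns do not come out numeric. The sample values are hypothetical because the question's own data is cut off. as.numeric() does not recognise the comma as a thousands separator, so any string containing one becomes NA, while stripping the commas first gives the intended numbers.

```r
# Hypothetical sample values; the question's actual data is not shown.
x <- c("95.31", "992.77", "1,719.68", "3,135.79")

# as.numeric() does not understand the thousands separator: the entries
# containing a comma become NA (the coercion warning is suppressed here).
naive <- suppressWarnings(as.numeric(x))

# Removing the commas first gives the intended numbers.
fixed <- as.numeric(gsub(",", "", x, fixed = TRUE))

print(naive)
print(fixed)
```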

3 Answers
  • 2020-12-22 04:51

    Thank you to everyone who helped. It turned out that my load call was the problem; the following reads the data in correctly from the start:

    read.csv(filename, sep="\t", fileEncoding="UTF-16", skip=1)    
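The following is a minimal, self-contained sketch of that kind of call. It assumes a small hypothetical tab-separated file with a one-line title, written as UTF-16LE because the real file is not shown. For that reason it uses fileEncoding = "UTF-16LE" rather than "UTF-16". read.csv() itself has no thousands-separator option, so the sketch still strips the commas from the numeric column after reading.

```r
# Build a small hypothetical UTF-16LE, tab-separated file with one title line,
# loosely mimicking the export described in the question.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "w", encoding = "UTF-16LE")
writeLines(c("Report title",
             "Item\tCost",
             "a\t95.31",
             "b\t1,719.68"), con)
close(con)

# skip = 1 drops the title line; fileEncoding re-encodes the text while reading.
df <- read.csv(tmp, sep = "\t", fileEncoding = "UTF-16LE", skip = 1,
               stringsAsFactors = FALSE)

# "1,719.68" keeps Cost as character, so remove the separators and convert.
df$Cost <- as.numeric(gsub(",", "", df$Cost, fixed = TRUE))
str(df)
```

If the file starts with a byte-order mark, fileEncoding = "UTF-16" (as in the answer above) lets the BOM decide the byte order.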
    
  • 2020-12-22 05:07

    A worked example using setClass, setAs, and colClasses:

    library(methods)
    # A virtual class whose only job is to name a conversion rule for colClasses.
    setClass("chr.w.commas")
    # Coercion from character: drop the thousands separators, then convert.
    setAs("character", "chr.w.commas",
          function(from) as.numeric(gsub(",", "", from, fixed = TRUE)))
    # The header has one fewer field than the data rows, so the first
    # column becomes the row names.
    dat <- read.table(text="Orig after_gsub num
    1      '95.31'      '95.31'      '95.31'
    2     992.77     992.77     992.77
    3 '1,719.68'  '1719.68' NA
    4 '3,135.79'  '3135.79' NA
    5     111.91 111.91 111.91
    6     305.12     305.12     305.12", header=TRUE, colClasses="chr.w.commas")
    str(dat)
    'data.frame':   6 obs. of  3 variables:
     $ Orig      : num  95.3 992.8 1719.7 3135.8 111.9 ...
     $ after_gsub: num  95.3 992.8 1719.7 3135.8 111.9 ...
     $ num       : num  95.3 992.8 NA NA 111.9 ...
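Because colClasses is given per column, the same trick can be limited to the columns that actually contain separators while the others keep ordinary classes. The sketch below is self-contained and defines its own class. The class name num.with.commas and the sample data are hypothetical. It relies on read.table() coercing a column with a non-standard colClasses entry through methods::as().

```r
library(methods)

# Hypothetical class name; coercion from character strips commas, then converts.
setClass("num.with.commas")
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from, fixed = TRUE)))

# Mix ordinary classes with the custom one: only "amount" gets the comma
# treatment. The quoted field keeps the CSV comma inside the value.
dat2 <- read.csv(text = 'id,label,amount
1,alpha,"1,719.68"
2,beta,992.77',
                 colClasses = c("integer", "character", "num.with.commas"))
str(dat2)
```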
    
  • 2020-12-22 05:10

    I suspect that gsub doesn't work correctly on your UTF-16 strings, perhaps because of stray non-ASCII characters. Try converting the strings to plain ASCII before doing the substitution:

    tx <- read.table("/Users/username/Desktop/report.csv", sep="\t", dec=".",
                     fileEncoding="UTF-16LE", fill=TRUE, skip=1, quote="",
                     header=TRUE, stringsAsFactors=FALSE)
    # fileEncoding has already re-encoded the text to the native encoding, so
    # convert from "" (native) to ASCII, dropping any non-ASCII characters.
    tx$Cost <- iconv(tx$Cost, from="", to="ASCII", sub='')
    tx$Cost <- as.numeric(gsub(",", "", tx$Cost, fixed=TRUE))  # strip separators
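The same steps can be wrapped in a small helper and checked on strings with typical invisible debris. This is a minimal sketch: the function name clean_number and the sample strings are hypothetical, and the input is assumed to be UTF-8. iconv() with sub = "" drops every byte that cannot be represented in ASCII, such as a leftover byte-order mark or a non-breaking space, before the commas are removed.

```r
# Hypothetical helper: drop non-ASCII characters (e.g. a stray byte-order mark
# or non-breaking space), then the thousands separators, then convert.
clean_number <- function(x) {
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")
  x <- gsub(",", "", x, fixed = TRUE)
  as.numeric(x)
}

# Hypothetical inputs: a leading BOM and a trailing non-breaking space.
cost <- c("\ufeff95.31", "1,719.68", "3,135.79\u00a0")
print(clean_number(cost))
```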
    