How to read in numbers with a comma as decimal separator?

Submitted by 馋奶兔 on 2019-11-26 18:53:54

When you check ?read.table you will probably find all the answers you need.

There are two issues with (continental) European csv files:

  1. What does the c in csv stand for? For standard csv this is a ",", for European csv it is a ";".
    sep is the corresponding argument in read.table.
  2. What is the character for the decimal point? For standard csv this is a ".", for European csv it is a ",".
    dec is the corresponding argument in read.table.

To read standard csv use read.csv; to read European csv use read.csv2. These two functions are just wrappers around read.table that set the appropriate arguments.

If your file does not follow either of these standards, set the arguments manually.
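As a minimal sketch of both routes, assuming a small made-up semicolon-separated sample that is passed inline through the text argument instead of a real file:

```r
# Hypothetical sample data: ";" separates fields, "," marks the decimal point
csv_text <- "id;amount\n1;3,14\n2;2,50"

# read.csv2 presets sep = ";" and dec = ","
a <- read.csv2(text = csv_text)

# The same result with the arguments set by hand in read.table
b <- read.table(text = csv_text, header = TRUE, sep = ";", dec = ",")

str(a$amount)  # the column is numeric, not character
```

For a file that mixes conventions (say, tab-separated with decimal commas), only the sep and dec values in the read.table call would need to change.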

aL3xa

From ?read.table:

dec     the character used in the file for decimal points.

And yes, read.csv accepts dec as well. Note, though, that read.csv defaults to sep=",", so a decimal comma would clash with the field separator unless you also change sep.

Alternatively, you can also use

read.csv2

which assumes a "," decimal separator and a ";" for column separators.

read.csv(... , sep=";")

Suppose the imported field is called "amount". If your numbers are being read in as character, you can fix the type this way:

d$amount <- sub(",",".",d$amount)
d$amount <- as.numeric(d$amount)

This happens to me frequently, along with a bunch of other little annoyances, when importing from Excel or Excel csv. Since there seems to be no consistent way to ensure you get what you expect when you import into R, post-hoc fixes seem to be the best method. By that I mean: LOOK at what you imported, make sure it's what you expected, and fix it if it's not.
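One way to do that check, sketched on a made-up data frame d (the column names and the suspect variable are illustrative only):

```r
# Hypothetical import where "amount" arrived as text
d <- data.frame(id = 1:2, amount = c("3,14", "2,50"), stringsAsFactors = FALSE)

str(d)             # compact overview of every column and its type
sapply(d, class)   # named vector of column classes

# Flag columns that are not numeric and may need a post-hoc fix
suspect <- names(d)[!sapply(d, is.numeric)]
```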

The dec argument can be used as follows:

mydata <- read.table(fileIn, dec=",")

input file (fileIn):

D:\TEST>more input2.txt

06-05-2014 09:19:38 3,182534 0

06-05-2014 09:19:51 4,2311 0
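To try this without a file on disk, the two sample lines can be passed inline. This is only a sketch: the text argument stands in for fileIn, and the column names V1 to V4 are the defaults read.table assigns when there is no header.

```r
# The two sample lines from above, passed inline instead of through fileIn
sample_lines <- "06-05-2014 09:19:38 3,182534 0
06-05-2014 09:19:51 4,2311 0"

# The default sep = "" splits on whitespace; dec = "," handles the third column
mydata <- read.table(text = sample_lines, dec = ",")

sapply(mydata, class)  # V3 should come back as numeric
```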

Problems may also be solved if you indicate how your missing values are represented (na.strings=...). For example, V1 and V2 here have the same format (decimals separated by "," in the csv file), but because missing values are present in V1, it is interpreted as a factor:

dat <- read.csv2("...csv", header=TRUE)
head(dat)

> ID x    time    V1    V2
> 1  1   0:01:00 0,237 0.621
> 2  1   0:02:00 0,242 0.675
> 3  1   0:03:00 0,232 0.398


dat <- read.csv2("...csv", header=TRUE, na.strings="---")
head(dat)

> ID x    time    V1    V2
> 1  1   0:01:00 0.237 0.621
> 2  1   0:02:00 0.242 0.675
> 3  1   0:03:00 0.232 0.398
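The effect can be reproduced on a small made-up sample, assuming "---" is the missing-value marker as in the call above. Note that since R 4.0.0 the unparsed column comes back as character rather than factor, but it is non-numeric either way.

```r
# Hypothetical data: "---" marks a missing value in V1
csv_text <- "ID;V1;V2\n1;0,237;0,621\n2;---;0,675\n3;0,232;0,398"

# Without na.strings, "---" keeps the whole V1 column non-numeric
raw <- read.csv2(text = csv_text)
is.numeric(raw$V1)    # FALSE
is.numeric(raw$V2)    # TRUE

# Declaring the missing-value marker lets V1 be parsed as numeric
fixed <- read.csv2(text = csv_text, na.strings = "---")
is.numeric(fixed$V1)  # TRUE
is.na(fixed$V1[2])    # TRUE
```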

Maybe try

as.is=TRUE

This also prevents character columns from being converted into factors.

Just to add to Brandon's answer above, which worked well for me (I don't have enough rep to comment):

If you're using

    d$amount <- sub(",",".",d$amount)
    d$amount <- as.numeric(d$amount)

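don't forget that the values may also contain a "." (for example as a thousands separator). In that case you may need sub("[.]", "", d$amount, perl=T) to get around the . character; run it before the comma is replaced, and use gsub instead of sub if a value can contain more than one ".".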
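Putting the two steps together, here is a sketch of a small helper. The name parse_european is made up for illustration, and it assumes "." only ever appears as a thousands separator in the input.

```r
# Hypothetical helper: convert text such as "1.234.567,89" to numeric
parse_european <- function(x) {
  x <- gsub(".", "", x, fixed = TRUE)  # drop every "." thousands separator
  x <- sub(",", ".", x, fixed = TRUE)  # turn the decimal comma into a point
  as.numeric(x)
}

amounts <- parse_european(c("1.234,56", "1.234.567,89", "0,5"))
```

The order matters: if the comma were converted first, the following step would strip the new decimal point along with the thousands separators.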
