R is adding extra numbers while reading file

Question

I have been trying to read a file that has a date field and a numeric field. The data is in an Excel sheet and looks something like this:

Date          X       
1/25/2008     0.0023456
12/23/2008    0.001987

When I read this into R using the readxl::read_xlsx function, the data looks like this:

Date          X
1/25/2008     0.0023456000000000
12/23/2008    0.0019870000000000

I have tried limiting the digits using functions like round and format (nsmall = 7), but nothing seems to work. What am I doing wrong? I also tried saving the data as a CSV and a TXT file and reading them with read.csv and read.delim, but I ran into the same issue. Any help would be really appreciated!

Answer 1:

As noted in the comments to the OP and in the other answer, this problem comes from the way floating point numbers are represented in binary, and how that representation interacts with the digits option.

To illustrate, we'll create an Excel spreadsheet with the data from the OP, then write a short R script that shows what happens as we adjust the options(digits=) option.

> # first, display the number of significant digits set in R
> getOption("digits")
[1] 7
> 
> # Next, read data file from Excel
> library(xlsx)
> 
> theData <- read.xlsx("./data/smallNumbers.xlsx",1,header=TRUE)
> 
> head(theData)
        Date         X
1 2008-01-25 0.0023456
2 2008-12-23 0.0019870
> 
> # change digits to larger number to replicate SO question
> options(digits=17)
> getOption("digits")
[1] 17
> head(theData)
        Date                     X
1 2008-01-25 0.0023456000000000002
2 2008-12-23 0.0019870000000000001
>
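The Excel file isn't actually needed to reproduce this. As a sketch, assuming a data frame built directly in code behaves like the one read from the spreadsheet (the values below are simply the OP's values typed in), the same effect appears with base R alone:

```r
# Hypothetical stand-in for the spreadsheet: the OP's values built directly in R,
# so no Excel file or reading package is needed.
theData <- data.frame(
  Date = as.Date(c("2008-01-25", "2008-12-23")),
  X    = c(0.0023456, 0.001987)
)

old <- options(digits = 7)   # R's default; options() returns the previous value
print(theData)

options(digits = 17)         # same data, more significant digits requested
print(theData)

options(old)                 # restore whatever setting was active before
```

If the extra digits show up here too, the file reader is not what's adding them.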

However, the extra digits only appear once digits is set high enough. For example, setting options(digits=16) results in the following on a machine running an Intel i7-6500U processor with Microsoft Windows 10:

> # what happens when we set digits = 16?
> options(digits=16)
> getOption("digits")
[1] 16
> head(theData)
        Date         X
1 2008-01-25 0.0023456
2 2008-12-23 0.0019870
>
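A double holds roughly 15 to 17 significant decimal digits, so whether the representation error becomes visible depends on how many digits you ask for. A small base-R sketch using sprintf()'s %g conversion, where the number after the dot is the count of significant digits:

```r
x <- c(0.0023456, 0.001987)

# %g drops trailing zeros, so only "real" digits remain in the output.
sprintf("%.15g", x)   # 15 significant digits: the representation error is rounded away
sprintf("%.17g", x)   # 17 significant digits: typically enough to expose it
```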

Answer 2:

library(formattable)

x <- formattable(x, digits = 7, format = "f")
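If you'd rather not add a package dependency, base R's formatC() accepts the same digits = 7, format = "f" pair. This is a swapped-in base-R alternative, not formattable itself, and it returns plain character strings meant for display:

```r
x <- c(0.0023456, 0.001987)

# formatC() returns a character vector: fixed notation, 7 decimal places.
shown <- formatC(x, digits = 7, format = "f")
print(shown)

# The numeric vector is untouched, so keep using x for calculations.
mean(x)
```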

Alternatively, you may want to add this to get R's default formatting:

options(defaultPackages = "")

Then restart R.

Answer 3:

Perhaps the problem isn't your source file, since you say this happens with .csv and .txt files as well.

Try checking the current value of your display digits option by running options()$digits (or, equivalently, getOption("digits")).

If the result is, e.g., 14, then that is likely the problem.

In that case, try running options(digits=8), which sets the display digits to 8 for the session.

Then simply reprint your data frame; the change takes effect immediately for how decimals are displayed on screen by default.

Consult ?options for more information about the digits display setting and other session options.
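The steps above, as a short sketch with the OP's two values typed in directly. The last line restoring the old setting is optional and not part of the original advice:

```r
x <- c(0.0023456, 0.001987)

before <- getOption("digits")   # same as options()$digits
old <- options(digits = 8)      # set display digits for this session
print(x)                        # reprint: the new setting applies immediately

options(old)                    # optional: go back to the previous setting
```

Because options() returns the previous values, saving its result makes it easy to undo the change later.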

Edit to improve the original answer and clarify it for future readers:

Changing options(digits=x), either up or down, does not change the value that is stored or read into internal memory for floating point variables. The digits session option merely changes how floating point values are printed, i.e. displayed on screen, by common print functions, per the ?options documentation:

digits: controls the number of significant digits to print when printing numeric values.

What the OP showed as the problem (R displaying more digits after the last digit of a decimal number than the OP expected to see) was not caused by reading the source file from Excel: given that the OP had the same problem with CSV and TXT files, the import process wasn't the cause.

If you are seeing more decimals than you want by default in your printed/displayed output (e.g. for data frames and numeric variables), try checking options()$digits, and understand that this option simply sets the default number of digits used by R's common display and printing methods. HOWEVER, it does not affect the floating point storage of any of your data or variables.
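This display/storage split may also explain why round() and format() seemed not to help in the question. A minimal sketch: format() produces text, while round() produces another double whose printing is still governed by digits:

```r
x <- 0.0023456

txt <- format(x, nsmall = 7)   # a character string, useful only for display
r   <- round(x, 7)             # still a double; printing it still follows 'digits'

class(txt)   # "character"
class(r)     # "numeric"

old <- options(digits = 17)
print(r)                       # may show the same trailing digits as x
options(old)
```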

Regarding floating point numbers, though, another answer here shows how setting options(digits=n) higher than the default can help demonstrate some display idiosyncrasies related to floating point precision. That is a separate problem from the one the OP displayed in his example, but it's well worth understanding.

For a much more detailed, topic-specific discussion of floating point precision than would be appropriate to rehash here, it's well worth reading this definitive SO question+answer: Why are these numbers not equal?
That question, its answers and the discussion cover issues specifically around floating point precision and contain a long, well-presented list of references that you will find helpful if you need more information on the subject.
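The classic example from that topic, as a quick sketch: none of 0.1, 0.2 or 0.3 is exactly representable in binary, so exact comparison fails while a tolerance-based comparison succeeds:

```r
0.1 + 0.2 == 0.3                      # FALSE: the stored doubles differ slightly
isTRUE(all.equal(0.1 + 0.2, 0.3))     # TRUE: all.equal() compares with a small tolerance

print(0.1 + 0.2, digits = 17)         # more digits reveal the stored value
```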

Source: https://stackoverflow.com/questions/52162856/r-is-adding-extra-numbers-while-reading-file

Tags

formatting

decimalformat

decimal-point