Error with large numerics in dcast.data.table

Submitted by 浪子不回头ぞ on 2019-12-25 07:30:59

Question


Given a data frame, I am trying to cast it from long to wide using the dcast.data.table function from library(data.table). However, when the variable on the left side of the formula holds large numeric values, distinct values somehow get combined into the same row.

Below is an example:

df <- structure(list(A = c(10000000007624, 10000000007619, 10000000007745, 
10000000007624, 10000000007767, 10000000007729, 10000000007705, 
10000000007711, 10000000007784, 10000000007745, 10000000007624, 
10000000007762, 10000000007762, 10000000007631, 10000000007762, 
10000000007619, 10000000007628, 10000000007705, 10000000007762, 
10000000007624, 10000000007745, 10000000007706, 10000000007767, 
10000000007777, 10000000007624, 10000000007745, 10000000007624, 
10000000007777, 10000000007771, 10000000007631, 10000000007624, 
10000000007640, 10000000007642, 10000000007708, 10000000007711, 
10000000007745, 10000000007767, 10000000007655, 10000000007722, 
10000000007745, 10000000007762, 10000000007771, 10000000007617
), B = c(4060697L, 7683673L, 7699192L, 1322422L, 7754939L, 7448486L, 
2188027L, 1061376L, 2095950L, 7793530L, 2095950L, 6447861L, 2188027L, 
7448451L, 7428427L, 7516354L, 7067801L, 2095950L, 6740142L, 405911L, 
4057215L, 1061345L, 7754945L, 7501748L, 2188027L, 7780980L, 6651988L, 
6649330L, 6655118L, 6556367L, 6463510L, 2347462L, 7675114L, 6556361L, 
1061345L, 7224099L, 6463515L, 2188027L, 6463515L, 7311234L, 7764971L, 
7224099L, 2347479L), C = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 
3L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 25L, 2L, 1L, 2L, 
1L, 1L, 1L)), .Names = c("A", "B", "C"), row.names = c(NA, -43L
), class = "data.frame")

library(data.table)

# Convert the data frame to a data.table before casting
df <- as.data.table(df)

output <- dcast.data.table(df, A ~ B, value.var = "C",
                           fun.aggregate = sum, fill = NA)

This produces only 2 rows, 10000000007624 and 10000000007784, and every value is summed into one of those two rows.

The problem does not occur with reshape2::dcast, which produces the correct result.

Is there a reason why dcast.data.table behaves this way?


Answer 1:


I raised this as an issue on GitHub and @jangorecki responded. The explanation comes from the setNumericRounding help document:

when joining or grouping, data.table rounds such data to apx 11 s.f. which is plenty of digits for many cases. This is achieved by rounding the last 2 bytes off the significand.

As such, my 14-digit numeric values were being rounded to the same value and therefore combined into the same group.
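To see why 14-digit values collide, here is a rough back-of-the-envelope sketch in base R. It assumes a standard IEEE 754 double (52 explicit significand bits) and that dropping 2 bytes leaves about 36 of them. It only estimates the resolution that remains and does not reproduce data.table's internal rounding, so the exact group boundaries may differ.

```r
# One of the IDs from the question
x <- 10000000007624

# A double stores 52 explicit significand bits; rounding off the last
# 2 bytes (16 bits) leaves roughly 36 bits of precision (assumption:
# this mirrors the effect described in the help text, not the exact C code).
kept_bits <- 52 - 16

# Binary exponent of x: 2^43 <= x < 2^44
exponent <- floor(log2(x))

# Approximate gap between values that can still be told apart
spacing <- 2^(exponent - kept_bits)
spacing

# The IDs in the question span a much smaller range than their magnitude,
# so many of them fall within one such gap of each other.
ids <- c(10000000007617, 10000000007624, 10000000007784)
diff(range(ids))
```

Thirty-six bits correspond to about 36 * log10(2), or roughly 11 significant decimal digits, which matches the figure quoted from the help document.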

As @jangorecki mentions, this can be avoided by calling setNumericRounding(0). However, I have instead converted my large numeric values to factors, which makes more sense for my particular use case.
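A minimal sketch of both workarounds, using a small made-up table rather than the full data from the question. The column name A_id and the use of the dcast() generic instead of dcast.data.table are choices made for this illustration only.

```r
library(data.table)

# Hypothetical stand-in data: three distinct 14-digit IDs
dt <- data.table(
  A = c(10000000007624, 10000000007619, 10000000007784, 10000000007624),
  B = c(1L, 2L, 1L, 2L),
  C = c(1L, 1L, 1L, 3L)
)

# Workaround 1: switch off rounding of numeric keys, cast, then restore
old_rounding <- getNumericRounding()
setNumericRounding(0)
wide_num <- dcast(dt, A ~ B, value.var = "C",
                  fun.aggregate = sum, fill = NA)
setNumericRounding(old_rounding)

# Workaround 2: treat the IDs as labels rather than numbers.
# sprintf("%.0f", ...) writes every digit, avoiding scientific notation.
dt[, A_id := factor(sprintf("%.0f", A))]
wide_fac <- dcast(dt, A_id ~ B, value.var = "C",
                  fun.aggregate = sum, fill = NA)

nrow(wide_num)  # one row per distinct ID
nrow(wide_fac)
```

Converting to a factor or character column has the side benefit that the result no longer depends on the session's rounding setting.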

In addition, @jangorecki advises using the bit64 package when dealing with large numeric values.
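As a hedged illustration of the bit64 route, the sketch below stores the IDs as 64-bit integers and groups on them. It assumes the bit64 package is installed, uses made-up data, and shows plain grouping rather than dcast.

```r
library(data.table)
library(bit64)

# Hypothetical stand-in data; IDs this size are exactly representable
# as doubles, so converting them to integer64 loses nothing.
dt <- data.table(
  A = as.integer64(c(10000000007624, 10000000007619,
                     10000000007784, 10000000007624)),
  C = c(1L, 1L, 1L, 3L)
)

# integer64 keys are compared exactly, so nearby IDs stay separate
totals <- dt[, .(total = sum(C)), by = A]
nrow(totals)
```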

The original post is on GitHub.



Source: https://stackoverflow.com/questions/37941867/error-with-large-numerics-in-dcast-data-table
