R data.table NA type consistency

Submitted by 痴心易碎 on 2019-12-04 05:30:50

Question


library(data.table)
dt = data.table(x = c(1,1,2,2,2,2,3,3,3,3))
dt[, y := if(.N > 2) .N else NA, by = x] # fails: plain NA is logical, .N is integer
dt[, y := if(.N > 2) .N else NA_integer_, by = x] # works: both branches are integer

The first grouped assignment fails because a plain NA has a type of its own (logical), and that type is not integer. Is there a way to tell data.table to ignore that and coerce every NA to whatever type keeps the column consistent?

I can set NA_integer_ manually here, but with many columns of different types it is hard to get the NA type right for each one.

BTW, what NA type should I use for Date/IDate/ITime?


Answer 1:


OP's first question: Is there a way to tell data.table to ignore that and coerce every NA to whatever type keeps the column consistent?

No. You'll see a similar error without the assignment:

dt[, if(.N > 2) .N else NA, by = x]
#  Error in `[.data.table`(dt, , if (.N > 2) .N else NA, by = x) : 
# Column 1 of result for group 2 is type 'integer' but expecting type 'logical'. Column types must be consistent for each group.

In my opinion, this "Column types must be consistent for each group." message should be shown for your case as well.
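The mismatch is easier to see by checking the types of the NA constants directly. The following is a minimal sketch, reusing the dt from the question and assuming data.table is installed; the names na_types and res are only illustrative:

```r
library(data.table)

# A plain NA is logical; the typed constants carry their type in the name.
na_types <- c(
  plain     = typeof(NA),
  integer   = typeof(NA_integer_),
  double    = typeof(NA_real_),
  character = typeof(NA_character_)
)
print(na_types)

dt <- data.table(x = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3))

# .N is an integer, so the else-branch has to be an integer NA as well.
# Every group then returns the same type and the grouping succeeds.
res <- dt[, .(y = if (.N > 2L) .N else NA_integer_), by = x]
print(res)
```

Here the first group (two rows) yields the integer NA and the other two groups yield their row counts, so all groups agree on type integer.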


OP's second question: BTW, what NA type should I use for Date/IDate/ITime?

For IDate et al, I always subset by NA_integer_, which seems to give a length-one NA slice, e.g., as.IDate(Sys.Date())[NA_integer_]. I don't know if that's what one should do, but I don't know of a better idea. An illustration:

z = IDateTime(factor(Sys.time()))
#         idate    itime
# 1: 2016-08-01 16:05:25

str( lapply(z, function(x) x[NA_integer_]) )
# List of 2
#  $ idate: IDate[1:1], format: NA
#  $ itime:Class 'ITime'  int NA
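The same subsetting trick can be wrapped in a small function so that the NA always matches the column it stands in for. This is only a sketch: na_like is a hypothetical helper, not part of data.table, and the column d and the result name res are made up for illustration.

```r
library(data.table)

# Hypothetical helper (not a data.table function): return a length-one NA
# with the same type and class as x by subsetting with NA_integer_.
na_like <- function(x) x[NA_integer_]

dt <- data.table(
  x = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  d = as.IDate("2016-08-01") + 0:9   # illustrative IDate column
)

# First date of each group with more than two rows, otherwise a typed NA.
# Both branches are IDate, so the column type is consistent across groups.
res <- dt[, .(first_d = if (.N > 2L) d[1L] else na_like(d)), by = x]
print(res)

# The helper also works for base types and for Date.
str(list(na_like(1L), na_like(1.5), na_like("a"), na_like(Sys.Date())))
```

Because the NA is derived from the column itself, the same expression can be reused across columns of different types without spelling out NA_integer_, NA_real_, or NA_character_ for each one.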


Source: https://stackoverflow.com/questions/38703518/r-data-table-na-type-consistency
