How to avoid an optimization warning in data.table

萌比男神i 2021-01-14 08:47

I have the following code:

> dt <- data.table(a=c(rep(3,5),rep(4,5)),b=1:10,c=11:20,d=21:30,key="a")
> dt
    a  b  c  d
 1: 3  1 11 21
 2: 3  2 1         


        
2 Answers
  •  南方客 (original poster)
    2021-01-14 09:09

    One way I could think of is to assign count by reference:

    dt.out <- dt[, lapply(.SD,sum), by = a]
    dt.out[, count := dt[, .N, by=a][, N]]
    # alternatively: count := table(dt$a)
    
    #    a  b  c   d count
    # 1: 3 15 65 115     5
    # 2: 4 40 90 140     5
    

    Edit 1: I still think it's just a message and not a warning. But if you want to avoid it anyway, just do:

    dt.out[, count := as.numeric(dt[, .N, by=a][, N])]
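
    Put together, the approach above can be run end to end as in the following sketch. It assumes the data.table package is installed and rebuilds the table from the question; the helper name counts and the stopifnot() check are additions for illustration and are not part of the original answer.

    ```r
    library(data.table)

    # Same toy table as in the question, keyed on a
    dt <- data.table(a = c(rep(3, 5), rep(4, 5)),
                     b = 1:10, c = 11:20, d = 21:30, key = "a")

    # Per-group sums of every non-grouping column
    dt.out <- dt[, lapply(.SD, sum), by = a]

    # Per-group row counts, computed separately (hypothetical helper name)
    counts <- dt[, .N, by = a]

    # The assignment below is positional, so check that both results
    # list the groups in the same order before attaching the counts
    stopifnot(identical(dt.out$a, counts$a))

    # Add the counts by reference, converted as in Edit 1
    dt.out[, count := as.numeric(counts$N)]

    print(dt.out)
    ```

    Because the counts are attached by position, the check on the group order is a cheap safeguard if the code is later adapted to two tables that are grouped differently.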
    

    Edit 2: Very interesting. Writing the same assignment in the functional form `:=`(...), which is normally used for multiple assignments, does not produce the same message.

    dt.out[, `:=`(count = dt[, .N, by=a][, N])]
    # Detected that j uses these columns: a 
    # Finding groups (bysameorder=TRUE) ... done in 0.001secs. bysameorder=TRUE and o__ is length 0
    # Detected that j uses these columns:  
    # Optimization is on but j left unchanged as '.N'
    # Starting dogroups ... done dogroups in 0 secs
    # Detected that j uses these columns: N 
    # Assigning to all 2 rows
    # Direct plonk of unnamed RHS, no copy.
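
    The diagnostic lines above look like data.table's verbose output. As a minimal sketch, assuming that is how they were produced, the datatable.verbose option can be switched on around a single call and then restored; the exact wording of the messages varies between data.table versions, so none is reproduced here. The names old and res are illustrative only.

    ```r
    library(data.table)

    # Small keyed table, same grouping column as in the question
    dt <- data.table(a = c(rep(3, 5), rep(4, 5)), b = 1:10, key = "a")

    # Switch verbose diagnostics on and remember the previous setting
    old <- options(datatable.verbose = TRUE)

    # This call now prints diagnostics about grouping and optimization
    res <- dt[, .N, by = a]

    # Restore the previous setting so later calls stay quiet
    options(old)
    ```

    Verbose mode can also be requested for one call only by passing verbose = TRUE inside the brackets, for example dt[, .N, by = a, verbose = TRUE].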
    
