Behavior of <- NULL on lists versus data.frames for removing data

长情又很酷 2020-12-05 19:52

Many R users eventually figure out lots of ways to remove elements from their data. One way is to use NULL, particularly when you want to do something like drop

1 Answer
  • 2020-12-05 20:17

    DISCLAIMER: This is a relatively long answer, not very clear, and not very interesting, so feel free to skip it or read only the (sort of) conclusion.

    I did some tracing on [<-.data.frame, as suggested by Ari B. Friedman. The interesting part starts on line 162 of the function, where a test checks whether value (the replacement value argument) is not a list.

    Case 1: value is not a list

    In that case value is treated as a vector. Matrices and arrays are treated as a single vector, as the help page says:

    Note that when the replacement value is an array (including a matrix) it is not treated as a series of columns (as ‘data.frame’ and ‘as.data.frame’ do) but inserted as a single column.
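    A small illustration of that help-page note, using a hypothetical toy data frame df (not from the question): a matrix assigned to one column is kept as a single matrix-valued column instead of being spread over several columns.

```r
# Toy data frame (hypothetical, for illustration only)
df <- data.frame(a = 1:3, b = 4:6)

# Assign a 3 x 2 matrix to ONE new column: it is stored as a single
# matrix-valued column, not split into two ordinary columns
df[, "m"] <- matrix(1:6, nrow = 3)

ncol(df)   # a, b and the matrix column m
dim(df$m)  # rows and columns of the stored matrix
```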

    If only one column of the data frame is selected in the LHS, then the only constraint is that the number of rows to be replaced must be equal to or a multiple of length(value). If this is the case, value is recycled with rep if necessary and converted to a list. If length(value)==0, there is no recycling (as it is impossible), and value is just converted to a list.
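    A sketch of the one-column rules on a hypothetical 4-row data frame: a value whose length divides the number of rows is recycled, one whose length does not is rejected, and a length-0 value (NULL) is not recycled at all, so the column is simply dropped.

```r
df <- data.frame(a = 1:4, b = 5:8)

# One column selected: a length-2 value is recycled because 4 %% 2 == 0
df[, "a"] <- c(10, 20)
df$a

# A length-3 value cannot be recycled over 4 rows, so this stops with an
# error (e.g. "replacement has 3 rows, data has 4")
res <- tryCatch({
  df[, "b"] <- c(1, 2, 3)
  "no error"
}, error = function(e) conditionMessage(e))
res

# NULL has length 0: no recycling, it becomes list(NULL) and the column is removed
df[, "b"] <- NULL
names(df)
```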

    If several columns of the data frame are selected in the LHS, the constraint is a bit more complex: the total number of elements to be replaced, ie the number of rows * the number of columns, must be equal to or a multiple of length(value).

    The exact test is the following:

    (m < n * p && (m == 0L || (n * p)%%m))
    

    where n is the number of rows, p the number of columns, and m the length of value. If the condition is TRUE, the function stops with an error. If it is FALSE, value is converted into an n x p matrix (thus recycled if necessary) and the matrix is split by columns into a list.
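    The test and the reshaping step can be replayed outside the function. fails_length_check below is a hypothetical helper written for illustration; it mirrors the quoted condition (with the %% result compared to 0 explicitly) and is not part of base R.

```r
# Hypothetical helper mirroring the multi-column length check:
# m = length(value), n = rows to replace, p = columns to replace
fails_length_check <- function(m, n, p) {
  m < n * p && (m == 0L || (n * p) %% m != 0L)
}

fails_length_check(0L, 3L, 2L)  # length 0 never passes
fails_length_check(4L, 3L, 2L)  # 6 is not a multiple of 4
fails_length_check(3L, 3L, 2L)  # 3 recycles evenly into 6 cells

# When the check passes, value is reshaped to n x p and split by column
value <- 1:3
mat <- matrix(value, 3L, 2L)
cols <- split(c(mat), col(mat))
str(cols)
```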

    If value is NULL, the condition is TRUE because m == 0, and the function stops with an error. Note that the problem occurs for every value of length 0, not only NULL. For example,

    cars1[,c("mpg")] <- numeric(0)
    

    works, whereas

    cars1[,c("mpg","disp")] <- numeric(0)
    

    fails in the same way as cars1[,c("mpg","disp")] <- NULL.
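    A sketch of the failing multi-column case, assuming cars1 is a copy of the built-in mtcars data set (the question defining it is truncated here). Both NULL and numeric(0) have length 0, so both hit the same length check and produce the same error.

```r
# Assumption: cars1 is a copy of mtcars (32 rows, 11 columns)
cars1 <- mtcars

# Zero-length value on two columns: m = 0, so the length check stops
err_null <- tryCatch({
  cars1[, c("mpg", "disp")] <- NULL
  "no error"
}, error = function(e) conditionMessage(e))

err_num0 <- tryCatch({
  cars1[, c("mpg", "disp")] <- numeric(0)
  "no error"
}, error = function(e) conditionMessage(e))

err_null  # e.g. "replacement has 0 items, need 64"
err_num0  # same message: the length is the problem, not NULL itself
```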

    Case 2: value is a list

    If value is a list, it is used to replace several columns at the same time. For example:

    cars1[,c("mpg","disp")] <- list(1,2)
    

    will replace cars1$mpg with a vector of 1s, and cars1$disp with a vector of 2s.
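    The same example as a runnable sketch, again assuming cars1 is a copy of mtcars: list element k goes to the k-th selected column and is recycled over all 32 rows.

```r
# Assumption: cars1 is a copy of mtcars
cars1 <- mtcars

# Element 1 replaces mpg, element 2 replaces disp, each recycled to 32 rows
cars1[, c("mpg", "disp")] <- list(1, 2)
unique(cars1$mpg)
unique(cars1$disp)
```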

    A sort of "double recycling" happens here:

    • first, the length of the value list must be less than or equal to the number of columns to be replaced. If it is less, the list is recycled in the usual way.
    • second, each element of the value list must have a length equal to the number of rows to be replaced, greater than it, or such that the number of rows is a multiple of it. If it is shorter, the element is recycled to match the number of rows. If it is longer, a warning is displayed.
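    Both recycling steps on a hypothetical 4-row data frame: first the list is recycled over the selected columns, then each element is recycled over the rows.

```r
df <- data.frame(a = 1:4, b = 5:8, c = 9:12)

# One-element list for three columns: the list is recycled over the
# columns, then its single element is recycled over the 4 rows
df[, c("a", "b", "c")] <- list(0)

# Two-element list for two columns: element 1 (length 2) is recycled
# because 4 %% 2 == 0, element 2 (length 1) is recycled to 4 rows
df[, c("a", "b")] <- list(c(1, 2), 9)
df
```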

    When the RHS value is list(NULL), the recycling steps do nothing, since a NULL cannot be recycled (rep(NULL, 10) is still NULL). The code nevertheless continues, and in the end each column to be replaced is assigned NULL, ie is removed.
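    A sketch of the list(NULL) route, assuming cars1 is a copy of mtcars: the single NULL element is reused for both selected columns, and assigning NULL to a column removes it.

```r
# Assumption: cars1 is a copy of mtcars (11 columns)
cars1 <- mtcars

# list(NULL): the NULL element is reused for both columns, which are removed
cars1[, c("mpg", "disp")] <- list(NULL)
ncol(cars1)
c("mpg", "disp") %in% names(cars1)
```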

    Summary and (sort of) conclusion

    data.frame and list behave differently because of a constraint specific to data frames: all columns must have the same length. Removing several columns by assigning NULL fails not because of the NULL value itself, but because NULL has length 0. The error comes from a test that checks whether the number of elements to be replaced (number of rows * number of columns) is a multiple of the length of the assigned value, and a length of 0 can never pass it.
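    To make the contrast concrete, a short sketch with hypothetical toy objects: a plain list has no equal-length rule, so NULL removes several elements at once, while a data frame refuses columns of incompatible lengths.

```r
# A plain list has no equal-length constraint, so NULL removes any
# number of elements in one assignment
l <- list(a = 1:3, b = 4:6, c = 7:9)
l[c("a", "b")] <- NULL
names(l)

# A data frame, in contrast, requires all columns to have compatible lengths
err <- tryCatch(data.frame(a = 1:3, b = 1:2), error = function(e) e)
inherits(err, "error")
```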

    Handling value = NULL for multiple columns doesn't seem difficult (about four lines of simple code), but it requires treating NULL as a special case. I can't tell whether it is left unhandled because it would break the logic of the function's implementation, or because it would have side effects I'm not aware of.
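    The special case can at least be sketched at user level. set_cols below is a hypothetical wrapper, not a patch to [<-.data.frame and not part of base R: it rewraps a NULL value as list(NULL), which the list branch already handles by removing the columns.

```r
# Hypothetical wrapper, NOT part of base R: treat NULL on any number of
# columns as "remove them" by rewrapping it as list(NULL)
set_cols <- function(x, j, value) {
  if (is.null(value)) value <- list(NULL)  # the NULL special case
  x[, j] <- value
  x
}

df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
df <- set_cols(df, c("a", "b"), NULL)
names(df)
```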
