R data.table weird value/reference semantics

野性不改 2021-02-19 04:35

(This is a follow-up to an earlier question.)

Check this toy code:

> x <- data.frame(a = 1:2)
> foo <- function(z) { setDT(z) ; z[, b:=3:4] ; z } 
        
3 Answers
  •  刺人心 (original poster)
    2021-02-19 05:30

    Inside your function, z refers to the same object as x until setDT is called.

    library(data.table)
    foo <- function(z) {print(address(z)); setDT(z); print(address(z))} 
    x <- data.frame(a = 1:2)
    address(x)
    #[1] "0x555ec9a471e8"
    foo(x)
    #[1] "0x555ec9a471e8"
    #[1] "0x555ec9ede300"
    

    Inside setDT, execution reaches the following line while z still points to the same address as x:

    setattr(z, "class", data.table:::.resetclass(z, "data.frame"))
    

    setattr does not make a copy, so x and z still point to the same address and both now have class c("data.table", "data.frame"):

    x <- data.frame(a = 1:2)
    z <- x
    class(x)
    #[1] "data.frame"
    address(x)
    #[1] "0x555ec95de600"
    address(z)
    #[1] "0x555ec95de600"
    
    setattr(z, "class", data.table:::.resetclass(z, "data.frame"))
    
    class(x)
    #[1] "data.table" "data.frame"
    address(x)
    #[1] "0x555ec95de600"
    address(z)
    #[1] "0x555ec95de600"
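
    The by-reference behaviour of setattr can be seen in isolation with a throwaway attribute. This is only a minimal sketch: the attribute name "note" is made up for illustration and is not something setDT itself sets. It contrasts setattr with base R's attr<-, which copies the object when it is bound to more than one name.

    ```r
    library(data.table)

    x <- data.frame(a = 1:2)
    z <- x                                   # no copy: both names bind the same object
    setattr(z, "note", "set by reference")   # modifies the shared object in place
    attr(x, "note")                          # the attribute is visible through x too

    x2 <- data.frame(a = 1:2)
    z2 <- x2
    attr(z2, "note") <- "set by value"       # base R copies z2 before modifying it
    is.null(attr(x2, "note"))                # x2 is left untouched
    ```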
    

    Then setalloccol is called, which in this case runs:

    assign("z", .Call(data.table:::Calloccolwrapper, z, 1024, FALSE))
    

    which makes x and z point to different addresses:

    address(x)
    #[1] "0x555ecaa09c00"
    address(z)
    #[1] "0x555ec95de600"
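
    One way to see what this step changes is to compare length and truelength after setDT. This is a rough sketch, assuming the default datatable.alloccol option: the over-allocated column list belongs to z only, which would explain why a column added with := shows up in z but not in x.

    ```r
    library(data.table)

    x <- data.frame(a = 1:2)
    z <- x
    setDT(z)

    length(z)       # number of columns currently in use
    truelength(z)   # allocated column slots, larger than length(z)

    z[, b := 3:4]   # fills a spare slot in z's own column list
    names(z)        # "a" "b"
    names(x)        # "a"
    ```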
    

    And both have class c("data.table", "data.frame"):

    class(x)
    #[1] "data.table" "data.frame"
    class(z)
    #[1] "data.table" "data.frame"
    

    I think that if they had used

    class(z) <- data.table:::.resetclass(z, "data.frame")
    

    instead of

    setattr(z, "class", data.table:::.resetclass(z, "data.frame"))
    

    the problem would not occur.

    x <- data.frame(a = 1:2)
    z <- x
    address(x)
    #[1] "0x555ec9cd2228"
    class(z) <- data.table:::.resetclass(z, "data.frame")
    class(x)
    #[1] "data.frame"
    class(z)
    #[1] "data.table" "data.frame"
    address(x)
    #[1] "0x555ec9cd2228"
    address(z)
    #[1] "0x555ec9cd65a8"
    

    However, after class(z) <- value, z no longer points to the address it pointed to before (the column vectors themselves are not copied):

    z <- data.frame(a = 1:2)
    address(z)
    #[1] "0x5653dbe72b68"
    address(z$a)
    #[1] "0x5653db82e140"
    class(z) <- c("data.table", "data.frame")
    address(z)
    #[1] "0x5653dbe82d98"
    address(z$a)
    #[1] "0x5653db82e140"
    

    But after setDT, z likewise no longer points to the address it pointed to before:

    z <- data.frame(a = 1:2)
    address(z)
    #[1] "0x55b6f04d0db8"
    setDT(z)
    address(z)
    #[1] "0x55b6efe1e0e0"
    

    As @Matt-dowle pointed out, it is also possible to change the data in x through z, since the two still share their column vectors:

    x <- data.frame(a = c(1,3))
    z <- x
    setDT(z)
    z[, b:=3:4]
    z[2, a:=7]
    z
    #   a b
    #1: 1 3
    #2: 7 4
    x
    #   a
    #1: 1
    #2: 7
    
    R.version.string
    #[1] "R version 4.0.2 (2020-06-22)"
    packageVersion("data.table")
    #[1] ‘1.12.8’
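
    If the goal is to keep the caller's data.frame untouched, one option is to take a deep copy before converting. The following is only a sketch of that idea, not something the package authors prescribe; foo_safe is a hypothetical name for a variant of the function from the question.

    ```r
    library(data.table)

    # Hypothetical variant of foo() that works on a deep copy of its argument
    foo_safe <- function(z) {
      z <- setDT(copy(z))   # copy() duplicates the columns, setDT() converts the copy
      z[, b := 3:4]         # adds a column to the copy only
      z[2, a := 7]          # sub-assignment also stays within the copy
      z[]
    }

    x <- data.frame(a = c(1, 3))
    res <- foo_safe(x)

    class(x)   # still "data.frame"
    x$a        # still 1 3
    res
    ```

    The price is an extra copy of the data, which may matter for large tables.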
    
