How to create a list in R from two vectors (one for the keys, the other for the values)?

离开以前 2020-12-29 23:03

I have two vectors and I want to create a list in R where one vector supplies the keys and the other the values. I thought I would easily find the answer in my books

5 answers
  • 2020-12-29 23:26

    It can be done in one statement using setNames:

    xx <- 1:3
    yy <- letters[1:3]
    

    To create a named list:

    as.list(setNames(xx, yy))
    # $a
    # [1] 1
    # 
    # $b
    # [1] 2
    # 
    # $c
    # [1] 3
    

    Or a named vector:

    setNames(xx, yy)
    # a b c 
    # 1 2 3
    

    In the case of the list, this is programmatically equivalent to your "named vector" approach but maybe a little more elegant.
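
    Once the list is named, the names behave like keys. A minimal sketch of typical lookups, reusing xx and yy from above (the object name kv is only illustrative and not part of the original answer):

    ```r
    xx <- 1:3
    yy <- letters[1:3]
    kv <- as.list(setNames(xx, yy))

    kv[["b"]]            # value stored under key "b"
    kv$c                 # same idea with $ syntax
    "a" %in% names(kv)   # test whether a key exists
    kv[["d"]] <- 4L      # add a new key-value pair
    names(kv)            # all keys, in insertion order
    ```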


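    Here are some benchmarks showing that naming the vector first (f1) and using setNames (f3) are equally fast. The order of operations matters, though, for avoiding an unnecessary and costly copy of the data: f4, which calls setNames on the already-converted list, takes roughly twice as long in the timings below: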

    f1 <- function(xx, yy) {
      names(xx) <- yy
      as.list(xx)
    }
    
    f2 <- function(xx, yy) {
      out <- as.list(xx)
      names(out) <- yy
      out
    }
    
    f3 <- function(xx, yy) as.list(setNames(xx, yy))
    f4 <- function(xx, yy) setNames(as.list(xx), yy)
    
    library(microbenchmark)
    microbenchmark(
      f1(xx, yy),
      f2(xx, yy),
      f3(xx, yy),
      f4(xx, yy)
    )
    # Unit: microseconds
    #        expr    min      lq  median      uq     max neval
    #  f1(xx, yy) 41.207 42.6390 43.2885 45.7340 114.853   100
    #  f2(xx, yy) 39.187 40.3525 41.5330 43.7435 107.130   100
    #  f3(xx, yy) 39.280 41.2900 42.1450 43.8085 109.017   100
    #  f4(xx, yy) 76.278 78.1340 79.1450 80.7525 180.825   100
    
  • 2020-12-29 23:27

    Do you mean to do this?...

    xx <- 1:3
    yy <- letters[1:3]
    zz <- list( xx , yy )
    names(zz) <- c("keys" , "values")
    zz
    #$keys
    #[1] 1 2 3
    
    #$values
    #[1] "a" "b" "c"
    

    AFAIK this is the canonical way of making a list of vectors. I am happy to be corrected. If you are new to R, I'd advise against using a for loop here: most tasks have vectorised methods that are more efficient and faster.
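
    To illustrate the point about loops, here is a small sketch that builds a key-value list element by element and then builds the same list in a vectorised way. The names loop_list and vec_list are illustrative only.

    ```r
    xx <- 1:3
    yy <- letters[1:3]

    # Loop version: grows the list one element at a time
    loop_list <- list()
    for (i in seq_along(xx)) {
      loop_list[[yy[i]]] <- xx[i]
    }

    # Vectorised version: convert once, name once
    vec_list <- as.list(xx)
    names(vec_list) <- yy

    # Check that both routes give the same result
    identical(loop_list, vec_list)
    ```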

  • 2020-12-29 23:31

    If your values are all scalars, then there's nothing wrong with having a "key-value store" that's just a vector.

    vals <- 1:1000000
    keys <- paste0("key", 1:1000000)
    names(vals) <- keys
    

    You can then retrieve the value corresponding to a given key with

    vals["key42"]
    # [1] 42
    

    IIRC R uses hashing for character-based indexing, so lookups should be fast regardless of the size of your vector.
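
    The answer does not verify whether name lookup on a plain vector is hashed, and the benchmark further down this page suggests it may not be cheap for large vectors. If hashed lookup matters, one option is an environment, which base R can back with a hash table. A minimal sketch, assuming a small set of keys and an illustrative object name store:

    ```r
    keys <- paste0("key", 1:1000)
    vals <- 1:1000

    # Build an environment whose variable names are the keys;
    # hash = TRUE asks for a hash table behind the environment
    store <- list2env(setNames(as.list(vals), keys),
                      envir = new.env(hash = TRUE))

    store[["key42"]]                                   # look up one key
    exists("key42", envir = store, inherits = FALSE)   # test for a key
    assign("key1001", 1001L, envir = store)            # add a pair
    length(ls(store))                                  # number of keys stored
    ```

    Note that environments have reference semantics, so modifying store inside a function changes it for every caller.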

    If your values can be arbitrary objects, then you do need a list.

    vals <- list(1:100, lm(speed ~ dist, data=cars), function(x) x^2)
    names(vals) <- c("numbers", "model", "function")
    
    sq <- vals[["function"]]
    sq(5)
    # [1] 25
    

    If your question is about constructing the list, I wouldn't be too worried. R internally is copy-on-write (objects are only copied if their contents are modified), so doing something like

    vals <- list(1:1000000, 1:1000000, <other big objects>)
    

    will not actually make extra copies of everything.

    Edit: I just checked, and R will copy everything if you do lst <- list(....). Go figure. So if you're already close to the memory limit on your machine, this won't work. On the other hand, if you do names(lst) <- ...., it won't make another copy of lst. Go figure again.
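
    The copying behaviour reported in the edit may differ between R versions, so it is worth checking on your own setup. A minimal sketch using base R's tracemem(), which prints a message whenever the traced object is duplicated (this assumes an R build with memory profiling enabled, hence the capabilities() guard):

    ```r
    big <- runif(1e6)

    # Start tracing duplications of 'big' if this R build supports it
    if (capabilities("profmem")) tracemem(big)

    lst <- list(a = big, b = big)   # any copy of 'big' made here is reported
    names(lst) <- c("x", "y")       # renaming the list afterwards

    # Stop tracing
    if (capabilities("profmem")) untracemem(big)
    ```

    If no tracemem message appears between the two calls, no copy of big was made on that R build.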

  • 2020-12-29 23:43

    Another serious option here is data.table. It uses the key to sort the table, which makes element access very fast, especially when you have a large number of rows. Here is an example:

    library(data.table)   
    DT <- data.table(xx = 1:1e6, 
                 k = paste0("key", 1:1e6),key="k")
    

    DT is a data.table with 2 columns, where I set the column k as the key.

    DT
    #              xx         k
    #       1:      1      key1
    #       2:     10     key10
    #       3:    100    key100
    #       4:   1000   key1000
    #       5:  10000  key10000
    #      ---
    #  999996: 999995 key999995
    #  999997: 999996 key999996
    #  999998: 999997 key999997
    #  999999: 999998 key999998
    # 1000000: 999999 key999999

    Now I can access my data.table using the key like this:

    DT['key1000']
    #          k   xx
    # 1: key1000 1000
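
    If you only want the value rather than the whole row, you can join on the key and select the value column in j. A minimal sketch, assuming the data.table package is installed and using a smaller table than above for illustration:

    ```r
    library(data.table)

    DT <- data.table(xx = 1:1e4, k = paste0("key", 1:1e4), key = "k")

    DT[.("key1000"), xx]             # join on the key, return only the value
    DT[.(c("key1", "key42")), xx]    # several keys at once, in the order asked
    DT[.("nokey"), xx]               # a missing key gives NA by default
    ```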
    

    Here is a benchmark comparing the data.table solution to a named vector:

    vals <- 1:1000000
    DT <- data.table(xx = vals ,
                     k = paste0("key", vals),key="k")
    keys <- paste0("key", vals)
    names(vals) <- keys
    library(microbenchmark)
    microbenchmark( vals["key42"],DT["key42"],times=100)
    
    # Unit: microseconds
    #           expr        min          lq     median         uq        max neval
    #  vals["key42"] 111938.692 113207.4945 114924.010 130010.832 361077.210   100
    #    DT["key42"]    768.753    797.0085   1055.661   1067.987   2058.985   100
    
  • 2020-12-29 23:44

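    Hong's output is wrong. Single brackets keep the name, so vals["key42"] prints the key along with the value:

    vals <- 1:1000000
    keys <- paste0("key", 1:1000000)
    names(vals) <- keys
    
    vals["key42"]
    # key42
    #    42

    To get the bare value, you should use vals[["key42"]]:

    vals[["key42"]]
    # [1] 42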
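
    The difference between the two subsetting operators can be sketched on a small named vector (the three keys here are made up for illustration):

    ```r
    vals <- setNames(1:3, c("key1", "key2", "key3"))

    vals["key2"]              # length-1 vector that still carries its name
    vals[["key2"]]            # bare value, name dropped
    unname(vals["key2"])      # same bare value, obtained via unname()
    vals[c("key1", "key3")]   # [ can fetch several keys; [[ takes exactly one
    ```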
    