What is the difference between a list and a pairlist in R?

后端 未结 2 968
死守一世寂寞
死守一世寂寞 2020-12-13 23:49

In reading the documentation for lists, I found references to pairlists, but it wasn\'t clear to me how they were different from lists.

2条回答
  •  青春惊慌失措
    2020-12-14 00:34

    Pairlists in day to day R

    There are two places that pair lists will show up commonly in day to day R. One is as function formals:

    str(formals(var))
    

    The other is as language objects. For example:

    quote(1 + 1)
    

    produces a pairlist of type language (LANGSXP internally). The principal reason why you would even care about being aware of this is that operations such as length() or language_object[[x]] can be slow because of how pairlist are stored internally (though long pairlist language objects are somewhat rare; note expressions are not pairlists).

    Note that empty elements are just zero length symbols, and you can actually store them in lists if you cheat a bit (though you probably shouldn't do this):

    list(x=substitute(x, alist(x=)))  # hack alert
    

    All that said, for the most part, OP is correct that you don't need to worry about pairlists too much unless you are writing C code for use in R.

    Internal differences between lists and pairlists

    Pairlists and list are different principally in their storage structure. Pairlists are stored as a chain of nodes, where each node points to the location of the next node in addition to the node's contents and the node's "name" (See CAR/CDR wiki article for generic discussion). Among other things this means you can't know how many elements there are in a pairlist unless you know what element is the first one, and you then traverse the entire list.

    Pairlists are used extensively in the R internals, and do exist in normal R use, but most of the time are disguised by the print or access methods and/or coerced to lists when accessed.

    Lists are also a list of addresses, but unlike pairlists, all the addresses are stored in one contiguous memory location and the total length is tracked. This makes it easy to access any arbitrary member of the list by location since you can just look up the address in the memory table. With a pairlist, you would have to jump from node to node until you eventually got to the desired node. Names are also stored as attributes of the list proper, instead of being attached to each node of a pairlist.

    Benefits of pairlists

    One (generally small) benefit of pairlists is that you can add to them with minimal overhead since you only need modify at most two nodes (the node ahead of the new node, and the new node itself), whereas with a list you may need to re-allocate the entire address table with an increase in size (this is typically not much of an issue since the address table is usually very small compared to the size of the data the table points to). There are also many algorithms that specialize in pairlist manipulation (e.g. sorting, indexing, etc.), but those can be ported to normal lists as well.

    Less relevant for day-to-day use since you can only do this in internals, it is very easy to modify list from a programming perspective by changing what any arbitrary element points to.

    Loosely related to the above, pairlists are likely be more efficient when you have highly nested objects. lists can easily replicate this structure, but each list and nested list will be saddled with the extra memory address table. This is likely the reason pairlists are used for language objects that very likely have a high nesting / element ratio.

    For more details see R Internals (look for LISTSXP and VECSXP, pairlists and lists respectively, in the linked location).

    edit: interestingly an experiment to compare the memory footprint of a list to a pairlist shows the pairlist to be larger, so the storage efficiency argument may be incorrect (not sure if object.size can be trusted here):

    > plist_to_list <- function(x) {
    +   if(is.call(x)) x <- as.list(x)
    +   if(length(x) > 1) for(i in 2:length(x)) x[[i]] <- Recall(x[[i]])
    +   x
    + }
    > add_quote <- function(x, y) call("+", x, y)
    > x <- Reduce(add_quote, lapply(letters, as.name))
    > object.size(x)
    7056 bytes
    > y <- plist_to_list(x)
    > object.size(y)
    4656 bytes
    

提交回复
热议问题