Explain R tapply description

借酒劲吻你 2021-01-31 05:04

I understand what tapply() does in R. However, I cannot parse this description of it from the documentation:


Apply a Function Over a "Ragged" Array

Description:

         


        
2 answers
  •  忘了有多久
    2021-01-31 05:29

    Let's see what the R documentation says on the subject:

    The combination of a vector and a labelling factor is an example of what is sometimes called a ragged array, since the subclass sizes are possibly irregular. When the subclass sizes are all the same the indexing may be done implicitly and much more efficiently, as we see in the next section.

    The list of factors you supply via INDEX together specify a collection of subsets of X, of possibly different lengths (hence, the 'ragged' descriptor). And then FUN is applied to each subset.
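
    As a small sketch of that idea (the vectors scores and group are made up for illustration and are not from the documentation), a single factor is enough to produce subsets of unequal length:

    ```r
    # Hypothetical data: five scores, each labelled by a grouping factor
    scores <- c(10, 20, 30, 40, 50)
    group  <- factor(c("a", "b", "a", "a", "b"))

    # The group sizes differ (3 vs 2), which is what makes the array "ragged"
    sizes <- table(group)

    # FUN (here mean) is applied to each subset of scores defined by group
    means <- tapply(scores, group, mean)
    print(sizes)
    print(means)
    ```

    Here X is scores, INDEX is group, and FUN is mean; group "a" contributes three values and group "b" only two.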

    EDIT: @Joris makes an excellent point in the comments. It may be helpful to think of tapply(X,Y,...) as a wrapper for sapply(split(X,Y),...) in that if Y is a list of grouping factors, it builds a new, single grouping factor based on their unique levels, splits X accordingly and applies FUN to each piece.
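
    A rough way to check that mental model, using made-up vectors x, g1 and g2 (assumptions for illustration only). This is a sketch of the analogy, not a description of how tapply is implemented internally:

    ```r
    # Hypothetical data with two grouping factors
    x  <- c(1, 2, 3, 4, 5, 6)
    g1 <- factor(c("a", "a", "b", "b", "b", "a"))
    g2 <- factor(c("u", "v", "u", "u", "v", "v"))

    # Single grouping factor: both forms give one mean per level
    by_tapply <- tapply(x, g1, mean)
    by_split  <- sapply(split(x, g1), mean)

    # Two grouping factors: tapply returns a matrix (g1 levels by g2 levels),
    # while split() on a list of factors groups by their interaction
    cross_tapply <- tapply(x, list(g1, g2), mean)
    cross_split  <- sapply(split(x, list(g1, g2)), mean)

    # The values agree; only the shape and the names differ
    print(all.equal(as.vector(by_tapply), unname(by_split)))
    print(all.equal(as.vector(cross_tapply), unname(cross_split)))
    ```

    The main visible difference is the shape of the result: tapply keeps one dimension per grouping factor, whereas the sapply(split(...)) form returns a flat named vector.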

    EDIT: Here's an illustrative example:

    library(lattice)   # provides the barley data set
    library(plyr)      # provides ddply()
    library(reshape2)  # assumed source of melt(), which plyr does not export
    set.seed(123)
    
    # Make this example unbalanced
    dat <- barley[sample(1:120, 50), ]
    
    # Suppose we want the average yield by year/site:
    table(dat$year, dat$site)
    
    # That's what they mean by a 'ragged' array: there are different
    # numbers of observations at each combination of levels
    
    # In plyr we could use ddply:
    ddply(dat, .(year, site), .fun = function(x) mean(x$yield))
    
    # Which gives the same result (listed in a different order) as the line
    # below; tapply returns NA for any year/site combination with no rows
    melt(tapply(dat$yield, list(dat$year, dat$site), mean))
    
