Percentile for Each Observation w/r/t Grouping Variable

广开言路 2021-02-06 10:39

I have some data that looks like the following. It is grouped by variable "Year" and I want to extract the percentiles of each observation of Score, with respect to t

7 answers
  •  Happy的楠姐
    2021-02-06 11:16

    I may be misunderstanding, but I think it can be done this way:

    > years = c(2006, 2006, 2006, 2006, 2001, 2001, 2001, 2001, 2001)
    > scores = c(13, 65, 23, 34, 78, 56, 89, 98, 100)
    > tapply(scores, years, quantile)
    $`2001`
      0%  25%  50%  75% 100% 
      56   78   89   98  100 
    
    $`2006`
       0%   25%   50%   75%  100% 
    13.00 20.50 28.50 41.75 65.00 
    

    Is this right?

    I mean the actual percentile of each observation. – Ryan Rosario

    Edit:

    I think this may do it then:

    > tapply(scores, years, function(x) { f = ecdf(x); sapply(x, f) })
    $`2001`
    [1] 0.4 0.2 0.6 0.8 1.0
    
    $`2006`
    [1] 0.25 1.00 0.50 0.75
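
Why this gives the percentile of each observation: ecdf(x) returns a step function whose value at a point is the fraction of the sample that is less than or equal to that point. Below is a minimal sketch of the same computation done by hand, reusing the made-up years and scores vectors from above. The helper name pct_by_hand is hypothetical, introduced only for illustration.

```r
years  <- c(2006, 2006, 2006, 2006, 2001, 2001, 2001, 2001, 2001)
scores <- c(13, 65, 23, 34, 78, 56, 89, 98, 100)

# Hypothetical helper: fraction of values in x that are <= each element of x.
pct_by_hand <- function(x) sapply(x, function(v) mean(x <= v))

# Same grouping as above, computed three ways.
by_hand <- tapply(scores, years, pct_by_hand)
by_ecdf <- tapply(scores, years, function(x) ecdf(x)(x))

# rank() with ties.method = "max" counts the values <= each element,
# so dividing by the group size should give the same numbers.
by_rank <- tapply(scores, years,
                  function(x) rank(x, ties.method = "max") / length(x))
```

All three should agree group by group. With tied scores, each tied observation gets the same value, because the comparison is "less than or equal to".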
    

    With your data:

    > tapply(scores, years, function(x) { f = ecdf(x); sapply(x, f) })
    $`2000`
    [1] 0.3333333 0.6666667 1.0000000
    
    $`2008`
    [1] 0.5 1.0
    

    Edit 2:

    This is probably faster:

    tapply(scores, years, function(x) { f = ecdf(x); f(x) })
    

    f() is vectorized :-)

    Last modification, I promise :-). If you want names:

    > tapply(scores, years, function(x) { f = ecdf(x); r = f(x); names(r) <- x; r })
    $`2000`
         1000      1700      2000 
    0.3333333 0.6666667 1.0000000 
    
    $`2008`
    1500 2000 
     0.5  1.0 
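
If the percentiles need to line up with the original rows instead of coming back as one list element per year, base R's ave() may be more convenient, since it returns a vector in the input order. This is a sketch only, assuming the data sit in a data frame with columns named Year and Score. The column name Pct and the sample values are made up for illustration.

```r
# Made-up sample data; the real data frame is assumed to have the same columns.
df <- data.frame(
  Year  = c(2006, 2006, 2006, 2006, 2001, 2001, 2001, 2001, 2001),
  Score = c(13, 65, 23, 34, 78, 56, 89, 98, 100)
)

# ave() applies FUN within each Year group and keeps the original row order.
df$Pct <- ave(df$Score, df$Year, FUN = function(x) ecdf(x)(x))
df
```

The new column can then be used directly alongside Year and Score, without matching list elements back to rows.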
    
