How to partition when ranking on a particular column?

问题

All:

I have a data frame like the follow.I know I can do a global rank order like this:

dt <- data.frame(
    ID = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
    Value = c(4,3,1,3,4,6,6,1,8,4)
);
> dt
   ID Value
1  A1     4
2  A2     3
3  A4     1
4  A2     3
5  A1     4
6  A4     6
7  A3     6
8  A2     1
9  A1     8
10 A3     4
dt$Order <- rank(dt$Value,ties.method= "first")
> dt
   ID Value Order
1  A1     4     5
2  A2     3     3
3  A4     1     1
4  A2     3     4
5  A1     4     6
6  A4     6     8
7  A3     6     9
8  A2     1     2
9  A1     8    10
10 A3     4     7

But how can I set a rank order for a particular ID instead of a global rank order. How can I get this done? In T-SQL, we can get this done as the following syntax:

RANK() OVER ( [ < partition_by_clause > ] < order_by_clause > )

Any idea?

回答1:

My way but there's likely better. Never used rank, din't even know about it. Thanks, may be useful.

#Your Data
dt <- data.frame(
    ID = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
    Value = c(4,3,1,3,4,6,6,1,8,4)
)
dt$Order <- rank(dt$Value,ties.method= "first")

#My approach
dt$id <- 1:nrow(dt) #needed for ordering and putting things back together
dt <- dt[order(dt$ID),]
dt$Order.by.group <- unlist(with(dt, tapply(Value, ID, function(x) rank(x, 
    ties.method = "first"))))
dt[order(dt$id), -4]

Yields:

   ID Value Order Order.by.group
1  A1     4     5              1
2  A2     3     3              2
3  A4     1     1              1
4  A2     3     4              3
5  A1     4     6              2
6  A4     6     8              2
7  A3     6     9              2
8  A2     1     2              1
9  A1     8    10              3
10 A3     4     7              1

EDIT:

If you don't care about preserving the original order of the data then this works with less code:

dt <- dt[order(dt$ID),]
dt$Order.by.group <- unlist(with(dt, tapply(Value, ID, function(x) rank(x, 
   ties.method= "first"))))

   ID Value Order.by.group
1  A1     4              1
5  A1     4              2
9  A1     8              3
2  A2     3              2
4  A2     3              3
8  A2     1              1
7  A3     6              2
10 A3     4              1
3  A4     1              1
6  A4     6              2

回答2:

Many options.

Using ddply from the plyr package:

library(plyr)
ddply(dt,.(ID),transform,Order = rank(Value,ties.method = "first"))
   ID Value Order
1  A1     4     1
2  A1     4     2
3  A1     8     3
4  A2     3     2
5  A2     3     3
6  A2     1     1
7  A3     6     2
8  A3     4     1
9  A4     1     1
10 A4     6     2

Or if performance is an issue (i.e. very large data) using the data.table package:

library(data.table)
DT <- data.table(dt,key = "ID")
DT[,transform(.SD,Order = rank(Value,ties.method = "first")),by = ID]
      ID Value Order
 [1,] A1     4     1
 [2,] A1     4     2
 [3,] A1     8     3
 [4,] A2     3     2
 [5,] A2     3     3
 [6,] A2     1     1
 [7,] A4     1     1
 [8,] A4     6     2
 [9,] A3     6     2
[10,] A3     4     1

or in all its gory detail a base R solution using split lapply do.call and rbind:

do.call(rbind,lapply(split(dt,dt$ID),transform,
              Order = rank(Value,ties.method = "first")))

回答3:

Here are a couple of approaches:

ave This takes each set of Value numbers that have the same ID and applies rank separately to each such set. No packages are used.

Rank <- function(x) rank(x, ties.method = "first")
transform(dt, rank = ave(Value, ID, FUN = Rank))

giving:

   ID Value rank
1  A1     4    1
2  A2     3    2
3  A4     1    1
4  A2     3    3
5  A1     4    2
6  A4     6    2
7  A3     6    2
8  A2     1    1
9  A1     8    3
10 A3     4    1

Note that the above solution keeps the original row order. It could be sorted afterwards if that were desired.

sqldf with RPostgreSQL

# see FAQ #12 on the sqldf github home page for info on sqldf and PostgreSQL
# https://cran.r-project.org/web/packages/sqldf/README.html

library(RPostgreSQL)
library(sqldf)

sqldf('select 
          *, 
          rank() over (partition by "ID" order by "Value") rank 
       from "dt"
')

This solution reorders the rows. It is assumed that that is ok since your example solution did that (but if not append a sequence number column to dt and add an appropriate order by clause to re-order the result back into the sequence number order).

回答4:

You can use the data.table package.

setDT(dt) dt[, Order := rank(Value, ties.method = "first"), by = "ID"] dt <- as.data.frame(dt)

giving the desired output:

   ID Value Order
1  A1     4     1
2  A2     3     2
3  A4     1     1
4  A2     3     3
5  A1     4     2
6  A4     6     2
7  A3     6     2
8  A2     1     1
9  A1     8     3
10 A3     4     1

来源：https://stackoverflow.com/questions/9961700/how-to-partition-when-ranking-on-a-particular-column

标签

dataframe

rank

database-partitioning