Question
I have the following data.frame:
df <- data.frame(date = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
id = c(4, 4, 2, 4, 1, 2, 3, 1, 2, 2, 1, 1))
And I want to add a new column grp which, for each date, ranks the IDs. Ties should have the same value, but there should be no omitted values. That is, if two values are tied for the minimum, they should both get rank 1, and the next lowest value should get rank 2.
The expected result would therefore look like this. Note that, as mentioned, the groups are for each date, so the operation must be grouped by date.
data.frame(date = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
id = c(4, 4, 2, 4, 1, 2, 3, 1, 2, 2, 1, 1),
grp = c(2, 2, 1, 2, 1, 2, 3, 1, 2, 2, 1, 1))
I'm sure there's a trivial way to do this but I haven't found it: none of the options for ties.method behave in this way (data.table::frank also doesn't help, since it only adds a dense rank).
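To make the problem with base rank concrete, here is a small sketch using the ids of date 3 from the example data. The variable names are illustrative only; the point is that each ties.method shown either skips a rank value or breaks the tie.

```r
# ids for date 3 in the example data: two tied pairs
ids <- c(2, 2, 1, 1)

# "min" and "max" keep ties together but leave gaps in the rank values
r_min <- rank(ids, ties.method = "min")      # 3 3 1 1 (rank 2 is skipped)
r_max <- rank(ids, ties.method = "max")      # 4 4 2 2

# "average" gives fractional ranks, "first" breaks the ties by position
r_avg   <- rank(ids, ties.method = "average") # 3.5 3.5 1.5 1.5
r_first <- rank(ids, ties.method = "first")   # 3 4 1 2

print(r_min)
```

The desired result for these ids is 2 2 1 1, which none of the methods above produces.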
I thought of doing a normal rank and then using data.table::rleid, but that doesn't work if there are duplicate values separated by other values within the same day.
I also thought of grouping by date and id and then using a group-ID, but the lowest values each day must start at rank 1, so that won't work either.
The only functional solution I've found is to create another table with the unique ids per day and then join that table to this one:
suppressPackageStartupMessages(library(dplyr))
df <- data.frame(date = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
id = c(4, 4, 2, 4, 1, 2, 3, 1, 2, 2, 1, 1))
# Rank the distinct ids within each date
uniques <- df %>%
  group_by(date) %>%
  distinct(id) %>%
  mutate(grp = rank(id))
# Join the per-date ranks back onto the original rows
df <- df %>%
  left_join(uniques) %>%
  print()
#> Joining, by = c("date", "id")
#> date id grp
#> 1 1 4 2
#> 2 1 4 2
#> 3 1 2 1
#> 4 1 4 2
#> 5 2 1 1
#> 6 2 2 2
#> 7 2 3 3
#> 8 2 1 1
#> 9 3 2 2
#> 10 3 2 2
#> 11 3 1 1
#> 12 3 1 1
Created on 2020-05-08 by the reprex package (v0.3.0)
However, this seems quite inelegant and convoluted for what seems like a simple operation, so I'd rather see if other solutions are available.
Curious to see data.table solutions if available, but unfortunately the solution must be in dplyr.
Answer 1:
We can use dense_rank after grouping by date:
library(dplyr)
df %>%
group_by(date) %>%
mutate(grp = dense_rank(id))
# A tibble: 12 x 3
# Groups: date [3]
# date id grp
# <dbl> <dbl> <int>
# 1 1 4 2
# 2 1 4 2
# 3 1 2 1
# 4 1 4 2
# 5 2 1 1
# 6 2 2 2
# 7 2 3 3
# 8 2 1 1
# 9 3 2 2
#10 3 2 2
#11 3 1 1
#12 3 1 1
Or with frank from data.table, using ties.method = 'dense':
library(data.table)
setDT(df)[, grp := frank(id, ties.method = 'dense'), date]
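For intuition about what a dense rank computes, here is a minimal base R sketch. It is not how dense_rank or frank are implemented; it only assumes that a dense rank equals the position of each value among the sorted unique values of its group. The column name grp_base is hypothetical, chosen to avoid clashing with grp.

```r
df <- data.frame(date = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
                 id = c(4, 4, 2, 4, 1, 2, 3, 1, 2, 2, 1, 1))

# Within each date, look up each id's position among the sorted unique ids
df$grp_base <- ave(df$id, df$date,
                   FUN = function(x) match(x, sort(unique(x))))

print(df)
```

On the example data this gives the same values as the expected grp column in the question.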
Source: https://stackoverflow.com/questions/61690226/grouped-non-dense-rank-without-omitted-values