Grouped non-dense rank without omitted values

Posted by ♀尐吖头ヾ on 2021-01-28 19:08:23

Question


I have the following data.frame:

df <- data.frame(date = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
                 id   = c(4, 4, 2, 4, 1, 2, 3, 1, 2, 2, 1, 1))

And I want to add a new column grp which, for each date, ranks the IDs. Ties should have the same value, but there should be no omitted values. That is, if there are two values which are equally minimum, they should both get rank 1, and the next lowest values should get rank 2.
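To make the requirement concrete, here is a minimal base R sketch using the ids from date 3 of the example. The variable names are illustrative, and the match/sort/unique idiom is only one assumed way to get a gap-free rank; it is not taken from the original post.

```r
# ids for a single date (date 3 in the example data)
id <- c(2, 2, 1, 1)

# Base rank() with ties.method = "min" leaves a gap after the tie
min_rank_result <- rank(id, ties.method = "min")
print(min_rank_result)
# 3 3 1 1

# Gap-free ranking: position of each id among the sorted unique ids
dense_result <- match(id, sort(unique(id)))
print(dense_result)
# 2 2 1 1
```

The second result is the behaviour asked for: tied values share a rank and the next distinct value gets the next integer.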

The expected result would therefore look like this. Note that, as mentioned, the groups are for each date, so the operation must be grouped by date.

data.frame(date = c(1, 1, 1, 1,     2, 2, 2, 2,     3, 3, 3, 3),
           id   = c(4, 4, 2, 4,     1, 2, 3, 1,     2, 2, 1, 1),
           grp  = c(2, 2, 1, 2,     1, 2, 3, 1,     2, 2, 1, 1))

I'm sure there's a trivial way to do this but I haven't found it: none of the options for ties.method in rank behave in this way (data.table::frank also doesn't seem to help, since it only adds a dense rank).

I thought of doing a normal rank and then using data.table::rleid, but that doesn't work if there are duplicate values separated by other values during the same day.
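A small sketch of why the rank-then-rleid idea breaks down, using the ids from date 1. It assumes the data.table package is installed; the variable names are illustrative only.

```r
library(data.table)  # only needed for rleid()

# ids for a single date (date 1 in the example data)
id <- c(4, 4, 2, 4)

# Ordinary rank: the three 4s tie, the 2 is the minimum
r <- rank(id, ties.method = "min")
print(r)
# 2 2 1 2

# rleid() numbers consecutive runs, so the last 4 starts a new run
run_id <- rleid(r)
print(run_id)
# 1 1 2 3
```

The final 4 ends up with a different id from the first two, and the numbering follows order of appearance rather than the size of the value.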

I also thought of grouping by date and id and then using a group-ID, but the lowest values each day must start at rank 1, so that won't work either.

The only functional solution I've found is to create another table with the unique ids per day and then join that table to this one:

suppressPackageStartupMessages(library(dplyr))

df <- data.frame(date = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
                 id   = c(4, 4, 2, 4, 1, 2, 3, 1, 2, 2, 1, 1))

uniques <- df %>%
  group_by(
    date
  ) %>%
  distinct(
    id
  ) %>%
  mutate(
    grp = rank(id)
  )

df <- df %>% left_join(
  uniques
) %>% print()
#> Joining, by = c("date", "id")
#>    date id grp
#> 1     1  4   2
#> 2     1  4   2
#> 3     1  2   1
#> 4     1  4   2
#> 5     2  1   1
#> 6     2  2   2
#> 7     2  3   3
#> 8     2  1   1
#> 9     3  2   2
#> 10    3  2   2
#> 11    3  1   1
#> 12    3  1   1

Created on 2020-05-08 by the reprex package (v0.3.0)

However, this seems quite inelegant and convoluted for what seems like a simple operation, so I'd rather see if other solutions are available.

Curious to see data.table solutions if available, but unfortunately the solution must be in dplyr.


Answer 1:


We can use dense_rank from dplyr after grouping by date:

library(dplyr)
df %>%
   group_by(date) %>%
   mutate(grp = dense_rank(id))
# A tibble: 12 x 3
# Groups:   date [3]
#   date    id   grp
#   <dbl> <dbl> <int>
# 1     1     4     2
# 2     1     4     2
# 3     1     2     1
# 4     1     4     2
# 5     2     1     1
# 6     2     2     2
# 7     2     3     3
# 8     2     1     1
# 9     3     2     2
#10     3     2     2
#11     3     1     1
#12     3     1     1

Or with frank from data.table, using ties.method = 'dense':

library(data.table)
setDT(df)[, grp := frank(id, ties.method = 'dense'), date]
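As a quick sanity check, the sketch below runs both approaches on the question's data and compares them with the expected grp column. It assumes dplyr and data.table are both installed; the names expected, res_dplyr and res_dt are made up for this illustration.

```r
library(dplyr)
library(data.table)

df <- data.frame(date = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
                 id   = c(4, 4, 2, 4, 1, 2, 3, 1, 2, 2, 1, 1))

# grp column from the expected result in the question
expected <- c(2, 2, 1, 2, 1, 2, 3, 1, 2, 2, 1, 1)

# dplyr: dense_rank() within each date
res_dplyr <- df %>%
  group_by(date) %>%
  mutate(grp = dense_rank(id)) %>%
  ungroup()

# data.table: work on a copy so df itself is left unchanged
res_dt <- as.data.table(df)
res_dt[, grp := frank(id, ties.method = "dense"), by = date]

# Both approaches should reproduce the expected ranks
stopifnot(all(res_dplyr$grp == expected))
stopifnot(all(res_dt$grp == expected))
```

Using as.data.table() instead of setDT() here is a deliberate choice for the check, since setDT() converts df in place.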


Source: https://stackoverflow.com/questions/61690226/grouped-non-dense-rank-without-omitted-values
