How to number/label data-table by group-number from group_by?

做~自己de王妃 提交于 2019-11-26 09:02:43
Randy Lai

Updated answer

get_group_number = function(){
    i = 0
    function(){
        i <<- i+1
        i
    }
}
group_number = get_group_number()
df %>% group_by(u,v) %>% mutate(label = group_number())

You can also consider the following slightly unreadable version

group_number = (function(){i = 0; function() i <<- i+1 })()
df %>% group_by(u,v) %>% mutate(label = group_number())

using iterators package

library(iterators)

counter = icount()
df %>% group_by(u,v) %>% mutate(label = nextElem(counter))

dplyr has a group_indices() function that you can use like this:

df %>% 
    mutate(label = group_indices(., u, v)) %>% 
    group_by(label) ...
Rentrop

Another approach using data.table would be

require(data.table)
setDT(df)[,label:=.GRP, by = c("u", "v")]

which results in:

    u v label
 1: 2 1     1
 2: 1 3     2
 3: 2 1     1
 4: 3 4     3
 5: 3 1     4
 6: 1 1     5
 7: 3 2     6
 8: 2 3     7
 9: 3 2     6
10: 3 4     3
smci

Updating my answer with three different ways:

A) A neat non-dplyr solution using interaction(u,v):

> df$label <- factor(interaction(df$u,df$v, drop=T))
 [1] 1.3 2.3 2.2 2.4 3.2 2.4 1.2 1.2 2.1 2.1
 Levels: 2.1 1.2 2.2 3.2 1.3 2.3 2.4

> match(df$label, levels(df$label)[ rank(unique(df$label)) ] )
 [1] 1 2 3 4 5 4 6 6 7 7

B) Making Randy's neat fast-and-dirty generator-function answer more compact:

get_next_integer = function(){
  i = 0
  function(u,v){ i <<- i+1 }
}
get_integer = get_next_integer() 

df %>% group_by(u,v) %>% mutate(label = get_integer())

C) Also here is a one-liner using a generator function abusing a global variable assignment from this:

i <- 0
generate_integer <- function() { return(assign('i', i+1, envir = .GlobalEnv)) }

df %>% group_by(u,v) %>% mutate(label = generate_integer())

rm(i)

I don't have enough reputation for a comment, so I'm posting an answer instead.

The solution using factor() is a good one, but it has the disadvantage that group numbers are assigned after factor() alphabetizes its levels. The same behaviour happens with dplyr's group_indices(). Perhaps you would like the group numbers to be assigned from 1 to n based on the current group order. In which case, you can use:

my_tibble %>% mutate(group_num = as.integer(factor(group_var, levels = unique(.$group_var))) )
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!