assigning sequential ranks to data with multiple ties in R

◇◆丶佛笑我妖孽 posted on 2020-01-05 07:33:45

Question


I am trying to rank counts, conditioned on two factors in a data frame. However, I would like special treatment of ties: if two values are equal, I want them to share the same rank, and the next distinct value should get the next integer rank.

Where I'm stuck is getting the ranks of the unique values conditional on the factor species. (In my actual data set the ranking is conditional on three factors.)

species <- c(rep("a", 3), rep("b", 4))
df <- data.frame(species, count = c("1", "1", "5", "1", "3", "3", "4"))

df$rank <- ave(df$count, df$species, FUN = rank)  # doesn't give the output I'd like

# desired output
df$rank.good <- c("1", "1", "2", "1", "2", "2", "3")
df
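
For reference, a small sketch of how the tie methods built into base R's rank() behave on the counts of species "b", taken here as numbers rather than strings (an assumption made for illustration). None of them produces the desired 1, 2, 2, 3:

```r
x <- c(1, 3, 3, 4)              # counts for species "b", as numeric values

rank(x)                         # default "average": 1.0 2.5 2.5 4.0
rank(x, ties.method = "min")    # 1 2 2 4 (a gap follows the tie)
rank(x, ties.method = "max")    # 1 3 3 4
rank(x, ties.method = "first")  # 1 2 3 4 (ties are broken, not shared)
```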

Answer 1:


With your data in its current form you have two problems: one is an R syntactic concern and the other is a "semantic" concern. The syntactic concern has been raised by @ARobertson, who is really suggesting that you convert the "count" column to character. That will prevent the creation of spurious <NA>s, but it won't solve the semantic problem of what to do if this is more than just a toy problem. If those count values come in as character values, then sorting them as characters gives the ordering 1, 10, 11, 12, ..., 19, 2, 20, 21, .... So immediately after converting the factors with as.character you also need an as.numeric step, even if you resort to using dplyr::dense_rank:

dense_rank <- function(x) {  # copied from the dplyr package
  r <- rank(x)
  match(r, sort(unique(r)))
}
df$rank.good <- ave(as.numeric(as.character(df$count)), df$species, FUN = dense_rank)

If you really want these to be of character class, you can wrap an outer as.character(.) around the ave function call.
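
A minimal sketch of both points, assuming the question's data frame: the first part shows how strings sort differently from numbers, and the second wraps the ave() call in as.character(). The vector counts_chr is a made-up example, and dense_rank is repeated only so that the block runs on its own.

```r
# Character values are compared as strings, not as numbers
counts_chr <- c("2", "10", "1")   # hypothetical example values
sort(counts_chr)                  # "1" "10" "2"
sort(as.numeric(counts_chr))      # 1 2 10

# Same helper as in the answer, repeated so this block is self-contained
dense_rank <- function(x) {
  r <- rank(x)
  match(r, sort(unique(r)))
}

df <- data.frame(species = c(rep("a", 3), rep("b", 4)),
                 count = c("1", "1", "5", "1", "3", "3", "4"))

# as.character() first guards against factor columns, as.numeric() fixes the ordering,
# and the outer as.character() turns the ranks back into strings
df$rank.good <- as.character(
  ave(as.numeric(as.character(df$count)), df$species, FUN = dense_rank)
)
df$rank.good                      # "1" "1" "2" "1" "2" "2" "3"
```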




Answer 2:


Try this:

# added more test values that are not sequential, and kept count as character rather than factor
species <- c(rep("a", 3), rep("b", 4), rep("c", 10))
df <- data.frame(species, count = c("1", "1", "5", "1", "3", "3", "4", "1", "7", "3", "3", "7", "2", "10", "3", "11", "2"), stringsAsFactors = FALSE)
df$count <- as.numeric(df$count)

# solution
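df$rank <- ave(df$count, df$species, FUN = function(x) {
  # "min" rank of a value = position of its first occurrence in the sorted vector
  r <- rank(x, ties.method = "min")
  # renumber the sorted ranks as consecutive integers, then look each one up by position
  as.numeric(factor(rank(sort(r))))[r]
})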
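
As a cross-check, here is a minimal sketch that compares the function above with a shorter base R formulation, match(x, sort(unique(x))). The alternative and the names rank_answer and rank_match are not from the original answer; they are introduced here only for illustration.

```r
# The ranking function from the answer, given a name (hypothetical) so it can be reused
rank_answer <- function(x) {
  r <- rank(x, ties.method = "min")
  as.numeric(factor(rank(sort(r))))[r]
}

# Alternative (not from the answer): position of each value among the sorted unique values
rank_match <- function(x) match(x, sort(unique(x)))

x <- c(1, 7, 3, 3, 7, 2, 10, 3, 11, 2)   # counts for species "c"
rank_answer(x)   # 1 4 3 3 4 2 5 3 6 2
rank_match(x)    # 1 4 3 3 4 2 5 3 6 2
```

Both give consecutive integer ranks with ties sharing a rank, so either can be passed as FUN to ave() once count is numeric.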


Source: https://stackoverflow.com/questions/26831717/assigning-sequential-ranks-to-data-with-multiple-ties-in-r
