Add a column of ranks

核能气质少年 submitted on 2019-12-10 13:26:31

Question


I have some data:

test <- data.frame(A=c("aaabbb",
"aaaabb",
"aaaabb",
"aaaaab",
"bbbaaa",
"bbbbaa")
)

and so on. All the elements are the same length, and are already sorted before I get them.

I need to make a new column of ranks: "First", "Second", "Third". Anything after that can be left blank, and the ranking needs to account for ties. So in the above case, I'd like to get the following output:

   A       B
 aaabbb  First
 aaaabb  Second
 aaaabb  Second
 aaaaab  Third
 bbbaaa
 bbbbaa  

I looked at rank() and some other posts that used it, but I wasn't able to get it to do what I was looking for.


Answer 1:


How about this:

test$B <- match(test$A , unique(test$A)[1:3] )
test
       A  B
1 aaabbb  1
2 aaaabb  2
3 aaaabb  2
4 aaaaab  3
5 bbbaaa NA
6 bbbbaa NA

One of many ways to do this. Possibly not the best, but one that readily springs to mind and is fairly intuitive. You can use unique because you receive the data pre-sorted.
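The question asks for the labels "First", "Second", "Third" with blanks afterwards rather than integers. A minimal sketch of that last step, assuming a blank means an empty string and using a hypothetical lookup vector named `labels`:

```r
# Sample data from the question (already sorted on arrival).
test <- data.frame(A = c("aaabbb", "aaaabb", "aaaabb",
                         "aaaaab", "bbbaaa", "bbbbaa"),
                   stringsAsFactors = FALSE)

# Hypothetical lookup vector; the name `labels` is an assumption for illustration.
labels <- c("First", "Second", "Third")

# Position of each value among the first three distinct values (NA beyond that).
idx <- match(test$A, unique(test$A)[1:3])

# Index the labels by rank, then blank out the rows past the third rank.
test$B <- ifelse(is.na(idx), "", labels[idx])
test
```

If NA is acceptable in place of a blank, `labels[idx]` alone would probably be enough.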

Because the data are sorted, another function worth considering is rle, although it is slightly more obscure in this example:

rnk <- rle(as.character(test$A))$lengths
rnk
# [1] 1 2 1 1 1
test$B <- c( rep( 1:3 , times = rnk[1:3] ) , rep(NA, sum( rnk[-c(1:3)] ) ) )

rle computes the lengths (and values which we don't really care about here) of runs of equal values in a vector - so again this works because your data are already sorted.
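To make the run-length idea concrete, here is a small sketch of what rle returns for the question's values. The vector name `x` is only for illustration:

```r
# Same values as the question's sorted column.
x <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")

runs <- rle(x)
runs$lengths  # length of each run of consecutive equal values
runs$values   # the value that is repeated in each run

# inverse.rle() rebuilds the original vector from the run encoding.
identical(inverse.rle(runs), x)
```

Because rle only looks at consecutive elements, equal values that are not adjacent would be counted as separate runs, which is why the pre-sorted input matters.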

And if you don't have to have blanks after the third ranked item it's even simpler (and more readable):

test$B <- rep(seq_along(rnk), times = rnk)



Answer 2:


This seems like a good application for factors:

test$B <- as.numeric(factor(test$A, levels = unique(test$A)))

cumsum also comes to mind, where we add 1 every time the value changes:

test$B <- cumsum(c(TRUE, tail(test$A, -1) != head(test$A, -1)))

(Like @Simon said, there are many ways to do this...)
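As a rough check, the sketch below runs the factor and cumsum approaches on the question's sample data and confirms that they agree, then blanks out everything past the third rank as the question requested. The names `by_factor`, `by_cumsum` and `rnk` are illustrative assumptions:

```r
# Sample data from the question (already sorted on arrival).
test <- data.frame(A = c("aaabbb", "aaaabb", "aaaabb",
                         "aaaaab", "bbbaaa", "bbbbaa"),
                   stringsAsFactors = FALSE)

# Rank via factor levels taken in order of first appearance.
by_factor <- as.numeric(factor(test$A, levels = unique(test$A)))

# Rank via a running count of positions where the value changes.
by_cumsum <- cumsum(c(TRUE, tail(test$A, -1) != head(test$A, -1)))

# The two approaches agree on this sorted input.
stopifnot(all(by_factor == by_cumsum))

# Keep only the first three ranks; later rows become NA.
rnk <- by_cumsum
rnk[rnk > 3] <- NA
rnk
```

The two are only interchangeable when equal values are adjacent: the cumsum version starts a new rank at every change, while the factor version gives a repeated value the same rank wherever it appears.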



Source: https://stackoverflow.com/questions/17098173/add-a-column-of-ranks
