How to reorder arbitrary integer vector to be in increasing order

Posted by 守給你的承諾、 on 2019-12-25 06:48:02

Question


This question is a follow-up to an earlier question.

Let's say I have a large data.frame, df, with columns u and v. I'd like to number the observed combinations (interactions) of u and v in increasing order of first appearance, i.e. the order in which they are first seen when traversing the data.frame from top to bottom.

Note: Assume df has some existing ordering so it's not ok to temporarily reorder it.

The code shown at the bottom of this post works well, except that the returned vector is not numbered in order of first appearance. That is, instead of the current:

# the result follows the factor's level order, not the order of first appearance:
match(df$label, levels(df$label))
# [1] 5 6 3 7 4 7 2 2 1 1

# but we'd like it to be in increasing order like this:
# 1 2 3 4 5 4 6 6 7 7

I've been experimenting with order(), rank(), factor(...ordered=T) etc. and nothing seems to work. I must be overlooking something obvious. Any ideas?

Note: It's also not allowed to cheat by reordering both u, v as individual factors.

set.seed(1234)
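# The rows below come from R < 3.6.0; on newer R, call RNGkind(sample.kind="Rounding") before set.seed() to reproduce them
df <- data.frame(u=sample.int(3,10,replace=TRUE), v=sample.int(4,10,replace=TRUE))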
#    u v
# 1  1 3
# 2  2 3
# 3  2 2
# 4  2 4
# 5  3 2
# 6  2 4
# 7  1 2
# 8  1 2
# 9  2 1
# 10 2 1

(df$label <- factor(interaction(df$u,df$v), ordered=TRUE))
#  [1] 1.3 2.3 2.2 2.4 3.2 2.4 1.2 1.2 2.1 2.1
# Levels: 2.1 < 1.2 < 2.2 < 3.2 < 1.3 < 2.3 < 2.4

# This is OK, except that we want the numbering in order of first appearance
match(df$label, levels(df$label))
# [1] 5 6 3 7 4 7 2 2 1 1

# No better:
match(df$label, levels(df$label)[rank(levels(df$label))])
# [1] 6 7 1 4 3 4 5 5 2 2

Answer 1:


Duh! The solution is to pass drop=TRUE to interaction(). I still don't fully understand why omitting it breaks things, though.

# The original factor from interaction() had unused levels...
str(df$label)
# Factor w/ 12 levels "1.1","1.2","1.3",..: 3 7 6 8 10 8 2 2 5 5

# SOLUTION
df$label <- interaction(df$u,df$v, drop=TRUE)

str(df$label)
# Factor w/ 7 levels "2.1","1.2","2.2",..: 5 6 3 7 4 7 2 2 1 1

rank(unique(df$label))
# [1] 5 6 3 7 4 2 1
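
One way to see why the unused levels matter is sketched below. It is an illustration only, not part of the original answer: the data frame is typed in by hand from the rows printed in the question, and the names full, dropped, bad and good are made up for this example. The ranks of the unique values always run from 1 to the number of observed combinations, so they are only meaningful as indices into a level set of that same size.

```r
# Data frame reconstructed by hand from the rows printed in the question.
df <- data.frame(u = c(1, 2, 2, 2, 3, 2, 1, 1, 2, 2),
                 v = c(3, 3, 2, 4, 2, 4, 2, 2, 1, 1))

full    <- interaction(df$u, df$v)               # keeps every u-by-v combination
dropped <- interaction(df$u, df$v, drop = TRUE)  # keeps only observed combinations

nlevels(full)     # 12: 3 values of u times 4 values of v
nlevels(dropped)  # 7: the combinations that actually occur

# With 12 levels, ranks 1..7 can only pick from the first 7 levels, some of
# which never occur in the data, so match() fails (NA) for some labels.
bad <- match(full, levels(full)[rank(unique(full))])
anyNA(bad)        # TRUE

# With the unused levels dropped, the ranks line up with the levels.
good <- match(dropped, levels(dropped)[rank(unique(dropped))])
good
# [1] 1 2 3 4 5 4 6 6 7 7
```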

We use rank(unique(df$label)) to reorder the levels into the order in which they were observed, and then match our vector against the reordered levels as follows:

# And now we get the desired result
match(df$label, levels(df$label)[ rank(unique(df$label)) ] )
# [1] 1 2 3 4 5 4 6 6 7 7
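
A possibly simpler route to the same numbering, offered as an alternative sketch and not as the answer's own method: unique() returns values in order of first appearance, so matching directly against it skips the rank and levels step. The data frame is again typed in by hand from the question's printed rows, and label, id and label2 are names chosen for this example.

```r
# Data frame reconstructed by hand from the rows printed in the question.
df <- data.frame(u = c(1, 2, 2, 2, 3, 2, 1, 1, 2, 2),
                 v = c(3, 3, 2, 4, 2, 4, 2, 2, 1, 1))

label <- interaction(df$u, df$v, drop = TRUE)

# unique() preserves order of first appearance, so each combination is
# numbered by when it was first seen; df itself is never reordered.
id <- match(label, unique(label))
id
# [1] 1 2 3 4 5 4 6 6 7 7

# The same numbering kept as a factor whose levels are in order of appearance.
label2 <- factor(label, levels = unique(as.character(label)))
as.integer(label2)
# [1] 1 2 3 4 5 4 6 6 7 7
```

Because match() compares the labels themselves, not their integer codes, this form should not depend on whether unused levels were dropped.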


Source: https://stackoverflow.com/questions/23028406/how-to-reorder-arbitrary-integer-vector-to-be-in-increasing-order
