I'm looking for an efficient way to calculate the rank vector of a list in Python, similar to R's rank
function. In a simple list with no ties between the ele
import numpy as np
from collections import defaultdict

def rankVec(arg):
    p = np.unique(arg)  # sorted unique values
    k = (-p).argsort().argsort()  # rank of each unique value in descending order (largest gets 0)
    dd = defaultdict(int)
    for i in range(np.shape(p)[0]):
        dd[p[i]] = k[i]  # map each unique value to its rank
    return np.array([dd[x] for x in arg])
The measured running time is 46.2 µs.
This is one of the functions that I wrote to calculate rank.
def calculate_rank(vector):
    a = {}
    rank = 1
    for num in sorted(vector):
        if num not in a:
            a[num] = rank
            rank = rank + 1
    return [a[i] for i in vector]
input:
calculate_rank([1,3,4,8,7,5,4,6])
output:
[1, 2, 3, 7, 6, 4, 3, 5]
There is a really nice module called Ranking (http://pythonhosted.org/ranking/) with an easy-to-follow instruction page. To install it, simply use easy_install ranking
This doesn't give the exact result you specify, but perhaps it would be useful anyway. The following snippet gives, for each element, the first index at which it appears in the sorted list. For an input such as [1, 2, 3, 3, 3, 4, 5], that yields a final rank vector of [0, 1, 2, 2, 2, 5, 6]:
def rank_index(vector):
    ordered = sorted(vector)
    # index() returns the first position of x, so tied values share the lowest rank
    return [ordered.index(x) for x in vector]
Your own testing would have to prove the efficiency of this.
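On the efficiency point: calling index once per element means one linear scan per element, so the cost grows quadratically with the length of the list. Below is a minimal sketch of one way around that, recording the first sorted position of each value in a dict. The name rank_first_index is made up for this example, and it assumes the values are hashable.

```python
def rank_first_index(vector):
    """Rank each element by its first position in the sorted list (0-based)."""
    first_position = {}
    for position, value in enumerate(sorted(vector)):
        # setdefault keeps the earliest position seen for a repeated value
        first_position.setdefault(value, position)
    return [first_position[value] for value in vector]

print(rank_first_index([1, 2, 3, 3, 3, 4, 5]))  # [0, 1, 2, 2, 2, 5, 6]
print(rank_first_index([3, 1, 4, 15, 92]))      # [1, 0, 2, 3, 4]
```

The sort dominates the cost here, so this should scale roughly as n log n, but timing it on your own data is still the only real test.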
Using scipy, the function you are looking for is scipy.stats.rankdata:
In [13]: import scipy.stats as ss
In [19]: ss.rankdata([3, 1, 4, 15, 92])
Out[19]: array([ 2., 1., 3., 4., 5.])
In [20]: ss.rankdata([1, 2, 3, 3, 3, 4, 5])
Out[20]: array([ 1., 2., 4., 4., 4., 6., 7.])
The ranks start at 1, rather than 0 (as in your example), but then again, that's the way R's rank function works as well.
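If averaged ranks for ties are not what you want, rankdata also takes a method argument. Here is a short sketch, assuming a SciPy version that supports method, comparing three of the options and shifting the result to 0-based ranks. The variable names are only for illustration.

```python
import scipy.stats as ss

data = [1, 2, 3, 3, 3, 4, 5]

# 'average' is the default: tied values share the mean of their ranks
average_ranks = ss.rankdata(data, method='average')

# 'min' gives every tied value the lowest rank in its group
min_ranks = ss.rankdata(data, method='min')

# 'dense' is like 'min', but leaves no gap after a group of ties
dense_ranks = ss.rankdata(data, method='dense')

# Subtracting 1 turns the 1-based ranks into 0-based ones
zero_based = (min_ranks - 1).astype(int).tolist()
print(zero_based)  # [0, 1, 2, 2, 2, 5, 6]
```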
Here is a pure-Python equivalent of scipy's rankdata function:
def rank_simple(vector):
    # indices of vector, ordered by the values they point to
    return sorted(range(len(vector)), key=vector.__getitem__)

def rankdata(a):
    n = len(a)
    ivec = rank_simple(a)
    svec = [a[rank] for rank in ivec]  # values in sorted order
    sumranks = 0
    dupcount = 0
    newarray = [0]*n
    for i in range(n):
        sumranks += i
        dupcount += 1
        # end of a run of equal values: assign the average rank to the whole run
        if i == n-1 or svec[i] != svec[i+1]:
            averank = sumranks / float(dupcount) + 1
            for j in range(i-dupcount+1, i+1):
                newarray[ivec[j]] = averank
            sumranks = 0
            dupcount = 0
    return newarray
print(rankdata([3, 1, 4, 15, 92]))
# [2.0, 1.0, 3.0, 4.0, 5.0]
print(rankdata([1, 2, 3, 3, 3, 4, 5]))
# [1.0, 2.0, 4.0, 4.0, 4.0, 6.0, 7.0]
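The rankdata function above always averages the ranks of tied values. As a rough sketch of how the same idea could cover other tie rules, the version below groups equal values with itertools.groupby. The function name rank_with_ties and its method argument are invented for this example (they are not part of scipy), and it assumes Python 3 division.

```python
from itertools import groupby

def rank_with_ties(values, method="average"):
    """Return 1-based ranks; method is 'average', 'min', or 'dense'."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    position = 0  # 0-based sorted position where the current group starts
    dense = 0     # number of distinct values seen so far
    for _, group in groupby(order, key=values.__getitem__):
        indices = list(group)
        dense += 1
        if method == "average":
            # mean of the 1-based positions covered by this group
            rank = position + (len(indices) + 1) / 2
        elif method == "min":
            rank = position + 1
        elif method == "dense":
            rank = dense
        else:
            raise ValueError("unknown method: %r" % method)
        for index in indices:
            ranks[index] = rank
        position += len(indices)
    return ranks

print(rank_with_ties([1, 2, 3, 3, 3, 4, 5]))           # [1.0, 2.0, 4.0, 4.0, 4.0, 6.0, 7.0]
print(rank_with_ties([1, 2, 3, 3, 3, 4, 5], "min"))    # [1, 2, 3, 3, 3, 6, 7]
print(rank_with_ties([1, 2, 3, 3, 3, 4, 5], "dense"))  # [1, 2, 3, 3, 3, 4, 5]
```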