Efficient method to calculate the rank vector of a list in Python

温柔的废话 2020-12-02 20:06

I'm looking for an efficient way to calculate the rank vector of a list in Python, similar to R's rank function. In a simple list with no ties between the ele

11 answers
  •  不知归路
    2020-12-02 21:08

    With SciPy, the function you are looking for is scipy.stats.rankdata:

    In [13]: import scipy.stats as ss
    In [19]: ss.rankdata([3, 1, 4, 15, 92])
    Out[19]: array([ 2.,  1.,  3.,  4.,  5.])
    
    In [20]: ss.rankdata([1, 2, 3, 3, 3, 4, 5])
    Out[20]: array([ 1.,  2.,  4.,  4.,  4.,  6.,  7.])
    

    The ranks start at 1, rather than 0 (as in your example), but then again, that's the way R's rank function works as well.
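
    If 0-based ranks are needed, subtracting 1 from the array returned by rankdata is one option. For a list without ties, the same result can be sketched with the standard library alone. The helper name rank0 below is hypothetical, and ties are broken by original position rather than averaged:

```python
def rank0(values):
    """Return 0-based ranks; equal values are ordered by original position."""
    # Indices that would sort the list (sorted() is stable).
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    # The element at sorted position `rank` came from original position `index`.
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks

print(rank0([3, 1, 4, 15, 92]))
# [1, 0, 2, 3, 4]
```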

    Here is a pure-Python equivalent of SciPy's rankdata function:

    def rank_simple(vector):
        # Indices that would sort the vector (an argsort).
        return sorted(range(len(vector)), key=vector.__getitem__)

    def rankdata(a):
        n = len(a)
        ivec = rank_simple(a)              # original index of each sorted element
        svec = [a[rank] for rank in ivec]  # the values in sorted order
        sumranks = 0
        dupcount = 0
        newarray = [0] * n
        for i in range(n):
            sumranks += i
            dupcount += 1
            # End of a run of equal values: give every member the average rank.
            if i == n - 1 or svec[i] != svec[i + 1]:
                averank = sumranks / float(dupcount) + 1
                for j in range(i - dupcount + 1, i + 1):
                    newarray[ivec[j]] = averank
                sumranks = 0
                dupcount = 0
        return newarray
    
    print(rankdata([3, 1, 4, 15, 92]))
    # [2.0, 1.0, 3.0, 4.0, 5.0]
    print(rankdata([1, 2, 3, 3, 3, 4, 5]))
    # [1.0, 2.0, 4.0, 4.0, 4.0, 6.0, 7.0]
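
    The tie handling above (each run of equal values shares the mean of the 1-based positions it occupies in sorted order) can also be sketched with itertools.groupby. This is only an alternative formulation under the same averaging assumption; the name rankdata_grouped is hypothetical:

```python
from itertools import groupby

def rankdata_grouped(a):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(a)), key=a.__getitem__)
    ranks = [0.0] * len(a)
    position = 0  # number of elements already ranked
    # Consecutive indices in `order` that point at equal values form one run.
    for _, run in groupby(order, key=a.__getitem__):
        indices = list(run)
        # The run occupies 1-based positions position+1 .. position+len(indices).
        average = position + (len(indices) + 1) / 2
        for index in indices:
            ranks[index] = average
        position += len(indices)
    return ranks

print(rankdata_grouped([1, 2, 3, 3, 3, 4, 5]))
# [1.0, 2.0, 4.0, 4.0, 4.0, 6.0, 7.0]
```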
    
