Efficient method to calculate the rank vector of a list in Python

温柔的废话 2020-12-02 20:06

I'm looking for an efficient way to calculate the rank vector of a list in Python, similar to R's rank function. In a simple list with no ties between the ele

11 answers
  • 2020-12-02 20:56
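    import numpy as np
    from collections import defaultdict
    
    def rankVec(arg):
        p = np.unique(arg)  # unique values, sorted ascending
        k = (-p).argsort().argsort()  # rank of each unique value in descending order (largest gets 0)
        dd = defaultdict(int)
        for i in range(np.shape(p)[0]):  # range replaces Python 2's xrange
            dd[p[i]] = k[i]  # map each value to its rank
        return np.array([dd[x] for x in arg])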
    

    Measured running time: 46.2 µs.
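
    The same descending dense ranking can probably be had without the dictionary by asking np.unique for the inverse indices. This is a minimal sketch, not the answer author's code; the name dense_rank_desc is made up for illustration, and it assumes a 1-D numeric input.

```python
import numpy as np

def dense_rank_desc(values):
    """Descending dense rank: the largest value gets 0, ties share a rank."""
    # return_inverse gives, for each element, its index into the sorted unique values.
    uniques, inverse = np.unique(np.asarray(values), return_inverse=True)
    # Flip the ascending index so the largest unique value maps to rank 0.
    return len(uniques) - 1 - inverse

print(dense_rank_desc([1, 3, 4, 8, 7, 5, 4, 6]))
# [6 5 4 0 1 3 4 2]
```

    This avoids the Python-level loop over the unique values, which may matter on large inputs.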

  • 2020-12-02 20:57

    This is one of the functions that I wrote to calculate rank.

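    def calculate_rank(vector):
        ranks = {}
        rank = 1
        # Walk the values in ascending order; each new distinct value gets the next rank.
        for num in sorted(vector):
            if num not in ranks:
                ranks[num] = rank
                rank += 1
        return [ranks[i] for i in vector]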
    

    input:

    calculate_rank([1,3,4,8,7,5,4,6])
    

    output:

    [1, 2, 3, 7, 6, 4, 3, 5]
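
    The ranking above is a 1-based "dense" rank: tied values share a rank and no rank is skipped. A more compact way to sketch the same idea is to number the distinct values with enumerate. The name dense_rank is hypothetical, and the values are assumed to be hashable and mutually comparable.

```python
def dense_rank(vector):
    """1-based dense rank: ties share a rank and no rank is skipped."""
    # Number the distinct values in ascending order, starting at 1.
    rank_of = {value: rank for rank, value in enumerate(sorted(set(vector)), start=1)}
    return [rank_of[value] for value in vector]

print(dense_rank([1, 3, 4, 8, 7, 5, 4, 6]))
# [1, 2, 3, 7, 6, 4, 3, 5]
```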
    
  • 2020-12-02 20:58

    There is a really nice module called Ranking (http://pythonhosted.org/ranking/) with an easy-to-follow instruction page. To install it, run easy_install ranking.

  • 2020-12-02 20:59

    This doesn't give the exact result you specify, but perhaps it would be useful anyway. The following snippet gives, for each element, the first index of its value in the sorted list. For a list such as [1, 2, 3, 3, 3, 4, 5] it yields a final rank vector of [0, 1, 2, 2, 2, 5, 6].

    def rank_index(vector):
        ordered = sorted(vector)  # sort once, then look up each value's first position
        return [ordered.index(x) for x in vector]
    

    Your own testing would have to prove the efficiency of this.
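
    On efficiency: a list.index lookup is a linear scan, so calling it once per element costs O(n²) overall. One possible alternative, sketched here under the assumption that the elements are mutually comparable, is to sort once and find each first position by binary search with the standard-library bisect module. The name rank_index_bisect is made up for illustration.

```python
from bisect import bisect_left

def rank_index_bisect(vector):
    """0-based rank of each element: position of its first occurrence in sorted order."""
    ordered = sorted(vector)  # O(n log n), done once
    # bisect_left returns the leftmost insertion point, i.e. the first index
    # at which the value appears in the sorted list.
    return [bisect_left(ordered, x) for x in vector]

print(rank_index_bisect([1, 2, 3, 3, 3, 4, 5]))
# [0, 1, 2, 2, 2, 5, 6]
```

    Each lookup is O(log n), so the whole ranking stays at O(n log n).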

  • 2020-12-02 21:08

    Using scipy, the function you are looking for is scipy.stats.rankdata:

    In [13]: import scipy.stats as ss
    In [19]: ss.rankdata([3, 1, 4, 15, 92])
    Out[19]: array([ 2.,  1.,  3.,  4.,  5.])
    
    In [20]: ss.rankdata([1, 2, 3, 3, 3, 4, 5])
    Out[20]: array([ 1.,  2.,  4.,  4.,  4.,  6.,  7.])
    

    The ranks start at 1, rather than 0 (as in your example), but then again, that's the way R's rank function works as well.

    Here is a pure-Python equivalent of scipy's rankdata function:

    def rank_simple(vector):
        # Indices that would sort the vector (a pure-Python argsort).
        return sorted(range(len(vector)), key=vector.__getitem__)
    
    def rankdata(a):
        n = len(a)
        ivec = rank_simple(a)
        svec = [a[rank] for rank in ivec]  # values in sorted order
        sumranks = 0
        dupcount = 0
        newarray = [0] * n
        for i in range(n):  # range replaces Python 2's xrange
            sumranks += i
            dupcount += 1
            # End of a run of equal values: give the whole run its average 1-based rank.
            if i == n - 1 or svec[i] != svec[i + 1]:
                averank = sumranks / float(dupcount) + 1
                for j in range(i - dupcount + 1, i + 1):
                    newarray[ivec[j]] = averank
                sumranks = 0
                dupcount = 0
        return newarray
    
    print(rankdata([3, 1, 4, 15, 92]))
    # [2.0, 1.0, 3.0, 4.0, 5.0]
    print(rankdata([1, 2, 3, 3, 3, 4, 5]))
    # [1.0, 2.0, 4.0, 4.0, 4.0, 6.0, 7.0]
    