Get first and second highest values in pandas columns

春和景丽 2020-12-15 05:34

I am using pandas to analyse some election results. I have a DF, Results, which has a row for each constituency and columns representing the votes for the various parties (o

5 answers
  • 2020-12-15 05:50

    You could just sort your results, such that the first rows will contain the max. Then you can simply use indexing to get the first n places.

    # .ix was removed in pandas 1.0; label-based slicing is done with .loc
    RawResults = Results.loc[:, 'Unnamed: 9':'Zeb'].sort_values(by='votes', ascending=False)
    RawResults.iloc[0, :] # First place
    RawResults.iloc[1, :] # Second place
    RawResults.iloc[n - 1, :] # nth place (iloc is zero-based)
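
If the goal is the top two parties within a single constituency, the same sort-then-index idea can be applied to one row. A minimal sketch, assuming a hypothetical `results` frame with made-up party columns and vote counts (none of these names come from the question):

```python
import pandas as pd

# Hypothetical data: one row per constituency, one column per party.
results = pd.DataFrame(
    {"Con": [100, 300, 50], "Lab": [200, 100, 70], "LD": [150, 250, 60]},
    index=["A", "B", "C"],
)

# Sort one constituency's votes in descending order.
row = results.loc["A"].sort_values(ascending=False)

# The index of the sorted row holds the party names, the values hold the votes.
first_party, second_party = row.index[0], row.index[1]
first_votes, second_votes = row.iloc[0], row.iloc[1]
```

Sorting every row this way costs more than is needed when only the top two are wanted, but it is easy to read and extends to any n.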
    
  • 2020-12-15 05:52

    Here is an interesting approach: replace the maximum value with the minimum value, then take the maximum again. It is a quick hack, though, and not recommended!

    first_highest_value_index = df.idxmax()
    second_highest_value_index = df.replace(df.max(), df.min()).idxmax()
    
    first_highest_value = df[first_highest_value_index]
    second_highest_value = df[second_highest_value_index]
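
A variant that avoids overwriting values is to drop the label of the maximum and call `idxmax()` again. This is only a sketch on a made-up Series named `votes`. Note that, unlike the `replace` hack, it treats a tie for first place as first and second.

```python
import pandas as pd

# Hypothetical vote counts for a single constituency.
votes = pd.Series({"Con": 100, "Lab": 200, "LD": 150})

# Label of the largest value.
first_label = votes.idxmax()

# Remove that one label, then look for the largest remaining value.
second_label = votes.drop(first_label).idxmax()

first_value = votes[first_label]
second_value = votes[second_label]
```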
    
  • 2020-12-15 05:57

    Here is a NumPy solution:

    In [120]: df
    Out[120]:
              a         b         c         d         e         f         g         h
    0  1.334444  0.322029  0.302296 -0.841236 -0.360488 -0.860188 -0.157942  1.522082
    1  2.056572  0.991643  0.160067 -0.066473  0.235132  0.533202  1.282371 -2.050731
    2  0.955586 -0.966734  0.055210 -0.993924 -0.553841  0.173793 -0.534548 -1.796006
    3  1.201001  1.067291 -0.562357 -0.794284 -0.554820 -0.011836  0.519928  0.514669
    4 -0.243972 -0.048144  0.498007  0.862016  1.284717 -0.886455 -0.757603  0.541992
    5  0.739435 -0.767399  1.574173  1.197063 -1.147961 -0.903858  0.011073 -1.404868
    6 -1.258282 -0.049719  0.400063  0.611456  0.443289 -1.110945  1.352029  0.215460
    7  0.029121 -0.771431 -0.285119 -0.018216  0.408425 -1.458476 -1.363583  0.155134
    8  1.427226 -1.005345  0.208665 -0.674917  0.287929 -1.259707  0.220420 -1.087245
    9  0.452589  0.214592 -1.875423  0.487496  2.411265  0.062324 -0.327891  0.256577
    
    In [121]: np.sort(df.values)[:,-2:]
    Out[121]:
    array([[ 1.33444404,  1.52208164],
           [ 1.28237078,  2.05657214],
           [ 0.17379254,  0.95558613],
           [ 1.06729107,  1.20100071],
           [ 0.86201603,  1.28471676],
           [ 1.19706331,  1.57417327],
           [ 0.61145573,  1.35202868],
           [ 0.15513379,  0.40842477],
           [ 0.28792928,  1.42722604],
           [ 0.48749578,  2.41126532]])
    

    Or as a pandas DataFrame:

    In [122]: pd.DataFrame(np.sort(df.values)[:,-2:], columns=['2nd-largest','largest'])
    Out[122]:
       2nd-largest   largest
    0     1.334444  1.522082
    1     1.282371  2.056572
    2     0.173793  0.955586
    3     1.067291  1.201001
    4     0.862016  1.284717
    5     1.197063  1.574173
    6     0.611456  1.352029
    7     0.155134  0.408425
    8     0.287929  1.427226
    9     0.487496  2.411265
    

    Or a faster solution from @Divakar, which uses a partial sort instead of fully sorting each row:

    In [6]: df
    Out[6]:
              a         b         c         d         e         f         g         h
    0  0.649517 -0.223116  0.264734 -1.121666  0.151591 -1.335756 -0.155459 -2.500680
    1  0.172981  1.233523  0.220378  1.188080 -0.289469 -0.039150  1.476852  0.736908
    2 -1.904024  0.109314  0.045741 -0.341214 -0.332267 -1.363889  0.177705 -0.892018
    3 -2.606532 -0.483314  0.054624  0.979734  0.205173  0.350247 -1.088776  1.501327
    4  1.627655 -1.261631  0.589899 -0.660119  0.742390 -1.088103  0.228557  0.714746
    5  0.423972 -0.506975 -0.783718 -2.044002 -0.692734  0.980399  1.007460  0.161516
    6 -0.777123 -0.838311 -1.116104 -0.433797  0.599724 -0.884832 -0.086431 -0.738298
    7  1.131621  1.218199  0.645709  0.066216 -0.265023  0.606963 -0.194694  0.463576
    8  0.421164  0.626731 -0.547738  0.989820 -1.383061 -0.060413 -1.342769 -0.777907
    9 -1.152690  0.696714 -0.155727 -0.991975 -0.806530  1.454522  0.788688  0.409516
    
    In [7]: a = df.values
    
    In [8]: a[np.arange(len(df))[:,None],np.argpartition(-a,np.arange(2),axis=1)[:,:2]]
    Out[8]:
    array([[ 0.64951665,  0.26473378],
           [ 1.47685226,  1.23352348],
           [ 0.17770473,  0.10931398],
           [ 1.50132666,  0.97973383],
           [ 1.62765464,  0.74238959],
           [ 1.00745981,  0.98039898],
           [ 0.5997243 , -0.0864306 ],
           [ 1.21819904,  1.13162068],
           [ 0.98982033,  0.62673128],
           [ 1.45452173,  0.78868785]])
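
To also recover which column each value came from, the index array produced by `np.argpartition` can be reused. A sketch on a small made-up frame; the output column names (`largest`, `largest_col`, and so on) are my own choice, not part of the answer above:

```python
import numpy as np
import pandas as pd

# Small made-up frame; each row is treated independently.
df = pd.DataFrame({"a": [4, 5, 3], "b": [20, 10, 40], "c": [2, 2, 5]})
a = df.to_numpy()

# Column positions of the two largest entries per row, largest first.
# Negating the array turns "largest" into "smallest", and passing
# kth=[0, 1] places the first two positions in sorted order.
top2_idx = np.argpartition(-a, np.arange(2), axis=1)[:, :2]

# Pull out the corresponding values row by row.
top2_vals = np.take_along_axis(a, top2_idx, axis=1)

cols = df.columns.to_numpy()
top2 = pd.DataFrame(
    {
        "largest": top2_vals[:, 0],
        "largest_col": cols[top2_idx[:, 0]],
        "2nd_largest": top2_vals[:, 1],
        "2nd_largest_col": cols[top2_idx[:, 1]],
    },
    index=df.index,
)
```

For the election data in the question, the `*_col` columns would hold the party names of the winner and the runner-up.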
    
  • 2020-12-15 06:08

    To get the highest values of a column, you can use nlargest():

    df['High'].nlargest(2)
    

    The above will give you the 2 highest values of column High.


    You can also use nsmallest() the same way to get the lowest values.
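
A short sketch of both calls on a made-up `High` column (the numbers are invented for illustration). The result keeps the original row labels, so it also shows where the values came from:

```python
import pandas as pd

# Hypothetical data with a single numeric column.
df = pd.DataFrame({"High": [10, 50, 30, 40]})

# Two largest values, in descending order, indexed by their original row labels.
top2 = df["High"].nlargest(2)

# Two smallest values, in ascending order.
bottom2 = df["High"].nsmallest(2)
```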

  • 2020-12-15 06:13

    Here is a solution using the nlargest function:

    >>> df
        a   b   c
     0  4  20   2
     1  5  10   2
     2  3  40   5
     3  1  50  10
     4  2  30  15
    >>> def give_largest(col,n):
    ...     largest = col.nlargest(n).reset_index(drop = True)
    ...     data = [x for x in largest]
    ...     index = [f'{i}_largest' for i in range(1,len(largest)+1)]
    ...     return pd.Series(data,index=index)
    ...
    ...
    >>> def n_largest(df, axis, n):
    ...     '''
    ...     Return the n largest values of each column (axis=0)
    ...     or row (axis=1) of the input DataFrame.
    ...     '''
    ...     return df.apply(give_largest, axis = axis, n = n)
    ...
    >>> n_largest(df,axis = 1, n = 2)
       1_largest  2_largest
    0         20          4
    1         10          5
    2         40          5
    3         50         10
    4         30         15
    >>> n_largest(df,axis = 0, n = 2)
               a   b   c
    1_largest  5  50  15
    2_largest  4  40  10
    