Numerically sort a column containing numbers and strings (pandas/python)

抹茶落季 2020-12-17 03:22

I have to sort a data frame on columns 1 and 2. Column 1 contains both numbers and text, and the numbers should be sorted numerically first. In Excel this is the standard way to sort, but n…

1 answer
  • 2020-12-17 03:45

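    Do you mean columns 0 and 1?

    (The sessions below appear to use Python 2 and the old DataFrame.sort method, which pandas has since removed in favor of sort_values. On Python 3, comparing an int with a str raises TypeError, so a column that mixes the two cannot be sorted directly.)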

    >>> df.sort([0, 1])
         0       1    2   3
    2    1  865545   20  20
    3    1  865584  297   0
    7    1  865665  296   0
    9    1  865700  297   0
    5    2  865628  292   5
    6   10  865662  297   0
    10  10  866429  297   0
    8   11  865694  293   1
    11  11  866438  297   0
    4   22  865625  297   0 
    0    Z  762320  296   1
    1    Z  861349  297   0
    

    [update]

    This happens when your data is not numeric, i.e. every element is a string:

    >>> df.values
    array([['Z', '762320', '296', '1'],
           ['Z', '861349', '297', '0'],
           ['1', '865545', '20', '20'],
           ['1', '865584', '297', '0'],
           ['22', '865625', '297', '0'],
           ['2', '865628', '292', '5'],
           ['10', '865662', '297', '0'],
           ['1', '865665', '296', '0'],
           ['11', '865694', '293', '1'],
           ['1', '865700', '297', '0'],
           ['10', '866429', '297', '0'],
           ['11', '866438', '297', '0']], dtype=object)
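To confirm this on your own frame, one quick check is to look at the Python type of each element rather than at the column dtype, which only reports object (or a string dtype in recent pandas versions). A minimal sketch, using a small hypothetical frame in place of your data:

```python
import pandas as pd

# Hypothetical sample: every cell is a string, like the df.values dump above.
df = pd.DataFrame([["Z", "762320"], ["1", "865545"], ["22", "865625"], ["2", "865628"]])

# Count the Python type of each element in column 0.
element_types = df[0].map(type)
print(element_types.value_counts())

# True means no cell in column 0 is a real number yet.
all_strings = element_types.eq(str).all()
print(all_strings)
```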
    

    In that case string ordering is the expected result: "10" and "11" sort before "2" because strings are compared character by character.

    >>> df.sort([0, 1])    
         0       1    2   3
    2    1  865545   20  20
    3    1  865584  297   0
    7    1  865665  296   0
    9    1  865700  297   0
    6   10  865662  297   0
    10  10  866429  297   0
    8   11  865694  293   1
    11  11  866438  297   0
    5    2  865628  292   5
    4   22  865625  297   0
    0    Z  762320  296   1
    1    Z  861349  297   0
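The same effect shows up with plain Python lists. Default string comparison works character by character, and a key function changes which values are actually compared. A small illustration with a made-up list:

```python
values = ["1", "22", "2", "10", "11"]

# Default string comparison: "10" < "2" because "1" < "2".
print(sorted(values))           # ['1', '10', '11', '2', '22']

# Comparing int(v) instead of v gives numeric order.
print(sorted(values, key=int))  # ['1', '2', '10', '11', '22']
```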
    

    Try converting the values first, using int where possible and keeping the original string otherwise:

    >>> def convert(v):
    ...:    try:
    ...:        return int(v)    
    ...:    except ValueError:
    ...:        return v
    
    >>> pandas.DataFrame([convert(c) for c in l] for l in df.values)\
          .sort([0, 1])
    
         0       1    2   3
    2    1  865545   20  20
    3    1  865584  297   0
    7    1  865665  296   0
    9    1  865700  297   0
    5    2  865628  292   5
    6   10  865662  297   0
    10  10  866429  297   0
    8   11  865694  293   1
    11  11  866438  297   0
    4   22  865625  297   0
    0    Z  762320  296   1
    1    Z  861349  297   0
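On Python 3, sorting the converted column raises TypeError because it mixes int and str. The following is a sketch of one alternative and assumes pandas 1.1 or later, where sort_values gained the key parameter. The key converts each sort column with pd.to_numeric(errors="coerce"), so non-numeric text becomes NaN and is placed last under the default na_position="last". The frame keeps its strings; only the values used for comparison change. The sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample mirroring the question: every cell is a string.
df = pd.DataFrame(
    [["Z", "762320"], ["Z", "861349"], ["1", "865545"], ["22", "865625"],
     ["2", "865628"], ["10", "865662"], ["11", "865694"]]
)

# key= is applied to each column in `by` separately. to_numeric turns "10"
# into 10 and text such as "Z" into NaN, which sort_values puts last.
result = df.sort_values([0, 1], key=lambda s: pd.to_numeric(s, errors="coerce"))
print(result)
```

One limitation: every non-numeric label becomes NaN and ties with the others, so rows labelled "Y" and "Z" would be ordered only by column 1, not alphabetically.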
    

    The difference is that the elements are now numeric:

    >>> pandas.DataFrame([convert(c) for c in l] for l in df.values)\
          .sort([0, 1]).values
    
    array([[1.0, 865545.0, 20.0, 20.0],
           [1.0, 865584.0, 297.0, 0.0],
           [1.0, 865665.0, 296.0, 0.0],
           [1.0, 865700.0, 297.0, 0.0],
           [2.0, 865628.0, 292.0, 5.0],
           [10.0, 865662.0, 297.0, 0.0],
           [10.0, 866429.0, 297.0, 0.0],
           [11.0, 865694.0, 293.0, 1.0],
           [11.0, 866438.0, 297.0, 0.0],
           [22.0, 865625.0, 297.0, 0.0],
           ['Z', 762320.0, 296.0, 1.0],
           ['Z', 861349.0, 297.0, 0.0]], dtype=object)
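If you also need the text labels ordered alphabetically, or you want to keep the numeric values around, one option (not taken from the answer above) is to sort on temporary helper columns. These are a numeric column built with pd.to_numeric, a text column holding only the non-numeric labels, and a numeric copy of column 1. The helper names _num, _txt and _col1 are arbitrary and hypothetical, as is the sample frame.

```python
import pandas as pd

# Hypothetical sample with two different text labels in column 0.
df = pd.DataFrame([["Z", "5"], ["Y", "7"], ["10", "3"], ["2", "9"], ["Z", "1"]])

num = pd.to_numeric(df[0], errors="coerce")  # numbers stay numbers, text -> NaN
result = (
    df.assign(
        _num=num,
        _txt=df[0].where(num.isna(), ""),  # keep text labels, blank out numbers
        _col1=pd.to_numeric(df[1]),         # column 1 holds only numeric text
    )
    .sort_values(["_num", "_txt", "_col1"])  # NaN in _num goes last by default
    .drop(columns=["_num", "_txt", "_col1"])
)
print(result)
```

Numbers come first in numeric order, followed by the text rows sorted by label and then by column 1.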
    