SQL-like window functions in pandas: row numbering in a Python pandas DataFrame

我寻月下人不归 2020-11-27 03:49

I come from a SQL background and frequently use the following data-processing step:

  1. Partition the table of data by one or more fields
  2. For each parti
5 answers
  •  温柔的废话
    2020-11-27 04:27

    You can do this with two groupby calls combined with the rank method:

    In [11]: g = df.groupby('key1')
    

    Pass method='min' so that rows sharing the same data1 value within a group receive the same RN:

    In [12]: g['data1'].rank(method='min')
    Out[12]:
    0    1
    1    2
    2    2
    3    1
    4    4
    dtype: float64
    
    In [13]: df['RN'] = g['data1'].rank(method='min')
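
For comparison, here is a small sketch of how the choice of method changes the numbering of tied rows. The frame is rebuilt from the values printed in Out[17] further down in this answer, and the ranks frame with its column names min, dense and first is purely illustrative.

```python
import pandas as pd

# Sample data reconstructed from the output shown in this answer.
df = pd.DataFrame({
    "data1": [1, 2, 2, 3, 3],
    "data2": [1, 10, 2, 3, 30],
    "key1": ["a", "a", "a", "b", "a"],
})

g = df.groupby("key1")

# Rank data1 within each key1 group using three tie-handling strategies.
ranks = pd.DataFrame({
    "min": g["data1"].rank(method="min"),      # ties share the lowest rank, a gap follows
    "dense": g["data1"].rank(method="dense"),  # ties share a rank, no gap follows
    "first": g["data1"].rank(method="first"),  # ties are broken by order of appearance
})
print(ranks)
```

With method='min', rows 1 and 2 both receive 2 and the next rank in group a is 4, which leaves room for the second step to fill in 3.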
    

    Then group by key1 and this provisional RN, and add the descending rank of data2 within each group of ties (minus 1) to break the ties:

    In [14]: g1 = df.groupby(['key1', 'RN'])
    
    In [15]: g1['data2'].rank(ascending=False) - 1
    Out[15]:
    0    0
    1    0
    2    1
    3    0
    4    0
    dtype: float64
    
    In [16]: df['RN'] += g1['data2'].rank(ascending=False) - 1
    
    In [17]: df
    Out[17]:
       data1  data2 key1  RN
    0      1      1    a   1
    1      2     10    a   2
    2      2      2    a   3
    3      3      3    b   1
    4      3     30    a   4
    

    It feels like there ought to be a native way to do this (and there may well be!).
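
One candidate for a more direct route is to sort by the ordering columns and then number the rows within each group with groupby(...).cumcount(). The sketch below assumes the intended ordering is data1 ascending, then data2 descending, within each key1. That ordering is inferred from the steps above, since the question text is cut off. The sample frame is rebuilt from Out[17], and the column name RN_two_step is introduced here only for the comparison.

```python
import pandas as pd

# Sample data reconstructed from Out[17].
df = pd.DataFrame({
    "data1": [1, 2, 2, 3, 3],
    "data2": [1, 10, 2, 3, 30],
    "key1": ["a", "a", "a", "b", "a"],
})

# Two-step rank approach from the answer above.
g = df.groupby("key1")
df["RN_two_step"] = g["data1"].rank(method="min")
g1 = df.groupby(["key1", "RN_two_step"])
df["RN_two_step"] += g1["data2"].rank(ascending=False) - 1

# Alternative: sort by the assumed ordering, then count rows per key1 group.
# cumcount() is zero-based, hence the + 1.
ordered = df.sort_values(["data1", "data2"], ascending=[True, False])
df["RN"] = ordered.groupby("key1").cumcount() + 1  # aligned back on the index

# Both approaches agree on this sample.
assert (df["RN"] == df["RN_two_step"]).all()
print(df)
```

Note that cumcount always gives consecutive, distinct numbers within a group, much like ROW_NUMBER(), whereas the rank-based version can produce repeated or fractional values when data2 is tied as well.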
