groupby counter of rows

让人想犯罪 __ submitted on 2019-12-20 06:51:03

Question


I am trying to create a new variable that counts how many times the same id has been seen so far, over time.

I need to go from this DataFrame:

   id     clae6  year    quarter        
     1  475230.0  2007          1                   
     1  475230.0  2007          2                     
     1  475230.0  2007          3                     
     1  475230.0  2007          4                    
     1  475230.0  2008          1
     1  475230.0  2008          2         
     2  475230.0  2007          1                    
     2  475230.0  2007          2                    
     2  475230.0  2007          3                  
     2  475230.0  2007          4                   
     2  475230.0  2008          1     
     3  475230.0  2010          1     
     3  475230.0  2010          2     
     3  475230.0  2010          3     
     3  475230.0  2010          4     

to this:

   id     clae6  year    quarter     new_variable      
     1  475230.0  2007          1         1   
     1  475230.0  2007          2         2            
     1  475230.0  2007          3         3            
     1  475230.0  2007          4         4           
     1  475230.0  2008          1         5
     1  475230.0  2008          2         6
     2  475230.0  2007          1         1           
     2  475230.0  2007          2         2           
     2  475230.0  2007          3         3         
     2  475230.0  2007          4         4          
     2  475230.0  2008          1         5
     3  475230.0  2010          1         1
     3  475230.0  2010          2         2
     3  475230.0  2010          3         3
     3  475230.0  2010          4         4 
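In other words, new_variable is the 1-based position of each row within its id, counted in row order. A minimal plain-Python sketch of that definition, useful as a reference to check the pandas versions against (the ids list and the running_count helper are illustrative, not from the question):

```python
from collections import defaultdict

# ids in the row order of the sample table above
ids = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3]

def running_count(values):
    """For each value, return how many times it has been seen up to and including this row."""
    seen = defaultdict(int)
    out = []
    for v in values:
        seen[v] += 1
        out.append(seen[v])
    return out

new_variable = running_count(ids)
print(new_variable)  # → [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 1, 2, 3, 4]
```

Because it keeps one counter per id, this definition also gives the right answer when rows of different ids are interleaved.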

I am using the following code, but maybe there is an easier way (I am working with a lot of records, so I am looking for faster code):

# helper column of ones; its running sum within each id is the row counter
df['control'] = 1
df['new_variable'] = df.groupby(['id'])['control'].cumsum()
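A self-contained sketch of the question's helper-column approach on the sample data. The DataFrame construction is an assumption that rebuilds the table above; dropping the helper column afterwards is an optional cleanup, not part of the original code.

```python
import pandas as pd

# Rebuild the sample frame from the question (clae6 is constant, so a scalar is broadcast)
df = pd.DataFrame({
    "id":      [1] * 6 + [2] * 5 + [3] * 4,
    "clae6":   475230.0,
    "year":    [2007] * 4 + [2008] * 2 + [2007] * 4 + [2008] + [2010] * 4,
    "quarter": [1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 1, 1, 2, 3, 4],
})

# The question's approach: a column of ones, cumulatively summed within each id
df["control"] = 1
df["new_variable"] = df.groupby("id")["control"].cumsum()
df = df.drop(columns="control")  # the helper column is no longer needed
print(df["new_variable"].tolist())
```

This works, but it allocates an extra column just to count rows; the answers below avoid that.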

Answer 1:


By using cumcount (it numbers the rows within each group starting from 0, hence the add(1)):

df.groupby('id').cumcount().add(1)
Out[1574]: 
0     1
1     2
2     3
3     4
4     5
5     6
6     1
7     2
8     3
9     4
10    5
11    1
12    2
13    3
14    4
dtype: int64
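A runnable sketch of the cumcount approach on a trimmed copy of the sample data (clae6 is left out because it plays no role). cumcount counts rows in their current order, so if the data might not already be in time order (an assumption; the sample happens to be sorted), sorting by id, year and quarter first makes the counter follow time rather than file order.

```python
import pandas as pd

df = pd.DataFrame({
    "id":      [1] * 6 + [2] * 5 + [3] * 4,
    "year":    [2007] * 4 + [2008] * 2 + [2007] * 4 + [2008] + [2010] * 4,
    "quarter": [1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 1, 1, 2, 3, 4],
})

# Put each id's rows in chronological order before counting them
df = df.sort_values(["id", "year", "quarter"])

# cumcount gives 0, 1, 2, ... within each id; add(1) makes the counter 1-based
df["new_variable"] = df.groupby("id").cumcount().add(1)
print(df["new_variable"].tolist())
```

Unlike the helper-column version, this needs no extra column, and the result is aligned on the original index, so it can be assigned straight back to the frame.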



Answer 2:


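You can use rank on a single column that is constant within each id; with method='first', ties are broken by order of appearance:

# rank a column that is constant per id, so method='first' follows row order
df['new'] = df.groupby('id')['id'].rank(method='first').astype(int)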

    id  clae6   year    quarter new
0   1   475230.0    2007    1   1
1   1   475230.0    2007    2   2
2   1   475230.0    2007    3   3
3   1   475230.0    2007    4   4
4   1   475230.0    2008    1   5
5   1   475230.0    2008    2   6
6   2   475230.0    2007    1   1
7   2   475230.0    2007    2   2
8   2   475230.0    2007    3   3
9   2   475230.0    2007    4   4
10  2   475230.0    2008    1   5
11  3   475230.0    2010    1   1
12  3   475230.0    2010    2   2
13  3   475230.0    2010    3   3
14  3   475230.0    2010    4   4
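A self-contained sketch of the rank approach, rebuilding the sample frame (an assumption). Calling rank on the whole groupby would rank every other column and return a DataFrame, which cannot be assigned to one column, so one column is selected. Ranking the id column itself is an illustrative choice: it is guaranteed to be constant within its own group, so every row is a tie and method="first" numbers the rows in order.

```python
import pandas as pd

df = pd.DataFrame({
    "id":      [1] * 6 + [2] * 5 + [3] * 4,
    "clae6":   475230.0,
    "year":    [2007] * 4 + [2008] * 2 + [2007] * 4 + [2008] + [2010] * 4,
    "quarter": [1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 1, 1, 2, 3, 4],
})

# All values tie within each group, so method="first" ranks by order of appearance.
# rank returns floats, hence the astype(int).
df["new"] = df.groupby("id")["id"].rank(method="first").astype(int)
print(df["new"].tolist())
```

Ranking a column that varies within a group (quarter, for example) would order the rows by that column's values instead of by position, which is why the column choice matters here.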


Source: https://stackoverflow.com/questions/47601848/groupby-counter-of-rows
