Question
I am trying to create a new variable that counts how many times the same id has been seen over time.
I need to go from this dataframe:
id clae6 year quarter
1 475230.0 2007 1
1 475230.0 2007 2
1 475230.0 2007 3
1 475230.0 2007 4
1 475230.0 2008 1
1 475230.0 2008 2
2 475230.0 2007 1
2 475230.0 2007 2
2 475230.0 2007 3
2 475230.0 2007 4
2 475230.0 2008 1
3 475230.0 2010 1
3 475230.0 2010 2
3 475230.0 2010 3
3 475230.0 2010 4
to this one:
id clae6 year quarter new_variable
1 475230.0 2007 1 1
1 475230.0 2007 2 2
1 475230.0 2007 3 3
1 475230.0 2007 4 4
1 475230.0 2008 1 5
1 475230.0 2008 2 6
2 475230.0 2007 1 1
2 475230.0 2007 2 2
2 475230.0 2007 3 3
2 475230.0 2007 4 4
2 475230.0 2008 1 5
3 475230.0 2010 1 1
3 475230.0 2010 2 2
3 475230.0 2010 3 3
3 475230.0 2010 4 4
I am using the following code, but maybe there is an easier way (I am operating on a lot of records, so I am looking for faster code):
df['control'] = 1
df['new_variable'] = df.groupby(['id'])['control'].cumsum()
Answer 1:
Use cumcount, which numbers the rows within each group starting from 0, and add 1 so the counter starts at 1:
df.groupby('id').cumcount().add(1)
Out[1574]:
0 1
1 2
2 3
3 4
4 5
5 6
6 1
7 2
8 3
9 4
10 5
11 1
12 2
13 3
14 4
dtype: int64
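To write the counter back into the dataframe, the result can be assigned to a column. The following is a minimal, self-contained sketch that rebuilds the sample data from the question; the column name new_variable is taken from the question, and the default integer index is an assumption.

```python
import pandas as pd

# Rebuild the sample data from the question (default RangeIndex assumed).
df = pd.DataFrame({
    "id":      [1] * 6 + [2] * 5 + [3] * 4,
    "clae6":   [475230.0] * 15,
    "year":    [2007] * 4 + [2008] * 2 + [2007] * 4 + [2008] + [2010] * 4,
    "quarter": [1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 1, 1, 2, 3, 4],
})

# cumcount() numbers the rows of each group 0, 1, 2, ... in their current order;
# adding 1 makes the counter start at 1, with no helper column needed.
df["new_variable"] = df.groupby("id").cumcount() + 1

print(df)
```

Note that cumcount follows the current row order, so it only counts "over time" if the rows are already sorted chronologically within each id, as they are in the sample.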
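Answer 2:
You can use rank on a single column. With method='first', ties are ranked in order of appearance, so ranking clae6 (which is constant within each id here) numbers the rows of each group from 1:
df['new'] = df.groupby('id')['clae6'].rank(method='first').astype(int)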
id clae6 year quarter new
0 1 475230.0 2007 1 1
1 1 475230.0 2007 2 2
2 1 475230.0 2007 3 3
3 1 475230.0 2007 4 4
4 1 475230.0 2008 1 5
5 1 475230.0 2008 2 6
6 2 475230.0 2007 1 1
7 2 475230.0 2007 2 2
8 2 475230.0 2007 3 3
9 2 475230.0 2007 4 4
10 2 475230.0 2008 1 5
11 3 475230.0 2010 1 1
12 3 475230.0 2010 2 2
13 3 475230.0 2010 3 3
14 3 475230.0 2010 4 4
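If the rows are not guaranteed to be in chronological order, the counter can be based on the time columns instead of on row position. The sketch below is one possible way to do that, not part of the original answers: it uses a small shuffled sample, and the helper column period and the output column new_cumcount are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Small sample with the rows deliberately out of chronological order.
df = pd.DataFrame({
    "id":      [2, 1, 1, 2, 1],
    "year":    [2008, 2007, 2008, 2007, 2007],
    "quarter": [1, 2, 1, 4, 1],
})

# Hypothetical helper column: one sortable number per (year, quarter) pair.
df["period"] = df["year"] * 4 + df["quarter"]

# Rank each row's period within its id; method="first" breaks ties by row order.
df["new"] = df.groupby("id")["period"].rank(method="first").astype(int)

# Same result with cumcount: sort chronologically first, then count.
# The assignment aligns on the index, so the original row order is kept.
df["new_cumcount"] = (
    df.sort_values(["id", "year", "quarter"]).groupby("id").cumcount() + 1
)

print(df)
```

Both columns hold the same counter here; which variant is faster on a large dataframe is best checked by timing them on the real data.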
Source: https://stackoverflow.com/questions/47601848/groupby-counter-of-rows