问题
I have a dataframe df_in like so:
import pandas as pd
dic_in = {'A':['aa','aa','bb','cc','cc','cc','cc','dd','dd','dd','ee'],
'B':['200','200','200','400','400','500','700','700','900','900','200'],
'C':['da','cs','fr','fs','se','at','yu','j5','31','ds','sz']}
df_in = pd.DataFrame(dic_in)
I would like to investigate the 2 columns A and B in the following way.
I 2 consecutive rows[['A','B']] are equal then they are assigned a new value (according to a specific rule which i am about to describe).
I will give an example to be more clear: If the first row[['A','B']] is equal to the following one, then I set 1; if the second one is equal to the third one then I will set 1. Every time two consecutive rows are different, then I increase the value to set by 1.
The result should look like this:
A B C value
0 aa 200 da 1
1 aa 200 cs 1
2 bb 200 fr 2
3 cc 400 fs 3
4 cc 400 se 3
5 cc 500 at 4
6 cc 700 yu 5
7 dd 700 j5 6
8 dd 900 31 7
9 dd 900 ds 7
10 ee 200 sz 8
Can you suggest me a smart one to achieve this goal?
回答1:
Use shift and any to compare consecutive rows, using True to indicate where the value should change. Then take the cumulative sum with cumsum to get the increasing value:
df_in['value'] = (df_in[['A', 'B']] != df_in[['A', 'B']].shift()).any(axis=1)
df_in['value'] = df_in['value'].cumsum()
Alternatively, condensing it to one line:
df_in['value'] = (df_in[['A', 'B']] != df_in[['A', 'B']].shift()).any(axis=1).cumsum()
The resulting output:
A B C value
0 aa 200 da 1
1 aa 200 cs 1
2 bb 200 fr 2
3 cc 400 fs 3
4 cc 400 se 3
5 cc 500 at 4
6 cc 700 yu 5
7 dd 700 j5 6
8 dd 900 31 7
9 dd 900 ds 7
10 ee 200 sz 8
来源:https://stackoverflow.com/questions/40979760/compare-2-consecutive-rows-and-assign-increasing-value-if-different-using-panda