Question
Using the example from "Drop all duplicate rows in Python Pandas":
Let's say I don't want to drop the duplicates, but instead change the value in one of the columns in the subset.
So, as per the example, if we use subset=['A','C'] to identify duplicates, then I want to change row 1, column 'A', from foo to foo1.
I have a complicated way of doing this, but there must be a simpler way that takes advantage of vectorization or built-in features.
Original df:
     A  B  C
0  foo  0  A
1  foo  1  A
2  foo  1  B
3  bar  1  A
Desired df:
      A  B  C
0   foo  0  A
1  foo1  1  A
2   foo  1  B
3   bar  1  A
Answer 1:
You could use cumcount and do something like:
>>> c = df.groupby(["A","C"]).cumcount()
>>> c = c.replace(0, '').astype(str)
>>> df["A"] += c
>>> df
      A  B  C
0   foo  0  A
1  foo1  1  A
2   foo  1  B
3   bar  1  A
This works because cumcount numbers the rows within each (A, C) group, starting at 0, so only repeated rows get a non-zero value:
>>> df.groupby(["A","C"]).cumcount()
0 0
1 1
2 0
3 0
dtype: int64
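For anyone who wants to run this outside the interpreter, here is a minimal self-contained sketch of the same cumcount idea. The helper name suffix_duplicates and its parameters are made up for illustration and are not part of pandas. It also builds the suffix with where instead of replace, which is just one way to avoid mixing integers and strings in the same Series.

```python
import pandas as pd


def suffix_duplicates(df, subset, column):
    """Hypothetical helper: append a running number to `column` for rows
    that repeat an earlier row on the `subset` columns."""
    # Position of each row within its group: 0 for the first occurrence,
    # 1 for the second, and so on.
    counts = df.groupby(subset).cumcount()
    # Empty suffix for first occurrences, the count as text for repeats.
    suffix = counts.astype(str).where(counts > 0, "")
    out = df.copy()  # leave the caller's frame untouched
    out[column] = out[column] + suffix
    return out


df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

result = suffix_duplicates(df, subset=["A", "C"], column="A")
print(result)
```

With this approach, a third row with the same (A, C) pair would get the suffix 2, a fourth would get 3, and so on.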
Source: https://stackoverflow.com/questions/37367524/pandas-change-a-specific-column-value-of-duplicate-rows