pandas change a specific column value of duplicate rows

Question

Using the example from the question "Drop all duplicate rows in Python Pandas".

Let's say I don't want to drop the duplicates, but instead change the value in one of the columns of the subset.

So, as per the example, if we use subset=['A','C'] to identify duplicates, then I want to change row 1, column 'A', from foo to foo1.

I have a complicated way of doing this, but there must be a simpler way that takes advantage of vectorization/built-in features.

Original df:

    A   B   C
0   foo 0   A
1   foo 1   A
2   foo 1   B
3   bar 1   A

Desired df:

    A    B   C
0   foo  0   A
1   foo1 1   A
2   foo  1   B
3   bar  1   A

Answer 1:

You could use cumcount and do something like

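>>> c = df.groupby(["A","C"]).cumcount()  # 0 for the first row of each (A, C) group, then 1, 2, ...
>>> c = c.replace(0, '').astype(str)      # blank out the 0s so first occurrences keep their value
>>> df["A"] += c                          # append the counter as a string suffix to column A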
>>> df
      A  B  C
0   foo  0  A
1  foo1  1  A
2   foo  1  B
3   bar  1  A

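This works because cumcount numbers each row within its (A, C) group, starting from 0, so only repeated rows get a nonzero counter. On the original df (before modifying column A) it gives us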

>>> df.groupby(["A","C"]).cumcount()
0    0
1    1
2    0
3    0
dtype: int64
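
As a self-contained variation on the same idea, here is a minimal sketch that builds the example frame and appends the counter only to the duplicate rows through a boolean mask, instead of replacing 0 with an empty string. The names counts and is_dup are just illustrative, not part of the original answer.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# Position of each row within its (A, C) group:
# 0 for the first occurrence, 1 for the second, and so on.
counts = df.groupby(["A", "C"]).cumcount()

# Rows that repeat an earlier (A, C) pair.
is_dup = counts > 0

# Append the counter as a suffix, but only on the duplicate rows.
df.loc[is_dup, "A"] = df.loc[is_dup, "A"] + counts[is_dup].astype(str)

print(df)
```

The mask keeps the counter numeric until the last step, which avoids mixing integers and empty strings in one Series.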

Source: https://stackoverflow.com/questions/37367524/pandas-change-a-specific-column-value-of-duplicate-rows

Tags

python

pandas

duplicates
