Question
I have a DataFrame with a column 'Id', which is unique, and columns 'A', 'B', 'C', etc. There are different rows where all the values of 'A', 'B', 'C' are the same. I'd like to give each such set of rows a group number (a running index starting from 1).
For example:
import pandas as pd
df = pd.DataFrame({"A": [1, 1, 1, 2], "B": [3, 4, 4, 4], "C": [5, 5, 5, 5]})
df
Out[127]:
   A  B  C
0  1  3  5
1  1  4  5
2  1  4  5
3  2  4  5
This should become:
   A  B  C  grp
0  1  3  5    1
1  1  4  5    2
2  1  4  5    2
3  2  4  5    3
I know I can group by ['A', 'B', 'C'] and get the keys, but then I have to iterate over the keys and the DataFrame in an unoptimized manner. I haven't managed to find an efficient way to do it.
Answer 1:
Use GroupBy.ngroup:
df['grp'] = df.groupby(['A', 'B', 'C']).ngroup() + 1
print(df)
   A  B  C  grp
0  1  3  5    1
1  1  4  5    2
2  1  4  5    2
3  2  4  5    3
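The question mentions a unique 'Id' column that the example leaves out. Here is a minimal sketch of the same ngroup approach with such a column present; the 'Id' values and the key_cols name are made up for illustration. Only the key columns are passed to groupby, so 'Id' does not affect the grouping.

```python
import pandas as pd

# Hypothetical data: the example's values plus a unique 'Id' column.
df = pd.DataFrame({
    "Id": [10, 11, 12, 13],
    "A": [1, 1, 1, 2],
    "B": [3, 4, 4, 4],
    "C": [5, 5, 5, 5],
})

# Columns that define a group; 'Id' is deliberately left out.
key_cols = ["A", "B", "C"]

# ngroup() gives each row a 0-based group number aligned with df's index,
# so adding 1 turns it into a running index starting from 1.
df["grp"] = df.groupby(key_cols).ngroup() + 1
print(df)
```

No explicit iteration over the keys is needed, since the result is already aligned with the original rows.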
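If the rows are already sorted by these columns, pd.factorize gives the same numbering (it numbers groups in order of first appearance):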
df['grp'] = pd.factorize([tuple(x) for x in df.values])[0] + 1
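To see where the two approaches can differ, here is a small sketch on rows that are not sorted; the data and variable names are made up for illustration. By default groupby sorts the keys, so ngroup numbers groups in sorted key order, while factorize (and groupby with sort=False) numbers them in order of first appearance. Wrapping the tuples in a Series and selecting only the key columns are precautions on my part, not part of the original answer: newer pandas releases may be stricter about plain-list input, and selecting the key columns keeps something like a unique 'Id' out of the tuples.

```python
import pandas as pd

# Hypothetical unsorted data: the (2, 4, 5) row comes first.
df = pd.DataFrame({"A": [2, 1, 1, 1], "B": [4, 4, 3, 4], "C": [5, 5, 5, 5]})
key_cols = ["A", "B", "C"]

# Default sort=True: group numbers follow the sorted order of the keys.
by_sorted_key = df.groupby(key_cols).ngroup() + 1

# sort=False: group numbers follow the order in which keys first appear.
by_appearance = df.groupby(key_cols, sort=False).ngroup() + 1

# factorize on row tuples also numbers by first appearance.
row_tuples = pd.Series([tuple(row) for row in df[key_cols].values])
codes = pd.factorize(row_tuples)[0] + 1

print(by_sorted_key.tolist())   # [3, 2, 1, 2]
print(by_appearance.tolist())   # [1, 2, 3, 2]
print(codes.tolist())           # [1, 2, 3, 2]
```

Either numbering identifies the same groups; only the labels differ.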
Source: https://stackoverflow.com/questions/52034526/pandas-groupby-all-columns-and-mark-in-original-dataframe