I want to drop duplicates and keep the first value. The duplicates to be dropped are the rows where A = 'df'. Here's my data:
A B C D E
qw 1 3 1 1
er
Using cumcount()
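import pandas as pd
import numpy as np

# Running count of each row within its group of A (0 for the first occurrence)
df['cum'] = df.groupby(['A']).cumcount()
# Row-to-row change in that count; the first row has no predecessor, so prepend 0
df['cum2'] = np.append([0], np.diff(df.cum))
# Drop 'df' rows whose count rose by exactly 1, then remove the helper columns
df.query("~((A == 'df') & (cum2 == 1))").drop(['cum', 'cum2'], axis=1)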
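The snippet above assumes df already exists. Here is a self-contained sketch that rebuilds the frame from the values printed in this answer and runs the same three steps. The literal DataFrame construction and the name result are my additions for illustration; the question's data was not posted in full.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the rows printed in this answer (columns A to E)
df = pd.DataFrame({
    "A": ["qw", "er", "ew", "df", "df", "df", "df", "we", "we", "df", "df", "we", "qw"],
    "B": [1, 2, 4, 34, 2, 3, 4, 2, 4, 34, 3, 4, 2],
    "C": [3, 4, 8, 34, 5, 3, 4, 5, 4, 9, 3, 7, 2],
    "D": [1, 2, 44, 34, 2, 7, 7, 5, 4, 34, 9, 4, 7],
    "E": [1, 6, 4, 34, 2, 3, 4, 2, 4, 34, 3, 4, 2],
})

# Step 1: running count within each value of A
df["cum"] = df.groupby(["A"]).cumcount()
# Step 2: difference between consecutive counts, with 0 for the first row
df["cum2"] = np.append([0], np.diff(df["cum"].to_numpy()))
# Step 3: filter out 'df' rows with cum2 == 1 and drop the helper columns
result = df.query("~((A == 'df') & (cum2 == 1))").drop(["cum", "cum2"], axis=1)

print(result)
```

Note that query returns a new frame, so the result has to be assigned (here to result) if you want to keep it.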
After the cumcount step, df looks like:
In [6]: df
Out[6]:
A B C D E cum
0 qw 1 3 1 1 0
1 er 2 4 2 6 0
2 ew 4 8 44 4 0
3 df 34 34 34 34 0
4 df 2 5 2 2 1
5 df 3 3 7 3 2
6 df 4 4 7 4 3
7 we 2 5 5 2 0
8 we 4 4 4 4 1
9 df 34 9 34 34 4
10 df 3 3 9 3 5
11 we 4 7 4 4 2
12 qw 2 2 7 2 1
Then np.diff gives the row-to-row change in cum:
In [7]: df['cum2'] = np.append([0],np.diff(df.cum))
In [8]: df
Out[8]:
A B C D E cum cum2
0 qw 1 3 1 1 0 0
1 er 2 4 2 6 0 0
2 ew 4 8 44 4 0 0
3 df 34 34 34 34 0 0
4 df 2 5 2 2 1 1
5 df 3 3 7 3 2 1
6 df 4 4 7 4 3 1
7 we 2 5 5 2 0 -3
8 we 4 4 4 4 1 1
9 df 34 9 34 34 4 3
10 df 3 3 9 3 5 1
11 we 4 7 4 4 2 -3
12 qw 2 2 7 2 1 -1
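If you would rather stay inside pandas for this step, Series.diff should give the same column. A minimal sketch, using the cum values from the table above; the fillna(0) stands in for the prepended 0, and the names cum and cum2 as standalone Series are just for illustration.

```python
import pandas as pd

# cum column copied from the table above
cum = pd.Series([0, 0, 0, 0, 1, 2, 3, 0, 1, 4, 5, 2, 1])

# diff() leaves NaN in the first row; fill it with 0 and restore the integer dtype
cum2 = cum.diff().fillna(0).astype(int)

print(cum2.tolist())
```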
Output:
In [12]: df.query("~((A == 'df') & (cum2 == 1))").drop(['cum','cum2'],axis=1)
Out[12]:
A B C D E
0 qw 1 3 1 1
1 er 2 4 2 6
2 ew 4 8 44 4
3 df 34 34 34 34
7 we 2 5 5 2
8 we 4 4 4 4
9 df 34 9 34 34
11 we 4 7 4 4
12 qw 2 2 7 2
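One caveat with the cum2 == 1 test: it compares counts, not neighbours. For a sequence like df, we, df the counts are 0, 0, 1, so the last row gets cum2 == 1 and would be dropped even though the row before it is not 'df'. If the goal is to keep the first row of each consecutive run of 'df' (which is what the output above suggests, though the question does not say so explicitly), comparing each row with the previous one via shift is a possible alternative. This is a sketch under that assumption, not the method used above; only column A is rebuilt because the filter depends on nothing else.

```python
import pandas as pd

# Column A copied from the frame shown in this answer
df = pd.DataFrame({
    "A": ["qw", "er", "ew", "df", "df", "df", "df", "we", "we", "df", "df", "we", "qw"],
})

# True where the row is 'df' and the row directly above it is also 'df'
is_df = df["A"].eq("df")
prev_is_df = df["A"].shift().eq("df")  # first row compares NaN to 'df', which is False
result = df[~(is_df & prev_is_df)]

# The small case that trips up the count-based test: nothing should be dropped
small = pd.DataFrame({"A": ["df", "we", "df"]})
small_result = small[~(small["A"].eq("df") & small["A"].shift().eq("df"))]

print(result.index.tolist())
print(small_result.index.tolist())
```

On the data in this answer both approaches keep the same rows; they only differ when a 'df' row follows a row from another group that happens to have a count one lower.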
reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.cumcount.html