Using Python 3.3 and Pandas 0.10
I have a DataFrame that is built from concatenating multiple CSV files. First, I filter out all values in the Name column that conta
You could first drop the duplicates:
In [11]: df = df.drop_duplicates()
In [12]: df
Out[12]:
  Name  ID
0    A   1
1    B   2
2    C   3
4    E   4
5    F   4
Then group by ID and keep only the groups that contain a single element:
In [13]: g = df.groupby('ID')
In [14]: size = (g.size() == 1)
In [15]: size
Out[15]:
ID
1 True
2 True
3 True
4 False
dtype: bool
In [16]: size[size].index
Out[16]: Int64Index([1, 2, 3], dtype=int64)
In [17]: df['ID'].isin(size[size].index)
Out[17]:
0 True
1 True
2 True
4 False
5 False
Name: ID, dtype: bool
Finally, use that mask as a boolean index:
In [18]: df[df['ID'].isin(size[size].index)]
Out[18]:
  Name  ID
0    A   1
1    B   2
2    C   3
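For reference, the steps above can be collected into one small script. This is only a sketch: the input frame is made up for illustration (the session shows the data only after de-duplication, so the duplicate row at index 3 is an assumption), and `keep_unique_ids` is a hypothetical helper name, not something pandas provides.

```python
import pandas as pd

# Hypothetical input: row 3 is assumed to be an exact duplicate of row 2,
# which would explain why index 3 is missing from the session above.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "C", "E", "F"],
    "ID": [1, 2, 3, 3, 4, 4],
})


def keep_unique_ids(frame):
    """Drop exact duplicate rows, then keep rows whose ID occurs only once."""
    deduped = frame.drop_duplicates()
    counts = deduped.groupby("ID").size()      # number of rows per ID
    unique_ids = counts[counts == 1].index     # IDs that appear exactly once
    return deduped[deduped["ID"].isin(unique_ids)]


result = keep_unique_ids(df)
print(result)

# Alternative for more recent pandas versions (the keep=False option is not
# available in 0.10): mark every row whose ID is repeated and negate the mask.
deduped = df.drop_duplicates()
alt = deduped[~deduped["ID"].duplicated(keep=False)]
```

Both approaches should select the same rows. The groupby version sticks to the calls used in the session above, while the `duplicated(keep=False)` version is shorter if a newer pandas is available.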