I'm trying to remove entries from a data frame which occur fewer than 100 times.
The data frame data looks like this:
pid tag
1 23
1
New in pandas 0.12, groupby objects have a filter method, which lets you do this type of operation:
In [11]: g = data.groupby('tag')
In [12]: g.filter(lambda x: len(x) > 1) # pandas 0.13.1
Out[12]:
   pid  tag
1    1   45
2    1   62
4    2   45
7    3   62
The function (the first argument of filter) is applied to each group (subframe), and the result contains the rows of the original DataFrame that belong to groups for which the function returned True.
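For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame in the question is cut off, so the sample values below are an assumption, chosen only so that the result matches the output shown above.

```python
import pandas as pd

# Assumed sample data: the question's frame is truncated, so these
# values are illustrative and picked to reproduce the output above.
data = pd.DataFrame({
    "pid": [1, 1, 1, 2, 2, 3, 3, 3],
    "tag": [23, 45, 62, 24, 45, 34, 25, 62],
})

# The lambda receives one sub-frame per distinct tag; rows from groups
# where it returns True are kept, all other rows are dropped.
result = data.groupby("tag").filter(lambda x: len(x) > 1)
print(result)
```

For the threshold in the question, the condition would presumably become len(x) >= 100.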
Note: in 0.12 the ordering of the result differs from that of the original DataFrame; this was fixed in 0.13+:
In [21]: g.filter(lambda x: len(x) > 1) # pandas 0.12
Out[21]:
   pid  tag
1    1   45
4    2   45
2    1   62
7    3   62
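If calling a Python function once per group turns out to be slow on a frame with many distinct tags, a boolean mask built from value_counts may be worth trying. To be clear, this is a different technique from filter, not part of the answer above. The helper name drop_rare and the sample data are hypothetical, made up for illustration.

```python
import pandas as pd

def drop_rare(df, column, min_count):
    """Hypothetical helper: keep rows whose value in `column`
    occurs at least `min_count` times in the frame."""
    # Map every row's value to the number of times that value occurs.
    counts = df[column].map(df[column].value_counts())
    # Boolean indexing keeps the original row order.
    return df[counts >= min_count]

# Assumed sample data (the question's frame is truncated).
data = pd.DataFrame({
    "pid": [1, 1, 1, 2, 2, 3, 3, 3],
    "tag": [23, 45, 62, 24, 45, 34, 25, 62],
})

kept = drop_rare(data, "tag", 2)
print(kept)
```

With the question's threshold this would be called as drop_rare(data, "tag", 100).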