Python Pandas: remove entries based on the number of occurrences

走了就别回头了 2020-12-05 07:36

I'm trying to remove entries from a data frame which occur fewer than 100 times. The data frame data looks like this:

pid   tag
1     23    
1          


        
4 answers
  •  被撕碎了的回忆
    2020-12-05 08:10

    Since pandas 0.12, groupby objects have a filter method, which handles this kind of operation:

    In [11]: g = data.groupby('tag')
    
    In [12]: g.filter(lambda x: len(x) > 1)  # pandas 0.13.1
    Out[12]:
       pid  tag
    1    1   45
    2    1   62
    4    2   45
    7    3   62
    

    The function (the first argument of filter) is applied to each group (subframe), and the result contains the rows of the original DataFrame that belong to groups for which the function returned True. To drop tags that occur fewer than 100 times, the condition would be len(x) >= 100.
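
    As a self-contained sketch of the same idea: the sample frame below is an assumption, reconstructed only so that it is consistent with the output shown above (it is not the asker's actual data), and min_count is a hypothetical name for the threshold, set to 2 here so that the small sample produces a visible result.

```python
import pandas as pd

# Assumed sample data; column names follow the question.
data = pd.DataFrame({
    "pid": [1, 1, 1, 2, 2, 3, 3, 3],
    "tag": [23, 45, 62, 24, 45, 34, 25, 62],
})

# Hypothetical threshold name; the question would use 100.
min_count = 2

# Keep only the rows whose tag occurs at least min_count times.
filtered = data.groupby("tag").filter(lambda g: len(g) >= min_count)
print(filtered)
```

    With min_count = 100 the same call would drop every tag that appears fewer than 100 times.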

    Note: in 0.12 the row ordering differs from that of the original DataFrame; this was fixed in 0.13+:

    In [21]: g.filter(lambda x: len(x) > 1)  # pandas 0.12
    Out[21]: 
       pid  tag
    1    1   45
    4    2   45
    2    1   62
    7    3   62
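
    If row order is a concern, one alternative (not the filter approach described in this answer, but a boolean mask built from value_counts) keeps the original order by construction, since it only selects rows and never regroups them. This is a minimal sketch; the sample frame and the names min_count and counts are assumptions made for illustration.

```python
import pandas as pd

# Assumed sample data; column names follow the question.
data = pd.DataFrame({
    "pid": [1, 1, 1, 2, 2, 3, 3, 3],
    "tag": [23, 45, 62, 24, 45, 34, 25, 62],
})

# Hypothetical threshold name; the question would use 100.
min_count = 2

# For every row, look up how many times its tag occurs in the whole frame.
counts = data["tag"].map(data["tag"].value_counts())

# Boolean indexing selects the matching rows in their original order.
kept = data[counts >= min_count]
print(kept)
```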
    
