Pandas: Selecting rows based on value counts of a particular column

故里飘歌 2020-12-09 11:55

What's the simplest way of selecting all rows from a pandas DataFrame whose sym occurs exactly twice in the entire table? For example, in the table below, I would like to se

2 answers
  • 2020-12-09 12:09

    You can use map, which should be faster than using groupby and transform:

    df[df['sym'].map(df['sym'].value_counts()) == 2]
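A minimal sketch of why the one-liner above works, broken into steps. The DataFrame here is hypothetical (the question's table is not shown in full); only the column name sym comes from the question.

```python
import pandas as pd

# Hypothetical data: 'b' and 'e' each occur exactly twice.
df = pd.DataFrame({
    "sym": ["a", "b", "b", "c", "c", "c", "d", "e", "e"],
    "price": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
})

counts = df["sym"].value_counts()   # Series mapping each sym value to its count
per_row = df["sym"].map(counts)     # that count, aligned to every original row
result = df[per_row == 2]           # keep rows whose sym occurs exactly twice

print(result)
```

Because map looks each row's sym up in the counts Series, the boolean mask has the same index as df and can be used directly for boolean indexing.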
    

    e.g.

    %%timeit
    df[df['sym'].map(df['sym'].value_counts()) == 2]
    Out[1]:
    1.83 ms ± 23.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    %%timeit
    df[df.groupby("sym")["sym"].transform('size') == 2]
    Out[2]:
    2.08 ms ± 41.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
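The %%timeit cells above only run inside IPython or Jupyter. As a rough sketch, a similar comparison can be run as a plain script with the standard timeit module. The data size, the symbol set, and the helper names with_map and with_transform are assumptions made for illustration, and the numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data; size and number of symbols are arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sym": rng.choice(list("abcdefghij"), size=10_000),
    "price": rng.standard_normal(10_000),
})


def with_map():
    return df[df["sym"].map(df["sym"].value_counts()) == 2]


def with_transform():
    return df[df.groupby("sym")["sym"].transform("size") == 2]


# Best of 5 repeats, 20 calls each, reported per call in milliseconds.
timings = {}
for fn in (with_map, with_transform):
    best = min(timeit.repeat(fn, number=20, repeat=5)) / 20
    timings[fn.__name__] = best * 1e3
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```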
    
  • 2020-12-09 12:36

    I think you can group by column sym and use filter to keep the groups with length == 2:

    print(df.groupby("sym").filter(lambda x: len(x) == 2))
          price sym
    1  0.400157   b
    2  0.978738   b
    7 -0.151357   e
    8 -0.103219   e
    

    A second solution uses isin with boolean indexing:

    s = df.sym.value_counts()
    
    print(s[s == 2].index)
    Index(['e', 'b'], dtype='object')
    
    print(df[df.sym.isin(s[s == 2].index)])
          price sym
    1  0.400157   b
    2  0.978738   b
    7 -0.151357   e
    8 -0.103219   e
    

    And the fastest of these solutions uses transform with boolean indexing:

    print(df[df.groupby("sym")["sym"].transform('size') == 2])
        price sym
    1 -1.2940   b
    2  1.8423   b
    7  0.6280   e
    8  0.5361   e
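As a quick sanity check, the three approaches in this answer can be compared on one small frame. This is a sketch on hypothetical data (only the column names sym and price come from the answer), not a benchmark.

```python
import pandas as pd

# Hypothetical data: 'b' and 'e' each occur exactly twice.
df = pd.DataFrame({
    "price": [0.5, 0.4, 0.9, 2.2, 1.8, -0.9, 0.95, -0.15, -0.1],
    "sym": ["a", "b", "b", "c", "c", "c", "d", "e", "e"],
})

# 1) groupby + filter: keep whole groups whose length is 2
by_filter = df.groupby("sym").filter(lambda g: len(g) == 2)

# 2) value_counts + isin: collect the qualifying symbols, then test membership
s = df["sym"].value_counts()
by_isin = df[df["sym"].isin(s[s == 2].index)]

# 3) groupby + transform: broadcast each group's size back onto its rows
by_transform = df[df.groupby("sym")["sym"].transform("size") == 2]

# All three select the same rows, in the original order.
assert by_filter.equals(by_isin)
assert by_isin.equals(by_transform)
print(by_transform)
```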
    