Pandas: Selecting rows based on value counts of a particular column

故里飘歌 2020-12-09 11:55

What's the simplest way of selecting all rows from a pandas DataFrame whose sym occurs exactly twice in the entire table? For example, in the table below, I would like to se

2 answers
  •  庸人自扰
    2020-12-09 12:09

    You can use map, which should be faster than using groupby and transform:

    df[df['sym'].map(df['sym'].value_counts()) == 2]
    

    For example, timing both approaches:

    %%timeit
    df[df['sym'].map(df['sym'].value_counts()) == 2]
    Out[1]:
    1.83 ms ± 23.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    %%timeit
    df[df.groupby("sym")["sym"].transform('size') == 2]
    Out[2]:
    2.08 ms ± 41.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
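
    To make the two approaches concrete, here is a minimal self-contained sketch. The sample data and the price column are assumptions made for illustration, since the table from the question is not shown; only the sym column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data (not the table from the question).
df = pd.DataFrame({
    "sym": ["a", "b", "a", "c", "b", "b", "d", "d"],
    "price": [1, 2, 3, 4, 5, 6, 7, 8],
})

# value_counts() returns a Series indexed by the distinct sym values,
# holding how many times each one occurs in the column.
counts = df["sym"].value_counts()

# map() looks up each row's sym in that Series, giving a per-row count
# that shares df's index and can be compared directly.
per_row = df["sym"].map(counts)
twice = df[per_row == 2]

# The groupby/transform variant from the timing comparison builds the
# same per-row count by broadcasting each group's size back to its rows.
same = df[df.groupby("sym")["sym"].transform("size") == 2]

print(twice)
```

    Both selections keep the original row order and index, so here they return the rows for "a" and "d" and drop "b" (three occurrences) and "c" (one occurrence).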
    
