How to speed up pandas row filtering by string matching?

-上瘾入骨i 2021-01-31 22:47

I often need to filter a pandas DataFrame df with df[df['col_name']=='string_value'], and I want to speed up this row selection. Is there a quicker way to do it?
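For reference, here is a minimal runnable version of the pattern in the question. The data is made up: the column name col_name, the keys A0001 to A0004, and the frame size are placeholders chosen for illustration, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: a column of string keys plus a numeric column.
rng = np.random.default_rng(0)
keys = np.array(["A0001", "A0002", "A0003", "A0004"])
df = pd.DataFrame({
    "col_name": rng.choice(keys, size=100_000),
    "value": rng.random(100_000),
})

# The pattern from the question: compare every row's string against the
# target to build a boolean mask, then index the frame with that mask.
mask = df["col_name"] == "A0003"
selected = df[mask]
```

Every call repeats the full-column comparison, which is why this gets slow when it is done many times on a large frame.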

3 Answers
  •  情书的邮戳
    2021-01-31 23:14

    Depending on what you want to do with the selection afterwards, and whether you need to make multiple selections of this kind, the groupby functionality can also make things faster (at least in this example).

    Even if you only have to select the rows for one string_value, it is a little bit faster (but not much):

    In [11]: %timeit df[df['STK_ID']=='A0003']
    1 loops, best of 3: 626 ms per loop
    
    In [12]: %timeit df.groupby("STK_ID").get_group("A0003")
    1 loops, best of 3: 459 ms per loop
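To check that the two selections are interchangeable, here is a small sketch on a made-up frame (the STK_ID values and the price column are hypothetical). It assumes a reasonably recent pandas, where get_group on a DataFrameGroupBy returns the matching rows with all columns, including the grouping column, and keeps the original index labels.

```python
import pandas as pd

# Small hypothetical frame; "STK_ID" mirrors the column used in the answer.
df = pd.DataFrame({
    "STK_ID": ["A0001", "A0003", "A0002", "A0003", "A0001"],
    "price": [10.0, 11.5, 9.8, 11.7, 10.2],
})

# Boolean-mask selection, as in the question.
by_mask = df[df["STK_ID"] == "A0003"]

# groupby + get_group selection, the alternative timed above.
by_group = df.groupby("STK_ID").get_group("A0003")

# Both give the same rows, in the same order, with the same index labels.
pd.testing.assert_frame_equal(by_mask, by_group)
```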
    

    But subsequent calls on the same GroupBy object will be very fast (e.g. to select the rows for other string_values):

    In [25]: grouped = df.groupby("STK_ID")
    
    In [26]: %timeit grouped.get_group("A0003")
    1 loops, best of 3: 333 us per loop
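A sketch of that reuse pattern, again on a hypothetical frame: build the GroupBy object once, then pull out as many groups as needed. The keys in wanted are assumptions for illustration. The speedup in the timings presumably comes from the grouping work being done once and reused, instead of rescanning the whole column with a fresh comparison for every value.

```python
import pandas as pd

# Hypothetical data with the same column name as in the answer.
df = pd.DataFrame({
    "STK_ID": ["A0001", "A0003", "A0002", "A0003", "A0001"],
    "price": [10.0, 11.5, 9.8, 11.7, 10.2],
})

# Create the GroupBy object once...
grouped = df.groupby("STK_ID")

# ...then reuse it for every lookup instead of building a new mask each time.
wanted = ["A0001", "A0002", "A0003"]
selections = {key: grouped.get_group(key) for key in wanted}
```

One behavioural difference: get_group raises KeyError for a value that does not occur in the column, whereas the boolean-mask version simply returns an empty DataFrame.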
    
