Pandas replace/dictionary slowness

萌比男神i 2020-12-14 04:13

Please help me understand why this "replace from dictionary" operation is slow in Python/Pandas:

# Series has 200 rows and 1 column
# Dictionary has 11269 key-value pairs
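The question does not include the data itself, so here is a hypothetical reproduction with the same sizes: a 200-row Series of random integers and a dictionary with 11269 integer keys mapped to strings. The values, the random seed and the key range are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: only the sizes (200 rows, 11269 keys) come from the question.
rng = np.random.default_rng(0)
dictionary = {i: f"v{i}" for i in range(11269)}
# Draw from a wider range than the keys so some values have no matching key.
series = pd.Series(rng.integers(0, 20000, size=200))

# The slow operation being asked about: replace every value that is a key.
replaced = series.replace(dictionary)
print(replaced.head())
```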


        
Answer by 不知归路 (2020-12-14 04:40)

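    It looks like replace carries a lot of overhead here: with a dictionary it appears to work through the keys one by one, so its cost grows with the 11269-entry dictionary rather than with the 200-row Series. Explicitly telling the Series what to do via map, which does one lookup per element, yields by far the best performance: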

    series = series.map(lambda x: dictionary.get(x,x))
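A small check of why the dictionary.get(x, x) form is a drop-in replacement: the second argument is the fallback, so values with no key pass through unchanged, just as they do with replace. The sample data below is hypothetical.

```python
import pandas as pd

# Hypothetical data: "c" has no entry in the dictionary.
series = pd.Series(["a", "b", "c", "a"])
dictionary = {"a": "apple", "b": "banana"}

# dictionary.get(x, x) falls back to the original value for missing keys,
# which is what series.replace(dictionary) does as well.
mapped = series.map(lambda x: dictionary.get(x, x))
print(mapped.tolist())  # ['apple', 'banana', 'c', 'apple']
```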
    

    If you're sure that every value in the Series has a key in your dictionary, you can get a very slight performance boost by not creating a lambda and supplying the dictionary.get function directly. Any values without a matching key will come back as missing (None or NaN, depending on the dtype) instead of being left unchanged, so beware:

    series = series.map(dictionary.get)
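To make the warning concrete, here is a tiny hypothetical example where one value has no key. dictionary.get(x) returns None for it, and pandas reports that entry as missing rather than keeping the original value.

```python
import pandas as pd

# Hypothetical data: "c" is not a key.
series = pd.Series(["a", "b", "c"])
dictionary = {"a": "apple", "b": "banana"}

# dictionary.get(x) returns None for "c", so that value is lost
# instead of being left unchanged.
mapped = series.map(dictionary.get)
print(mapped.isna().tolist())  # [False, False, True]
```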
    

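    You can also supply the dictionary itself, but this appears to introduce noticeably more overhead, likely because pandas first converts the dictionary into a Series before doing the lookups. As with dictionary.get, values without a matching key become NaN: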

    series = series.map(dictionary)
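A sketch of the same caveat for map(dictionary), on hypothetical data. The final fillna(series) step is an extra step not used in the timings: it is one way to put the unmatched values back so the result matches replace, at the cost of another pass over the Series.

```python
import pandas as pd

# Hypothetical data: "c" is not a key.
series = pd.Series(["a", "b", "c"])
dictionary = {"a": "apple", "b": "banana"}

# Passing the dict directly: values with no matching key become NaN.
mapped = series.map(dictionary)
print(mapped.isna().tolist())  # [False, False, True]

# Fill the gaps from the original Series (aligned on the index) to keep unmatched values.
restored = mapped.fillna(series)
print(restored.tolist())  # ['apple', 'banana', 'c']
```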
    

    Timings

    Some timing comparisons using your example data:

    %timeit series.map(dictionary.get)
    10000 loops, best of 3: 124 µs per loop
    
    %timeit series.map(lambda x: dictionary.get(x,x))
    10000 loops, best of 3: 150 µs per loop
    
    %timeit series.map(dictionary)
    100 loops, best of 3: 5.45 ms per loop
    
    %timeit series.replace(dictionary)
    1 loop, best of 3: 1.23 s per loop
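The figures above come from IPython's %timeit. Outside IPython, a comparable comparison can be sketched with the standard timeit module. The data below is hypothetical (same sizes as the question), and absolute numbers will depend on your machine and pandas version, so treat the output as a relative ranking rather than a reproduction of the figures above.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in data with the sizes from the question.
rng = np.random.default_rng(0)
dictionary = {i: f"v{i}" for i in range(11269)}
series = pd.Series(rng.integers(0, 20000, size=200))

# (callable, calls per measurement); replace is slow, so it runs once per repeat.
candidates = {
    "map(dictionary.get)": (lambda: series.map(dictionary.get), 100),
    "map(lambda x: dictionary.get(x, x))": (lambda: series.map(lambda x: dictionary.get(x, x)), 100),
    "map(dictionary)": (lambda: series.map(dictionary), 10),
    "replace(dictionary)": (lambda: series.replace(dictionary), 1),
}

per_call = {}
for name, (fn, number) in candidates.items():
    # Best of 3 repeats, divided by the calls per repeat, similar to %timeit's "best of 3".
    best_total = min(timeit.repeat(fn, number=number, repeat=3))
    per_call[name] = best_total / number
    print(f"{name:38s} {per_call[name] * 1e6:12.1f} us per call")
```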
    
