Pandas replace/dictionary slowness

萌比男神i 2020-12-14 04:13

Please help me understand why this "replace from dictionary" operation is slow in Python/Pandas:

# Series has 200 rows and 1 column
# Dictionary has 11269          


        
2 Answers
  •  爱一瞬间的悲伤
    2020-12-14 04:45

    .replace can match partial substrings, while .map requires every complete value to be present in the dictionary (otherwise it returns NaN). A fast but still generic solution (one that can handle substrings) is to first run .replace on the set of all distinct values (obtained e.g. with .value_counts().index) to build a dict from each original value to its replaced form, and then apply that dict to every row of the Series with .map. This combination can handle, for instance, replacements of special national characters (as substrings) on 1m-row columns in a quarter of a second, where .replace alone would take 15 seconds.
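The two-step approach described above could be sketched roughly as follows. This is only an illustration, not the answerer's actual code: the sample data, the `char_map` dictionary, and the helper name `replace_via_unique` are all made up, and it assumes a string-valued Series with no missing values and substring replacement done via `regex=True` (so dictionary keys are interpreted as regular expressions).

```python
import pandas as pd

# Hypothetical sample data: few distinct values, many rows.
s = pd.Series(["café", "naïve", "café", "plain", "naïve"] * 1000)

# Hypothetical substring replacements (keys are treated as regex patterns).
char_map = {"é": "e", "ï": "i"}


def replace_via_unique(series, substitutions):
    """Apply substring replacements using .replace on distinct values, then .map."""
    # Step 1: collect the distinct values of the Series.
    uniques = pd.Series(series.unique())
    # Step 2: run the slower, substring-capable .replace on those values only.
    replaced = uniques.replace(substitutions, regex=True)
    # Step 3: build a complete lookup from each original value to its result.
    lookup = dict(zip(uniques, replaced))
    # Step 4: apply the lookup to every row; each value has a key, so no NaN appears.
    return series.map(lookup)


result = replace_via_unique(s, char_map)
print(result.head())
```

The benefit depends on how many distinct values the column has: when nearly every row is unique, step 2 does about as much work as calling .replace on the whole Series directly.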
