How to parallelize many (fuzzy) string comparisons using apply in Pandas?

半阙折子戏 2020-12-02 10:18

I have the following problem.

I have a dataframe, master, that contains sentences, such as:

master
Out[8]: 
                  original
0         


        
3 answers
  •  心在旅途
    2020-12-02 10:57

    The other answers are a little dated. Here is some newer code using Dask:

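    import dask.dataframe as dd

    # Split master into 4 partitions so Dask can process them in parallel.
    dmaster = dd.from_pandas(master, npartitions=4)
    dmaster['my_value'] = dmaster.original.apply(lambda x: helper(x, slave), meta=('x', 'f8'))
    result = dmaster.compute(scheduler='processes')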
    

    Personally, I'd ditch the apply call to fuzzy_score inside the helper function and just perform the operation there directly.
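
    As a rough illustration of that suggestion, here is a minimal sketch of a helper that scores the candidates in a plain loop instead of a nested apply. It rests on assumptions: the real helper and fuzzy_score are not shown in this thread, so the signature below (candidate names and their values passed as plain lists) is hypothetical, and difflib.SequenceMatcher from the standard library stands in for whatever fuzzy scorer you actually use.

```python
from difflib import SequenceMatcher


def fuzzy_score(str1, str2):
    # Stand-in similarity in [0, 1]; replace with your real fuzzy scorer.
    return SequenceMatcher(None, str1, str2).ratio()


def helper(orig_string, names, values):
    # Hypothetical signature: names and values are parallel lists taken
    # from the lookup table. Score each name directly (no inner apply)
    # and return the value paired with the best-scoring name.
    best_index = max(
        range(len(names)),
        key=lambda i: fuzzy_score(names[i], orig_string),
    )
    return values[best_index]


# Tiny made-up lookup data, for illustration only.
names = ["apple pie", "banana bread", "cherry tart"]
values = [1.0, 2.0, 3.0]
best = helper("banana bred", names, values)
```

    Converting the lookup columns to lists once, outside the per-row call, avoids building a temporary score column on every row, which is probably where most of the overhead of the nested apply comes from.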

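    You can change the scheduler as well: 'processes' is used above, and Dask also accepts 'threads' and 'synchronous' (see the Dask scheduling documentation for tips).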
