PySpark broadcast variables from local functions

Happy的楠姐  2021-01-02 01:19

I'm attempting to create broadcast variables from within Python methods (trying to abstract some utility methods I'm creating that rely on distributed operations). However …
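
One common shape for this (the helper name and sample data below are illustrative assumptions, not taken from the question) is a utility method that both creates the broadcast variable and runs the distributed operation:

    from pyspark import SparkContext

    def scale_all(sc, data, factor):
        # Broadcast the factor once so every executor reads one shared
        # copy instead of re-shipping it inside each task's closure.
        bc = sc.broadcast(factor)
        return sc.parallelize(data).map(lambda x: x * bc.value).collect()

    if __name__ == "__main__":
        sc = SparkContext("local[2]", "broadcast-demo")
        print(scale_all(sc, [1, 2, 3], 10))  # [10, 20, 30]
        sc.stop()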

1 Answer
    青春惊慌失措  2021-01-02 02:10

    I am not sure I completely understood the question, but if you need the V object inside the worker function, then you should definitely pass it as a parameter; otherwise the method is not really self-contained:

    def worker(V, element):
        # Return the new value: map uses the function's return value,
        # so mutating the local parameter would have no effect.
        return element * V.value
    

    Now, in order to use it in map, you need functools.partial, binding V positionally so that map sees a one-parameter function:

    from functools import partial
    from random import random

    def SomeMethod(sc):
        someValue = random()
        V = sc.broadcast(someValue)
        # Bind V as the first positional argument; the element passed
        # by map then fills worker's remaining parameter. (range(10)
        # is placeholder data; parallelize requires a collection.)
        A = sc.parallelize(range(10)).map(partial(worker, V))
        return A

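    Putting it together as a runnable script (the local master, the sample data, and the printed output are illustrative assumptions, not part of the answer above):

    from functools import partial
    from random import random

    from pyspark import SparkContext

    def worker(V, element):
        return element * V.value

    if __name__ == "__main__":
        sc = SparkContext("local[2]", "partial-broadcast-demo")
        V = sc.broadcast(random())
        result = sc.parallelize([1, 2, 3]).map(partial(worker, V)).collect()
        print(result)  # each element scaled by the same broadcast value
        sc.stop()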