Can I use multiprocessing.Pool in a method of a class?

北荒 2020-12-09 12:17

I am trying to use multiprocessing in my code for better performance.

However, I got an error as follows:

Traceback (most recent call las         


        
1 Answer
  • 2020-12-09 13:09

    The issue is that the Book instance holds an unpicklable instance variable (namelist). Because you're calling pool.map on an instance method, and you're running on Windows, the entire instance has to be pickled so it can be passed to the child process. Book.namelist is an open file object (_io.BufferedReader), which can't be pickled. You can fix this in a couple of ways. Based on the example code, it looks like you could just make format_char a top-level function:

    from multiprocessing import Pool


    def format_char(char):
        char = char + "a"
        return char
    
    
    class Book(object):
        def __init__(self, arg):
            self.namelist = arg
    
        def format_book(self):
            self.tempread = ""
            charlist = [f.read() for f in self.namelist] #list of char
            with Pool() as pool:
                txtlist = pool.map(format_char,charlist)
            self.tempread = "".join(txtlist)
            return self.tempread
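
    The pickling failure described above can be reproduced without multiprocessing at all. The following is a minimal sketch, not taken from the original question: the helper is_picklable and the file name sample.txt are hypothetical names introduced only for illustration.

```python
import os
import pickle
import tempfile


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError):
        return False


with tempfile.TemporaryDirectory() as tmp:
    # "sample.txt" is an assumed file name, created only for this demo.
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as out:
        out.write(b"abc")
    # Opening in "rb" mode yields an _io.BufferedReader, as in the answer.
    with open(path, "rb") as f:
        file_ok = is_picklable(f)

# A plain string, like the items in charlist, pickles without trouble.
text_ok = is_picklable("some text")

print(file_ok, text_ok)
```

    This is why passing plain strings to a top-level function works: only the function reference and the strings have to cross the process boundary, and the open file never does.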
    

    However, if in reality you need format_char to be an instance method, you can use __getstate__/__setstate__ to make Book picklable by removing the namelist attribute from the instance's state before pickling it:

    from multiprocessing import Pool


    class Book(object):
        def __init__(self, arg):
            self.namelist = arg
    
        def __getstate__(self):
            """ This is called before pickling. """
            state = self.__dict__.copy()
            del state['namelist']
            return state
    
        def __setstate__(self, state):
            """ This is called while unpickling. """
            self.__dict__.update(state)
    
        def format_char(self,char):
            char = char + "a"
            return char
    
        def format_book(self):
            self.tempread = ""
            charlist = [f.read() for f in self.namelist] #list of char
            with Pool() as pool:
                txtlist = pool.map(self.format_char,charlist)
            self.tempread = "".join(txtlist)
            return self.tempread
    

    This works as long as you don't need to access namelist in the child process, because the unpickled copy there has no namelist attribute.
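
    One way to check this behaviour without starting a pool is to round-trip a Book through pickle by hand, which is roughly what happens when the instance is sent to a child process. This is a sketch under assumptions: the file name sample.txt and the value stored in tempread are made up for the demo, and format_book is left out to keep it short.

```python
import os
import pickle
import tempfile


class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def __getstate__(self):
        """Called before pickling: drop the unpicklable file objects."""
        state = self.__dict__.copy()
        del state['namelist']
        return state

    def __setstate__(self, state):
        """Called while unpickling: restore the remaining attributes."""
        self.__dict__.update(state)

    def format_char(self, char):
        return char + "a"


with tempfile.TemporaryDirectory() as tmp:
    # "sample.txt" is an assumed file name, created only for this demo.
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as out:
        out.write(b"abc")
    with open(path, "rb") as f:
        book = Book([f])
        book.tempread = "abc"  # an ordinary attribute that should survive
        clone = pickle.loads(pickle.dumps(book))

has_namelist = hasattr(clone, "namelist")  # the copy lost namelist
kept_tempread = clone.tempread             # other attributes are kept
formatted = clone.format_char("x")         # methods still work on the copy

print(has_namelist, kept_tempread, formatted)
```

    The copy keeps its ordinary attributes and methods but not namelist, so any code that runs in the child and touches self.namelist would raise AttributeError.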
