Question
I'm trying to do a multiprocessing scrape of a website: I get a list of all the nodes I want to extract information from, and then create a Pool so that instead of fetching the data one node at a time, it is done in parallel. My code is the following:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import multiprocessing
def ResPartido(node):
    ft = node.find_element_by_css_selector('.status').text
    if ft.strip() != 'FT':
        return
    hora = node.find_element_by_css_selector('.time').text
    names = list()
    for nam in node.find_elements_by_xpath(
            './/td[contains(@style,"text-align")]/a[contains(@id,"team")]'):
        name = nam.text
        if '(N)' in name:
            name = name.split('(N)')[0]
        names.append(name)
    score = node.find_element_by_css_selector('.red')
    return [hora, names, score.text]  # names (the list), not just the last name
if __name__ == "__main__":
    browser = webdriver.Chrome()
    # SOME CODE
    nodes = browser.find_elements_by_xpath(
        '//tr[contains(@align,"center")]/following-sibling::tr[.//div[contains(@class,"toolimg")]]')
    p = multiprocessing.Pool()
    p.map(ResPartido, nodes)  # <--- here is the error
.......
>>AttributeError: Can't pickle local object '_createenviron.<locals>.encodekey'
Checking the documentation, it says lists are picklable objects, and so are functions defined at the top level of a module, so I don't understand what I am doing wrong when using multiprocessing.
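The failure is easier to see in isolation: Pool.map pickles every argument it sends to a worker process, and a Selenium WebElement holds a live connection back to the driver, which cannot be pickled. A minimal sketch (FakeWebElement is a hypothetical stand-in, not a Selenium class; a plain lock plays the role of the driver connection) reproduces the same kind of error:

```python
import pickle
import threading

class FakeWebElement:
    """Hypothetical stand-in for a Selenium WebElement: it holds an
    unpicklable live resource (here a lock, analogous to the driver's
    open connection)."""
    def __init__(self):
        self._connection = threading.Lock()

node = FakeWebElement()
try:
    pickle.dumps(node)  # this is what Pool.map does with each argument
    print("picklable")
except TypeError as e:
    print("not picklable:", e)
```

Running this prints a "not picklable" message, for the same underlying reason the Pool raises when handed WebElement objects.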
Answer 1:
From what I've been reading, the problem is that nodes is a list of WebElement objects, which are not serializable. Given this, the only workaround I can come up with is the following.
1- Instead of storing the whole tag as an element of the nodes list, store only what makes it unique from the others. In my example, each row has a serial-number identifier:
nodes = [n.get_attribute('id') for n in browser.find_elements_by_xpath(
    '//tr[contains(@align,"center")]/following-sibling::tr[.//div[contains(@class,"toolimg")]]')]
# nodes == ['1232489', '1242356', ......]
2- Pass them, along with the browser, to the map function:
from functools import partial

pr = partial(ResPartido, b=browser)
p.map(pr, nodes)
3- In the ResPartido function, find the unique row whose @id contains that string:
browser.find_elements_by_xpath('//tr[contains(@id,"%s")]' % node)
With that workaround, which I haven't tested yet, I think I could get what I intended without running into picklable-object problems.
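The shape of steps 1-3 can be sketched end to end with plain strings, which do pickle. Note that a real webdriver instance is itself not serializable (as noted above), so binding it with partial may hit the same pickling problem; this sketch therefore uses a hypothetical, picklable FakeBrowser stand-in, and in a real scraper each worker process would likely need to create its own driver:

```python
from functools import partial
import multiprocessing

class FakeBrowser:
    """Hypothetical, picklable stand-in for the Selenium driver,
    used only so this sketch can run without a browser."""
    def find_row(self, row_id):
        # The real code would do something like:
        # browser.find_elements_by_xpath('//tr[contains(@id,"%s")]' % row_id)
        return 'row-%s' % row_id

def ResPartido(node, b):
    # node is now a plain string id, which Pool.map can pickle
    return b.find_row(node)

if __name__ == "__main__":
    nodes = ['1232489', '1242356']             # step 1: unique string ids
    pr = partial(ResPartido, b=FakeBrowser())  # step 2: bind the "browser"
    with multiprocessing.Pool(2) as p:
        print(p.map(pr, nodes))                # step 3: each worker looks up its row
```

Because both the partial's bound argument and the string ids are picklable, Pool.map can ship them to the workers without the AttributeError.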
Source: https://stackoverflow.com/questions/47275036/cant-pickle-local-object-while-trying-multiprocessing