Question
I'm trying to do a multiprocessing scrape of a website: I get a list of all the nodes I want to extract information from, and then create a Pool so that instead of fetching the data one node at a time, it is done in parallel. My code is the following:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import multiprocessing
def ResPartido(node):
    ft = node.find_element_by_css_selector('.status').text
    if ft.strip() != 'FT':
        return
    hora = node.find_element_by_css_selector('.time').text
    names = list()
    for nam in node.find_elements_by_xpath(
            './/td[contains(@style,"text-align")]/a[contains(@id,"team")]'):
        name = nam.text
        if '(N)' in name:
            name = name.split('(N)')[0]
        names.append(name)
    score = node.find_element_by_css_selector('.red')
    return [hora, names, score.text]  # names (the list), not just the last name
if __name__ == "__main__":
    browser = webdriver.Chrome()
    # SOME CODE
    nodes = browser.find_elements_by_xpath(
        '//tr[contains(@align,"center")]/following-sibling::tr[.//div[contains(@class,"toolimg")]]')
    p = multiprocessing.Pool()
    p.map(ResPartido, nodes)  # <--- here is the error
.......
>>AttributeError: Can't pickle local object '_createenviron.<locals>.encodekey'
Checking the documentation, it says lists are picklable objects, and so are functions defined at the top level of a module, so I don't understand what I am doing wrong when using multiprocessing.
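The failure is easier to see in isolation: Pool.map pickles every argument it sends to a worker process, and a Selenium WebElement holds a live connection back to the driver, which cannot be pickled. A minimal sketch (FakeWebElement is a hypothetical stand-in, not a Selenium class; a plain lock plays the role of the driver connection) reproduces the same kind of error:

```python
import pickle
import threading

class FakeWebElement:
    """Hypothetical stand-in for a Selenium WebElement: it holds an
    unpicklable live resource (here a lock, analogous to the driver's
    open connection)."""
    def __init__(self):
        self._connection = threading.Lock()

node = FakeWebElement()
try:
    pickle.dumps(node)  # this is what Pool.map does with each argument
    print("picklable")
except TypeError as e:
    print("not picklable:", e)
```

Running this prints a "not picklable" message, for the same underlying reason the Pool raises when handed WebElement objects.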
Answer 1:
From what I've been reading, the problem is that nodes is a list of WebElement objects, which are not serializable. Given this, the only workaround I can come up with is the following.
1- Instead of storing the whole tag as an element of the nodes list, store only what makes it unique from the others. In my example, each row has a serial-number identifier:
nodes = [n.get_attribute('id') for n in browser.find_elements_by_xpath(
    '//tr[contains(@align,"center")]/following-sibling::tr[.//div[contains(@class,"toolimg")]]')]
# nodes == ['1232489', '1242356', ......]
2- Pass them, along with the browser, to the map function:
from functools import partial

pr = partial(ResPartido, b=browser)
p.map(pr, nodes)
3- In the ResPartido function, find the unique row whose @id contains that string:
browser.find_elements_by_xpath('//tr[contains(@id,"%s")]' % node)
With that workaround, which I haven't tested yet, I think I could get what I intended without running into picklable-object problems.
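The shape of steps 1-3 can be sketched end to end with plain strings, which do pickle. Note that a real webdriver instance is itself not serializable (as noted above), so binding it with partial may hit the same pickling problem; this sketch therefore uses a hypothetical, picklable FakeBrowser stand-in, and in a real scraper each worker process would likely need to create its own driver:

```python
from functools import partial
import multiprocessing

class FakeBrowser:
    """Hypothetical, picklable stand-in for the Selenium driver,
    used only so this sketch can run without a browser."""
    def find_row(self, row_id):
        # The real code would do something like:
        # browser.find_elements_by_xpath('//tr[contains(@id,"%s")]' % row_id)
        return 'row-%s' % row_id

def ResPartido(node, b):
    # node is now a plain string id, which Pool.map can pickle
    return b.find_row(node)

if __name__ == "__main__":
    nodes = ['1232489', '1242356']             # step 1: unique string ids
    pr = partial(ResPartido, b=FakeBrowser())  # step 2: bind the "browser"
    with multiprocessing.Pool(2) as p:
        print(p.map(pr, nodes))                # step 3: each worker looks up its row
```

Because both the partial's bound argument and the string ids are picklable, Pool.map can ship them to the workers without the AttributeError.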
Source: https://stackoverflow.com/questions/47275036/cant-pickle-local-object-while-trying-multiprocessing