Question
I am making a Python application which uses the Python wikipedia package to retrieve the body text of 3 different Wikipedia pages. However, I am seeing very slow performance when retrieving the articles one at a time. Is there a way to retrieve the body text of the 3 Wikipedia pages in parallel?
Answer 1:
If you want the 'raw' page you can use any Python scraping library such as Twisted/Scrapy. But if you are looking for the parsed wiki format, you should use pywikibot/mwparserfromhell together with multiprocessing.
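A minimal sketch of that approach, assuming pywikibot, mwparserfromhell, and the standard-library multiprocessing module are installed; the page titles are placeholders, and pywikibot normally expects a user-config.py before it will connect (see its documentation):

import multiprocessing

import mwparserfromhell
import pywikibot


def fetch_plain_text(title):
    # Each worker builds its own Site object; pywikibot normally expects a
    # user-config.py to be present (see the pywikibot documentation).
    site = pywikibot.Site('en', 'wikipedia')
    wikitext = pywikibot.Page(site, title).text
    # strip_code() removes templates and markup, leaving readable body text.
    return mwparserfromhell.parse(wikitext).strip_code()


if __name__ == '__main__':
    # Placeholder titles for the three pages mentioned in the question.
    titles = ['Page one', 'Page two', 'Page three']
    with multiprocessing.Pool(processes=3) as pool:
        bodies = pool.map(fetch_plain_text, titles)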
Answer 2:
If you want a general-purpose multiprocessing library, you could use binge (pip install binge):
from binge import B

def worker(url):
    (...)  # fetch the page body for this url
    return urlbody

urls = ['https://www....',
        'https://www....',
        ...,
        'https://www....']

list_of_urlbodies = B(worker)(urls)
cf. the binge documentation
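For comparison, the same fan-out can also be written with only the standard library; this is a sketch using concurrent.futures threads together with the wikipedia package named in the question (the titles are placeholders and error handling is omitted):

from concurrent.futures import ThreadPoolExecutor

import wikipedia


def fetch_body(title):
    # wikipedia.page() downloads and parses the article; .content is its body text.
    return wikipedia.page(title).content


titles = ['Page one', 'Page two', 'Page three']
# The work is network-bound, so threads overlap the requests effectively.
with ThreadPoolExecutor(max_workers=3) as executor:
    bodies = list(executor.map(fetch_body, titles))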
Source: https://stackoverflow.com/questions/49832952/concurrent-python-wikipedia-package-requests