Port web scraper, Scrapy 0.24, to Python 3, or scrap Scrapy for something better

荒凉一梦 submitted on 2020-01-01 19:20:13

Question


I'm trying to use Scrapy to build a web scraper, but I'm running into many problems because it is written for Python 2. Is it possible to run the 2to3 command on all the files in the tarball at once? Would that cause unforeseen errors? Is there an alternative web-scraping framework that is more up to date and more functional that you would recommend instead?

I ask because there doesn't seem to be much recent activity on forums about the problems inherent in running Scrapy 0.24, i.e. the fact that it's written in Python 2.

If Scrapy is the best choice and porting is a bad idea, what's the best way to run it on my Python 3 oriented machine? Is there a command to run it only with Python 2, or something I can change in a config file?

UPDATE

If you run into the same problem, simply run the setup.py script with Python 2, i.e.,

python2 setup.py install

After that, you're good to go.

^as indicated by @alecxe
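
For anyone who wants to script that step, here is a minimal sketch of doing the same thing from Python 3. The helper names build_install_command and install_with_python2 are made up for illustration, and the sketch assumes an interpreter called python2 is on your PATH and that source_dir is the unpacked Scrapy tarball; adjust both to your setup.

```python
import shutil
import subprocess


def build_install_command(interpreter="python2"):
    """Return the argument list for `<interpreter> setup.py install`."""
    return [interpreter, "setup.py", "install"]


def install_with_python2(source_dir, interpreter="python2"):
    """Run setup.py in source_dir with a Python 2 interpreter (hypothetical helper)."""
    # Resolve the interpreter first so a missing Python 2 fails with a clear message.
    path = shutil.which(interpreter)
    if path is None:
        raise FileNotFoundError("no %r interpreter found on PATH" % interpreter)
    # check=True raises CalledProcessError if the install exits with an error.
    subprocess.run(build_install_command(path), cwd=source_dir, check=True)
```

Calling install_with_python2("path/to/unpacked/scrapy") would then be equivalent to changing into that directory and typing the command above by hand.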


Answer 1:


The problem with porting Scrapy to Python 3 is that Scrapy is built on top of the Twisted event-driven framework, which currently does not yet fully support Python 3.

There is no web-scraping framework on Python 3 as big and mature as Scrapy. pyspider looks promising, though it is a bit different; see:

  • Can Scrapy be replaced by pyspider?

Also, there are other libraries related to web-scraping and html-parsing that support Python 3:

  • beautifulsoup4
  • lxml
  • requests
  • MechanicalSoup (built on top of requests and BeautifulSoup)
  • selenium


Source: https://stackoverflow.com/questions/28390386/port-web-scraper-scrapy-0-24-to-python-3-or-scrap-scrapy-for-something-better
