问题
I have a web crawling python script that takes hours to complete, and is infeasible to run in its entirety on my local machine. Is there a convenient way to deploy this to a simple web server? The script basically downloads webpages into text files. How would this be best accomplished? Thanks!
回答1:
Since you said that performance is a problem and you are doing web-scraping, first thing to try is a Scrapy framework - it is a very fast and easy to use web-scraping framework. scrapyd tool would allow you to distribute the crawling - you can have multiple scrapyd
services running on different servers and split the load between each. See:
- Distributed crawls
- Running Scrapy on Amazon EC2
There is also a Scrapy Cloud service out there:
Scrapy Cloud bridges the highly efficient Scrapy development environment with a robust, fully-featured production environment to deploy and run your crawls. It's like a Heroku for Scrapy, although other technologies will be supported in the near future. It runs on top of the Scrapinghub platform, which means your project can scale on demand, as needed.
回答2:
As an alternative to the solutions already given, I would suggest Heroku. You can not only deploy easily a website, but also scripts for bots to run.
Basic account is free and is pretty flexible.
This blog entry, this one and this video contain practical examples of how to make it work.
回答3:
There are multiple places where you can do that. Just google for "python in the cloud", you will come up with a few, for example https://www.pythonanywhere.com/.
In addition, there are also several cloud IDEs that essentially give you a small VM for free where you can develop your code in a web-based IDE and also run it in the VM, one example is http://www.c9.io.
来源:https://stackoverflow.com/questions/26901882/what-is-the-easiest-way-to-run-python-scripts-in-a-cloud-server