Python: simple async download of url content?

天命终不由人 2020-12-15 11:23

I have a web.py server that responds to various user requests. One of these requests involves downloading and analyzing a series of web pages.

Is there a simple way

10 answers
  • 2020-12-15 12:07

    One option would be to post the work onto a queue of some sort. You could use something enterprise-grade like ActiveMQ, with pyactivemq or STOMP as a connector, or something lightweight like Kestrel, which is written in Scala and speaks the same protocol as memcache, so you can talk to it with the Python memcache client.

    Once you have the queueing mechanism set up, you can create as many or as few worker tasks as you want; they subscribe to the queue and do the actual downloading work. You can even have them live on other machines so they don't interfere with the speed of serving your website at all. When the workers are done, they post the results back to the database or to another queue where the webserver can pick them up.

    If you don't want to manage external worker processes, you could run the workers as threads in the same Python process as the webserver, but they will then have greater potential to impact your page-serving performance.
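
A minimal sketch of the in-process variant described above, assuming the standard library's `queue.Queue` stands in for the external queue and plain `urllib.request` does the downloading. The names `fetch`, `start_workers`, `jobs` and `results` are hypothetical, chosen only for illustration.

```python
import queue
import threading
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def start_workers(jobs, results, num_workers=4, fetch_func=fetch):
    """Start daemon threads that take URLs from `jobs` and post to `results`.

    Each result is a tuple (url, body, error); exactly one of body and
    error is None.
    """
    def worker():
        while True:
            url = jobs.get()  # blocks until a URL is available
            try:
                results.put((url, fetch_func(url), None))
            except Exception as exc:
                # Report the failure instead of letting the worker die.
                results.put((url, None, exc))
            finally:
                jobs.task_done()

    threads = [
        threading.Thread(target=worker, daemon=True)
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    return threads
```

A request handler would create two `queue.Queue()` objects once at startup, call `start_workers(jobs, results)`, and then `jobs.put(url)` for each page. Calling `jobs.join()` waits until every queued URL has been processed. Moving the workers to other machines would mean replacing the two in-memory queues with a real message broker.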

  • 2020-12-15 12:12

    Nowadays there are excellent Python libraries you might want to use: urllib3 (thread-safe connection pooling) and requests (built on urllib3; run it in a thread pool, or pair it with gevent for non-blocking I/O).
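
As a rough illustration of the thread-pool approach, here is a sketch that uses only the standard library (`concurrent.futures` and `urllib.request`) rather than urllib3 or requests, so it runs without extra installs. `fetch` and `download_all` are hypothetical names; passing a requests-based function as `fetch_func` should work the same way.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, max_workers=8, fetch_func=fetch):
    """Download all URLs concurrently.

    Returns a dict mapping each URL to its body, or to the exception
    that was raised while fetching it.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the downloads overlap.
        futures = {url: pool.submit(fetch_func, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results
```

The call blocks until every download has finished or failed, so the total wait is roughly that of the slowest page rather than the sum of all of them.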

  • 2020-12-15 12:13

    Along the lines of MarkusQ's answer, MochiKit is a nice JavaScript library, with robust async methods inspired by Twisted.

  • 2020-12-15 12:23

    I'm not sure I'm understanding your question, so I'll give multiple partial answers to start with.

    • If your concern is that web.py has to download data from somewhere and analyze the results before responding, and you fear the request may time out before the results are ready, you could use Ajax to split the work up. Return immediately with a container page (to hold the results) and a bit of JavaScript that polls the server for the results until the client has them all. The client then never blocks on the server, though the user still has to wait for the results.
    • If your concern is tying up the server while it waits for the client to receive the results, I doubt that will actually be a problem. Your networking layers should not require you to wait on write.
    • If you are worried about the server waiting while the client downloads static content from elsewhere, either Ajax or clever use of redirects should solve your problem.
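
The server side of the polling idea in the first bullet could look roughly like the sketch below. It is framework-agnostic and assumes an in-memory job table; `start_job`, `poll_job` and `fetch_func` are hypothetical names, and wiring them into web.py handlers (one that starts the job and returns the container page, one that answers the polls) is left out.

```python
import threading
import uuid

_jobs = {}                 # job id -> {"total": int, "results": dict}
_lock = threading.Lock()   # guards _jobs across request threads


def start_job(urls, fetch_func):
    """Start downloading in the background and return a job id at once."""
    urls = list(dict.fromkeys(urls))  # drop duplicates, keep order
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"total": len(urls), "results": {}}

    def run():
        for url in urls:
            try:
                body = fetch_func(url)
            except Exception as exc:
                body = "error: %s" % exc
            with _lock:
                _jobs[job_id]["results"][url] = body

    threading.Thread(target=run, daemon=True).start()
    return job_id


def poll_job(job_id):
    """Return a snapshot of the job for the polling client."""
    with _lock:
        job = _jobs[job_id]
        results = dict(job["results"])
    return {"done": len(results) == job["total"], "results": results}
```

The first request calls `start_job` and embeds the returned id in the container page; the page's script then requests `poll_job(job_id)` repeatedly until `done` is true. A real deployment would also need to expire finished jobs, which this sketch does not do.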