Python: simple async download of url content?

Backend · Unresolved · 10 answers · 1523 views
天命终不由人 · 2020-12-15 11:23

I have a web.py server that responds to various user requests. One of these requests involves downloading and analyzing a series of web pages.

Is there a simple way for me to download this content asynchronously?

10 Answers
  • 2020-12-15 11:57

    Actually, you can integrate twisted with web.py. I'm not really sure how, as I've only used twisted with Django.

  • 2020-12-15 11:58

    I'd just build a service in twisted that did the concurrent fetch and analysis, and access it from web.py as a simple HTTP request.
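A hedged sketch of the web.py side of this design: the concurrent fetching lives in a separate service, and the web handler just makes one blocking HTTP call to it. The service URL and its JSON protocol (POST `{"urls": [...]}`, returning a url-to-body mapping) are assumptions for illustration, not a real API.

```python
import json
import urllib.request

FETCH_SERVICE = 'http://localhost:8001/fetch'  # hypothetical twisted service endpoint

def get_pages(urls, opener=urllib.request.urlopen):
    # POST the URL list to the fetch service, which downloads the pages
    # concurrently and returns a JSON object mapping url -> body.
    # `opener` is injectable so the call can be exercised without a
    # running service.
    req = urllib.request.Request(
        FETCH_SERVICE,
        data=json.dumps({'urls': urls}).encode(),
        headers={'Content-Type': 'application/json'},
    )
    with opener(req) as resp:
        return json.loads(resp.read().decode())
```

The appeal of this split is that web.py stays synchronous and simple, while all the event-loop complexity is isolated in the twisted process.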

  • 2020-12-15 12:00

    You might be able to use urllib to download the files and the queue module to manage a number of worker threads. e.g.:

    import urllib.request
    from threading import Thread
    from queue import Queue

    NUM_WORKERS = 20

    class Dnld:
        def __init__(self):
            self.Q = Queue()
            for _ in range(NUM_WORKERS):
                t = Thread(target=self.worker, daemon=True)
                t.start()

        def worker(self):
            while True:
                url, result_q = self.Q.get()
                try:
                    with urllib.request.urlopen(url) as f:
                        result_q.put(('ok', url, f.read()))
                except Exception as e:
                    result_q.put(('error', url, e))

        def download_urls(self, urls):
            result_q = Queue()  # Create a second queue so the worker
                                # threads can send the data back again
            for url in urls:
                # Add the URLs in `urls` to be downloaded concurrently
                self.Q.put((url, result_q))

            rtn = []
            for _ in range(len(urls)):
                # Get the data as it arrives, raising
                # any exceptions if they occurred
                status, url, data = result_q.get()
                if status == 'ok':
                    rtn.append((url, data))
                else:
                    raise data
            return rtn

    inst = Dnld()
    for url, data in inst.download_urls(['http://www.google.com'] * 2):
        print(url, data)
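The same worker-pool pattern can be written much more compactly with the stdlib's `concurrent.futures` (Python 3.2+). This is a sketch, not a drop-in replacement; the fetch function is injectable here so the pool logic can be tried without network I/O.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch(url):
    # Download one URL and return its body as bytes.
    with urllib.request.urlopen(url) as f:
        return f.read()

def download_urls(urls, fetch=fetch, max_workers=20):
    # pool.map() keeps results in input order and re-raises the first
    # exception a worker hit, much like the hand-rolled version above.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(zip(urls, pool.map(fetch, urls)))
```

Compared with managing `Thread` and `Queue` objects by hand, the executor owns the worker lifecycle, so there is no daemon-thread bookkeeping and no second queue for results.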
    
  • 2020-12-15 12:00

    I don't know if this will work exactly, but it looks like it might: EvServer: Python Asynchronous WSGI Server has a web.py interface and can do comet-style push to the browser client.

    If that isn't right, maybe you can use the Concurrence HTTP client for async download of the pages and figure out how to serve them to the browser via ajax or comet.

  • 2020-12-15 12:02

    Use the async HTTP client, which is built on asynchat and asyncore: http://sourceforge.net/projects/asynchttp/files/asynchttp-production/asynchttp.py-1.0/asynchttp.py/download
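Note that asynchat and asyncore have since been deprecated and removed from the stdlib (Python 3.12); their single-threaded event-loop idea now lives in asyncio. As a rough sketch of the same technique, here is a minimal non-blocking HTTP/1.0 GET over a raw asyncio connection (a real client would also parse headers and handle redirects):

```python
import asyncio

async def fetch(host, path='/', port=80):
    # Open a non-blocking connection, send a minimal HTTP/1.0 GET, and
    # read until the server closes the connection (HTTP/1.0 semantics).
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(f'GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n'.encode())
    await writer.drain()
    body = await reader.read()  # everything up to EOF, headers included
    writer.close()
    await writer.wait_closed()
    return body
```

Several such coroutines can then be run concurrently on one event loop, which is what the asyncore-based client achieved with its dispatcher objects.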

  • 2020-12-15 12:05

    Here is an interesting piece of code. I haven't used it myself, but it looks nice ;)

    https://github.com/facebook/tornado/blob/master/tornado/httpclient.py

    Low-level AsyncHTTPClient:

    "A non-blocking HTTP client backed with pycurl. Example usage:"

    from tornado import httpclient, ioloop

    def handle_request(response):
        if response.error:
            print("Error:", response.error)
        else:
            print(response.body)
        ioloop.IOLoop.instance().stop()

    http_client = httpclient.AsyncHTTPClient()
    http_client.fetch("http://www.google.com/", handle_request)
    ioloop.IOLoop.instance().start()
    

    "fetch() can take a string URL or an HTTPRequest instance, which offers more options, like executing POST/PUT/DELETE requests.

    The keyword argument max_clients to the AsyncHTTPClient constructor determines the maximum number of simultaneous fetch() operations that can execute in parallel on each IOLoop."

    There is also a new implementation in progress: https://github.com/facebook/tornado/blob/master/tornado/simple_httpclient.py "Non-blocking HTTP client with no external dependencies. ... This class is still in development and not yet recommended for production use."
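Tornado's callback style above maps directly onto the coroutine style in today's stdlib. A hedged sketch of the same idea with asyncio.gather, using a stub fetch coroutine (the URLs and the stub are placeholders so the sketch stays self-contained; in practice the fetch would be tornado's AsyncHTTPClient or similar):

```python
import asyncio

async def fetch(url):
    # Stand-in for a real non-blocking fetch; it only simulates I/O
    # latency and echoes the URL back.
    await asyncio.sleep(0.01)
    return ('ok', url)

async def fetch_all(urls):
    # gather() runs all the coroutines concurrently on one event loop
    # and returns their results in input order.
    return await asyncio.gather(*(fetch(u) for u in urls))

results = asyncio.run(fetch_all(['http://a.example', 'http://b.example']))
```

This plays the same role as issuing many fetch() calls against one IOLoop, with max_clients-style throttling achievable via an asyncio.Semaphore if needed.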
