Downloading pdf files using mechanize and urllib
Question

I am new to Python, and my current task is to write a web crawler that looks for PDF files on certain webpages and downloads them. Here's my current approach (just for one sample URL):

    import mechanize
    import urllib
    import sys

    mech = mechanize.Browser()
    mech.set_handle_robots(False)
    url = "http://www.xyz.com"

    try:
        mech.open(url, timeout = 30.0)
    except HTTPError, e:
        sys.exit("%d: %s" % (e.code, e.msg))

    links = mech.links()
    for l in links:
        #Some are relative links
        path = str(l.base_url[:-1])+str