Python: How to know which URL is failing using urllib2 and pool.map?

Question

I'm trying to simultaneously call 3 URLs and log any errors. Here is my sample code:

urls = ["https://example.com/gives200.php", "https://example.com/alsogives200.php", "https://example.com/gives500.php"]

try:
    results = pool.map(urllib2.urlopen, urls)
except URLError:
    urllib2.urlopen("https://example.com/log_error/?url="+URLError.url)

I just want to know which URLs (if any) failed, by having each failure call that /log_error/ URL. But with the code as written, I get an error saying URLError is not defined.

I do have these imports at the top of my code:

import urllib2 
from multiprocessing.dummy import Pool as ThreadPool

Here is my whole error response (this is running on AWS Lambda, for whatever it's worth):

{
  "stackTrace": [
    [
      "/var/task/lambda_function.py",
      27,
      "lambda_handler",
      "except Error as e:"
    ]
  ],
  "errorType": "NameError",
  "errorMessage": "global name 'URLError' is not defined"
}

How do I capture the erroring URLs so I know which they are?
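A minimal sketch of why the error only shows up when a request fails: Python looks up the exception name in an except clause only when an exception actually reaches that clause, so successful calls never touch the undefined name. The names fake_urlopen, broken_handler, fixed_handler and MissingURLError are hypothetical, used only for illustration; the import assumes Python 2's urllib2.URLError, falling back to Python 3's urllib.error.URLError.

```python
from __future__ import print_function

try:
    from urllib2 import URLError          # Python 2: URLError lives in urllib2
except ImportError:
    from urllib.error import URLError     # Python 3: moved to urllib.error


def fake_urlopen(url):
    """Hypothetical stand-in for urlopen: fails for 'gives500' URLs."""
    if "gives500" in url:
        raise URLError("simulated failure for " + url)
    return "response for " + url


def broken_handler(url):
    try:
        return fake_urlopen(url)
    except MissingURLError:  # deliberately undefined, like the bare URLError
        return None


def fixed_handler(url):
    try:
        return fake_urlopen(url)
    except URLError as e:  # name was imported above, so the lookup succeeds
        return "logged: " + str(e.reason)


# A successful call never evaluates the except clause, so no error here.
print(broken_handler("https://example.com/gives200.php"))

# A failing call makes Python look up MissingURLError, which raises NameError.
try:
    broken_handler("https://example.com/gives500.php")
except NameError as e:
    print("NameError:", e)

print(fixed_handler("https://example.com/gives500.php"))
```

In the question's code the same fix means writing except urllib2.URLError (or importing the name with from urllib2 import URLError).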

UPDATE

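I figured it out: URLError isn't a built-in name, so it has to be referenced through the module that defines it. In Python 2 that module is urllib2 (urllib2.URLError, or from urllib2 import URLError); in Python 3 the same exception classes moved to urllib.error.

The top of this documentation page explains the split: https://docs.python.org/2/library/urllib2.html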

And here is the more detailed HTTPError object that I ACTUALLY get: https://docs.python.org/2/library/urllib2.html#urllib2.HTTPError

The original problem still remains, though: I have no way to identify which URL is the one failing.

UPDATE 2

Apparently str(e.url) was all I needed. I did not find any documentation on this; it was solely a lucky guess on my part.

So this is the working code now:

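import urllib2
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(3)
urls = ["https://example.com/gives200.php", "https://example.com/alsogives200.php", "https://example.com/gives500.php"]

try:
    results = pool.map(urllib2.urlopen, urls)
except urllib2.HTTPError as e:
    # HTTPError (e.g. a 500) carries .url, .code and .reason; a plain URLError has no .url or .code
    urllib2.urlopen("https://example.com/log_error/?url=" + str(e.url) + "&code=" + str(e.code) + "&reason=" + str(e.reason))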
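A sketch of a more defensive version of the logging call, assuming Python 2's urllib/urllib2 or Python 3's urllib.parse/urllib.error. Only HTTPError (the URLError subclass raised for status codes such as 500) carries .url and .code; a plain URLError from a DNS failure or refused connection has only .reason, so e.url would raise AttributeError there. The helper name log_url_for and the unreachable.php URL are hypothetical. The query string is built with urlencode so spaces, slashes and colons are escaped.

```python
import io

try:  # Python 2
    from urllib import urlencode
    from urllib2 import HTTPError, URLError
except ImportError:  # Python 3
    from urllib.parse import urlencode
    from urllib.error import HTTPError, URLError

LOG_ENDPOINT = "https://example.com/log_error/"  # endpoint from the question


def log_url_for(error, requested_url):
    """Build (but do not send) the /log_error/ URL for a failed request."""
    # Only HTTPError has .url and .code; a plain URLError (DNS failure,
    # refused connection) has just .reason, so fall back where needed.
    params = [
        ("url", getattr(error, "url", None) or requested_url),
        ("code", getattr(error, "code", "")),
        ("reason", getattr(error, "reason", error)),
    ]
    # str() each value (.code is an int, .reason can be an exception object);
    # urlencode then percent-encodes them for the query string.
    return LOG_ENDPOINT + "?" + urlencode([(k, str(v)) for k, v in params])


# Errors constructed locally, so no network access is needed.
http_err = HTTPError("https://example.com/gives500.php", 500,
                     "Internal Server Error", {}, io.BytesIO(b""))
dns_err = URLError("Name or service not known")

print(log_url_for(http_err, "https://example.com/gives500.php"))
print(log_url_for(dns_err, "https://example.com/unreachable.php"))
```

Passing the requested URL as a fallback means the log entry names the right URL even when the exception object itself does not.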

UPDATE 3

Thanks to @mfripp for pointing out the dangers of pool.map (when a call raises, map re-raises only the first exception and throws away the other results), I have revised this code once more:

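import urllib2
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(3)

def my_urlopen(url):
    # Catch the error per URL, so we always know which one failed
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:
        urllib2.urlopen("https://example.com/log_error/?url=" + url)
        return None

def lambda_handler(event, context):

    urls = [
        "https://example.com/gives200.php",
        "https://example.com/alsogives200.php",
        "https://example.com/gives500.php"
    ]

    # Map the wrapper, not urllib2.urlopen, or the try/except never runs
    results = pool.map(my_urlopen, urls)

    return urls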
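mfripp's warning about pool.map can be checked without any network access. A sketch, assuming a hypothetical fake_urlopen stand-in and two made-up failing URLs: when more than one call raises, pool.map re-raises a single exception and hands back none of the successful results, which is why the revised code catches errors inside each call.

```python
from multiprocessing.dummy import Pool as ThreadPool

try:
    from urllib2 import URLError          # Python 2
except ImportError:
    from urllib.error import URLError     # Python 3


def fake_urlopen(url):
    """Hypothetical stand-in for urllib2.urlopen: fails for 'gives500' URLs."""
    if "gives500" in url:
        raise URLError(url)  # put the URL in .reason so we can see which one
    return "200 OK"


urls = [
    "https://example.com/gives200.php",
    "https://example.com/gives500.php",      # hypothetical failing URL
    "https://example.com/alsogives500.php",  # hypothetical failing URL
]

pool = ThreadPool(3)
reported = []
try:
    results = pool.map(fake_urlopen, urls)  # raises, so results is never set
except URLError as e:
    reported.append(e.reason)
pool.close()
pool.join()

# Two URLs failed, but map surfaced exactly one exception, and the
# "200 OK" result for the first URL was thrown away with it.
print(reported)
```

Which of the two failing URLs gets reported depends on which worker fails first, so the printed list can differ between runs.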

Answer 1:

I'm not sure whether the exception object will give you details on the URL that failed. If not, you need to wrap each call to urllib2.urlopen(url) in a try/except block. You could do that like this:

import urllib2
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(3)

urls = [
    "https://example.com/gives200.php",
    "https://example.com/alsogives200.php",
    "https://example.com/gives500.php"
]

def my_urlopen(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:
        urllib2.urlopen("https://example.com/log_error/?url="+url)
        return None

results = pool.map(my_urlopen, urls)
# At this point, any failed requests will have None as their value
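Because pool.map returns its results in the same order as the input list, the None values from this wrapper can be matched back to their URLs with zip. A sketch using a hypothetical fake_urlopen in place of urllib2.urlopen so it runs without network access:

```python
from multiprocessing.dummy import Pool as ThreadPool

try:
    from urllib2 import URLError          # Python 2
except ImportError:
    from urllib.error import URLError     # Python 3


def fake_urlopen(url):
    """Hypothetical stand-in for urllib2.urlopen: fails for 'gives500' URLs."""
    if "gives500" in url:
        raise URLError("simulated HTTP 500")
    return "response from " + url


def my_urlopen(url):
    try:
        return fake_urlopen(url)
    except URLError:
        # the real code would send the /log_error/ request here
        return None


urls = [
    "https://example.com/gives200.php",
    "https://example.com/alsogives200.php",
    "https://example.com/gives500.php",
]

pool = ThreadPool(3)
results = pool.map(my_urlopen, urls)  # same order as urls
pool.close()
pool.join()

# Pair each URL with its result; None marks a failure.
failed_urls = [url for url, result in zip(urls, results) if result is None]
print(failed_urls)
```

This relies on the wrapper returning None only when a request fails, so the failed URLs can be identified even if the exception object carries no URL.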

Answer 2:

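from multiprocessing import Pool
import urllib2

links = [
    "https://example.com/gives200.php",
    "https://example.com/alsogives200.php",
    "https://example.com/gives500.php"
]

# Request one URL; pool.map runs these calls in parallel worker processes
def async_request(url):
    try:
        request = urllib2.Request(url)
        response = urllib2.urlopen(request)
        print response.info()
    except urllib2.URLError as e:
        # Report which URL failed instead of silently swallowing the error
        print "Failed:", url, e

pool = Pool()
pool.map(async_request, links)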

Answer 3:

EDIT: See UPDATE 3 above. mfripp's answer needed to be MERGED with this one to make it complete.

I updated the original post to explain, but this is exactly the code I needed. I could not find any documentation that led me to e.url; it was simply a lucky guess on my end.

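import urllib2
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(3)

urls = [
    "https://example.com/gives200.php",
    "https://example.com/alsogives200.php",
    "https://example.com/gives500.php"
]

try:
    results = pool.map(urllib2.urlopen, urls)
except urllib2.HTTPError as e:
    # HTTPError (e.g. a 500) carries .url, .code and .reason; a plain URLError has no .url or .code
    urllib2.urlopen("https://example.com/log_error/?url=" + str(e.url) + "&code=" + str(e.code) + "&reason=" + str(e.reason))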

Source: https://stackoverflow.com/questions/43643380/python-how-to-know-which-url-is-failing-using-urllib2-and-pool-map

Tags

python

aws-lambda