Question
I'm trying to simultaneously call 3 URLs and log any errors. Here is my sample code:
urls = ["https://example.com/gives200.php", "https://example.com/alsogives200.php", "https://example.com/gives500.php"];
try:
results = pool.map(urllib2.urlopen, urls);
except URLError:
urllib2.urlopen("https://example.com/log_error/?url="+URLError.url);
I just want to know which URLs (if any) erred by having them call that /log_error/
URL. But when I have the code like this, I'm getting an error saying URLError
is not defined.
I do have these imports at the top of my code:
import urllib2
from multiprocessing.dummy import Pool as ThreadPool
Here is my whole error response (this is using AWS Lambda, for whatever it's worth)
{
  "stackTrace": [
    [
      "/var/task/lambda_function.py",
      27,
      "lambda_handler",
      "except Error as e:"
    ]
  ],
  "errorType": "NameError",
  "errorMessage": "global name 'URLError' is not defined"
}
How do I capture the failing URLs so I know which ones they are?
UPDATE
I figured it out: URLError is not a bare global. In Python 2 it has to be referenced as urllib2.URLError (or imported explicitly); in Python 3 it moved into urllib.error.
The note at the top of this documentation page explains the module split: https://docs.python.org/2/library/urllib2.html
And here is the more detailed HTTPError object that I ACTUALLY get: https://docs.python.org/2/library/urllib2.html#urllib2.HTTPError
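For reference, a minimal sketch of what each exception type exposes (the fetch helper name is just for illustration): .code is only present on HTTPError, and .url comes from HTTPError's file-like base class, so a plain URLError (e.g. a DNS failure) may not carry it.
import urllib2

def fetch(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        # The server answered with an error status; HTTPError carries
        # .code, .reason, and (usually, via its file-like base) .url.
        print "HTTP %d on %s: %s" % (e.code, e.url, e.reason)
    except urllib2.URLError as e:
        # Transport-level failure (DNS, refused connection, timeout);
        # only .reason is guaranteed here -- no .code, no .url.
        print "failed to reach %s: %s" % (url, e.reason)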
The problem of identifying the erroring URL itself still exists though... currently I have no way to tell which URL is the one failing.
UPDATE 2
Apparently str(e.url) was all I needed. I did not find any documentation on this; it was solely a lucky guess on my part.
So this is the working code now:
urls = ["https://example.com/gives200.php", "https://example.com/alsogives200.php", "https://example.com/gives500.php"];
try:
results = pool.map(urllib2.urlopen, urls);
except Exception as e:
urllib2.urlopen("https://example.com/log_error/?url="+str(e.url)+"&code="+str(e.code)+"&reason="+e.reason;
UPDATE 3
Thanks to @mfripp, who warned me that a single exception raised inside pool.map aborts the whole batch, I have revised this code once more:
def my_urlopen(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:
        urllib2.urlopen("https://example.com/log_error/?url=" + url)
        return None

def lambda_handler(event, context):
    urls = [
        "https://example.com/gives200.php",
        "https://example.com/alsogives200.php",
        "https://example.com/gives500.php"
    ]
    pool = ThreadPool(len(urls))  # one worker thread per URL
    results = pool.map(my_urlopen, urls)  # map over the wrapper, not urllib2.urlopen directly
    return results
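One more hedge worth adding here: if the /log_error/ request itself fails, my_urlopen would raise from inside its except block and take the worker down with it. A defensive variant (a sketch; the nested try and the urllib.quote escaping are additions, so it assumes import urllib at the top):
def my_urlopen(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:
        try:
            # Escape the URL so it is safe to embed as a query parameter,
            # and don't let a failing log endpoint crash the worker.
            urllib2.urlopen("https://example.com/log_error/?url=" + urllib.quote(url, safe=""))
        except urllib2.URLError:
            pass
        return None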
Answer 1:
I'm not sure whether the exception object will give you details on the URL that failed. If not, you need to wrap each call to urllib2.urlopen(url) in its own try/except. You could do that like this:
urls = [
    "https://example.com/gives200.php",
    "https://example.com/alsogives200.php",
    "https://example.com/gives500.php"
]

def my_urlopen(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:
        urllib2.urlopen("https://example.com/log_error/?url=" + url)
        return None

results = pool.map(my_urlopen, urls)
# At this point, any failed requests will have None as their value
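As a follow-up, pool.map returns results in the same order as urls, so zipping the two lists shows exactly which requests failed (a small sketch building on the code above):
results = pool.map(my_urlopen, urls)

# Pair every URL with its response; failures were mapped to None above.
succeeded = [(u, r) for u, r in zip(urls, results) if r is not None]
failed = [u for u, r in zip(urls, results) if r is None]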
Answer 2:
from multiprocessing import Pool
import urllib2

# Asynchronous request: fetch one URL and print its headers
def async_request(url):
    try:
        request = urllib2.Request(url)
        response = urllib2.urlopen(request)
        print response.info()
    except urllib2.URLError:
        # Ignore failures; only successful responses are printed
        pass

pool = Pool()
pool.map(async_request, urls)
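Since these requests are I/O-bound, the thread-backed pool from the question's own imports works the same way and avoids spawning processes. A sketch of that variant, reusing async_request and the question's urls (the pool size of 4 is an arbitrary choice):
from multiprocessing.dummy import Pool as ThreadPool

urls = [
    "https://example.com/gives200.php",
    "https://example.com/alsogives200.php",
    "https://example.com/gives500.php"
]

pool = ThreadPool(4)  # threads, not processes; same map() interface
results = pool.map(async_request, urls)
pool.close()
pool.join()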
Answer 3:
EDIT: See UPDATE 3 above; mfripp's answer needed to be merged with this one to make it wholly complete.
I updated the original post to explain, but this is exactly the code I needed. I could not find any documentation that led me to e.url; it was simply a lucky guess on my end.
urls = [
    "https://example.com/gives200.php",
    "https://example.com/alsogives200.php",
    "https://example.com/gives500.php"
]
try:
    results = pool.map(urllib2.urlopen, urls)
except Exception as e:
    urllib2.urlopen("https://example.com/log_error/?url=" + str(e.url) + "&code=" + str(e.code) + "&reason=" + str(e.reason))
Source: https://stackoverflow.com/questions/43643380/python-how-to-know-which-url-is-failing-using-urllib2-and-pool-map