Checking whether a link is dead or not using Python without downloading the webpage

Submitted by 不问归期 on 2019-11-30 04:50:35

Question


For those who know wget, it has an option --spider, which allows one to check whether a link is broken or not, without actually downloading the webpage. I would like to do the same thing in Python. My problem is that I have a list of 100,000 links I want to check, at most once a day, and at least once a week. In any case this will generate a lot of unnecessary traffic.

As far as I understand from the urllib2.urlopen() documentation, it does not download the page but only the meta-information. Is this correct? Or is there some other way to do this in a nice manner?

Best,
Troels


Answer 1:


You should use a HEAD request for this: it asks the web server for the headers only, without the body. See How do you send a HEAD HTTP request in Python 2?
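The question mentions urllib2 (Python 2); a minimal sketch of the HEAD approach using Python 3's standard library (http.client, the successor to Python 2's httplib) might look like this. The function name `is_link_alive` and the "status below 400 means alive" heuristic are illustrative choices, not part of the original answer:

```python
# Sketch: send a HEAD request so only the response headers are transferred,
# never the page body. Uses only the standard library.
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urlparse

def is_link_alive(url, timeout=10):
    """Return True if the server answers the HEAD request with a non-error status."""
    parts = urlparse(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        # Treat 2xx and 3xx as alive; 4xx/5xx as dead.
        return response.status < 400
    except OSError:
        # DNS failure, refused connection, timeout, etc.
        return False
    finally:
        conn.close()
```

Note that this sketch does not follow redirects; a 301 to a dead target would still count as alive.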




Answer 2:


Not sure how to do this in Python specifically, but generally you can read the response headers and check the Status-Code for 200. At that point you can stop reading and move on to your next link, so you only download the response headers rather than the whole page. See the List of Status Codes.
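In Python 3, urllib.request (the successor to the urllib2 module the question mentions) lets you set the request method, so you can read the status code without fetching the body. A hedged sketch, with the helper name `link_status` being an illustrative choice:

```python
# Sketch: fetch only the status code of a URL via a HEAD request.
import urllib.request
import urllib.error

def link_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the host is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status   # e.g. 200 for a live page
    except urllib.error.HTTPError as err:
        return err.code              # 404, 500, ... - a dead or erroring link
    except urllib.error.URLError:
        return None                  # DNS failure, connection refused, etc.
```

Checking status codes rather than a simple alive/dead boolean is useful for the 100,000-link case in the question: you can log the codes and retry transient 5xx errors separately from hard 404s.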



Source: https://stackoverflow.com/questions/3229607/checking-whether-a-link-is-dead-or-not-using-python-without-downloading-the-webp
