Using Watir to check for bad links

清歌不尽 2021-01-13 02:32

I have an unordered list of links that I save off to the side, and I want to click each link and make sure it goes to a real page and doesn't return a 404, 500, etc.

The issue

4 Answers
  •  旧巷少年郎
    2021-01-13 03:09

    All previous solutions are inefficient if you have a very large number of links, because each one establishes a new HTTP connection to the server hosting the link.

    I have written a one-liner bash command that uses curl to fetch a list of links supplied on stdin and returns a list of status codes, one per link. The key point here is that curl takes the whole batch of links in a single invocation and reuses HTTP connections, which dramatically improves speed.

    However, curl will divide the list into chunks of 256, which is still far better than one connection per link! To make sure connections are reused, sort the links first (simply using the sort command) so that links to the same host end up next to each other.

    cat <links_file> | xargs curl --head --location -w '---HTTP_STATUS_CODE:%{http_code}\n\n' -s --retry 10 --globoff | grep HTTP_STATUS_CODE | cut -d: -f2 > <status_codes_file>
    

    It is worth noting that the above command will follow HTTP redirects, retry up to 10 times on temporary errors (timeouts or 5xx responses), and of course fetch only the headers.
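
    Because the command above prints only the status codes (one per link, in the same order the links were read), you still need to match each code back to its link. A minimal sketch of that step, assuming the link list is in links.txt and the codes were redirected to status_codes.txt (both file names are placeholders, not part of the one-liner above):

    # pair every link with its status code; order is preserved because xargs
    # runs curl sequentially and -w emits exactly one line per URL
    paste -d' ' links.txt status_codes.txt | awk '$2 !~ /^(2|3)/'   # keep only non-2xx/3xx links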

    Update: added --globoff so that curl won't treat {} or [] in a URL as globbing patterns and try to expand them.
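
    An alternative sketch that skips the separate pairing step: let -w print each link next to its code. Same flags as above; links.txt is an assumed file name, and %{url_effective} is the final URL after any redirects:

    # print "<final URL> <status>" for every link, then keep only non-2xx/3xx results
    sort links.txt | xargs curl --head --location -s --retry 10 --globoff -w 'RESULT %{url_effective} %{http_code}\n' | grep '^RESULT' | awk '$3 !~ /^(2|3)/ {print $2, $3}'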
