Using Watir to check for bad links

清歌不尽 2021-01-13 02:32

I have an unordered list of links that I save off to the side, and I want to click each link and make sure it goes to a real page and doesn't return a 404, 500, etc.

The issue

4 Answers
  •  旧巷少年郎
    2021-01-13 03:09

    All previous solutions are inefficient if you have a very large number of links, because each one establishes a new HTTP connection to the server hosting the link.

    I have written a one-liner bash command that uses curl to fetch a list of links supplied on stdin and returns a list of status codes, one per link. The key point here is that curl takes the whole batch of links in a single invocation and reuses HTTP connections, which dramatically improves speed.

    However, curl will divide the list into chunks of 256, which is still far better than one connection per link! To make sure connections are reused, sort the links first (simply using the sort command) so that links to the same host end up next to each other.

    cat <links_file> | xargs curl --head --location -w '---HTTP_STATUS_CODE:%{http_code}\n\n' -s --retry 10 --globoff | grep HTTP_STATUS_CODE | cut -d: -f2 > <status_codes_file>
    

    It is worth noting that the above command will follow HTTP redirects, retry up to 10 times on temporary errors (timeouts or 5xx responses), and of course fetch only the headers.
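
    Because the command above prints only the status codes (one per link, in the same order the links were read), you still need to match each code back to its link. A minimal sketch of that step, assuming the link list is in links.txt and the codes were redirected to status_codes.txt (both file names are placeholders, not part of the one-liner above):

    # pair every link with its status code; order is preserved because xargs
    # runs curl sequentially and -w emits exactly one line per URL
    paste -d' ' links.txt status_codes.txt | awk '$2 !~ /^(2|3)/'   # keep only non-2xx/3xx links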

    Update: added --globoff so that curl won't treat {} or [] in a URL as globbing patterns and try to expand them.
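
    An alternative sketch that skips the separate pairing step: let -w print each link next to its code. Same flags as above; links.txt is an assumed file name, and %{url_effective} is the final URL after any redirects:

    # print "<final URL> <status>" for every link, then keep only non-2xx/3xx results
    sort links.txt | xargs curl --head --location -s --retry 10 --globoff -w 'RESULT %{url_effective} %{http_code}\n' | grep '^RESULT' | awk '$3 !~ /^(2|3)/ {print $2, $3}'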
