Typical politeness factor for a web crawler?
What is a typical politeness factor for a web crawler, apart from always obeying robots.txt (both the "Disallow:" directive and the non-standard "Crawl-delay:" directive)? If a site does not specify an explicit crawl delay, what should the default value be set at?

The algorithm we use is (a rough sketch follows below):

// If we are blocked by robots.txt, make sure it is obeyed.
// Our bot's user-agent string contains a link to an HTML page explaining this,
// plus an email address site owners can write to so that we never even consider their domain in the future.
// If we receive more than 5 consecutive responses with an HTTP response code of 500+ (or timeouts)
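For concreteness, here is a minimal sketch of that policy in Python. The names `DEFAULT_CRAWL_DELAY`, `HostState`, the 10-second fallback, and the example user-agent string are illustrative assumptions, not values taken from the question; the only parts grounded in the question are honouring `Crawl-delay:` when present and counting 5 consecutive 500+/timeout responses. It relies on the standard library's `urllib.robotparser`, whose `crawl_delay()` method returns the site's `Crawl-delay:` value if one is given.

```python
# Sketch of a politeness policy: per-host crawl delay plus error back-off.
# DEFAULT_CRAWL_DELAY and MAX_CONSECUTIVE_ERRORS are assumed values for illustration.

import urllib.robotparser
from typing import Optional

DEFAULT_CRAWL_DELAY = 10.0   # assumed fallback (seconds) when no Crawl-delay is specified
MAX_CONSECUTIVE_ERRORS = 5   # threshold from the question: 5 consecutive 500+ responses/timeouts

# Hypothetical user-agent pointing at an explanatory page, as described above.
USER_AGENT = "ExampleBot (+https://example.com/bot.html)"


def politeness_delay(robots_url: str) -> float:
    """Return the delay to use for a host: its Crawl-delay if present, else the default."""
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(robots_url)
    rp.read()
    delay = rp.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else DEFAULT_CRAWL_DELAY


class HostState:
    """Tracks consecutive server errors so the crawler can back off from a host."""

    def __init__(self) -> None:
        self.consecutive_errors = 0

    def record_response(self, status_code: Optional[int]) -> bool:
        """Record a response (None = timeout); return True once the host should be skipped."""
        if status_code is None or status_code >= 500:
            self.consecutive_errors += 1
        else:
            self.consecutive_errors = 0
        return self.consecutive_errors >= MAX_CONSECUTIVE_ERRORS
```

In use, the fetch loop would sleep for `politeness_delay(...)` seconds between requests to the same host and stop scheduling a host once `record_response()` returns True. Whatever fallback number is chosen, it only applies when the site has not expressed a preference of its own.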