What are the key considerations when creating a web crawler?
Question: I just started thinking about creating/customizing a web crawler today, and I know very little about web crawler/robot etiquette. Most of the writing on etiquette I've found seems old and awkward, so I'd like to get some current (and practical) insights from the web developer community. I want to use a crawler to walk over "the web" for a super simple purpose: "does the markup of site XYZ meet condition ABC?". This raises a lot of questions for me, but I think the two main questions I