What does the dollar sign mean in robots.txt?

Submitted by 半城伤御伤魂 on 2019-12-01 18:27:20

Question


I am curious about a website and want to do some web crawling of its /s path. Its robots.txt is:

User-Agent: *
Allow: /$
Allow: /debug/
Allow: /qa/
Allow: /wiki/
Allow: /cgi-bin/loginpage
Disallow: /

My questions are:

  • What does the dollar sign mean in this case?

  • Is it appropriate to crawl the URL /s with respect to this robots.txt file?


Answer 1:


If you follow the original robots.txt specification, $ has no special meaning, and there is no Allow field defined. A conforming bot would have to ignore fields it does not know, so such a bot would effectively see this record:

User-Agent: *
Disallow: /
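
To make that effect concrete, here is a minimal Python sketch of how a bot that implements only the original specification might evaluate this record: it keeps just the Disallow lines it recognizes and compares URL paths by plain prefix match. The function name original_spec_allowed is made up for illustration.

def original_spec_allowed(path, robots_lines):
    # Keep only the Disallow fields; "Allow" and "$" are unknown to the
    # original spec and are therefore ignored.
    disallows = []
    for line in robots_lines:
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow":
            disallows.append(value.strip())
    # An empty Disallow value disallows nothing; otherwise plain prefix match.
    return not any(rule and path.startswith(rule) for rule in disallows)

record = [
    "User-Agent: *",
    "Allow: /$",        # unknown field under the original spec -> ignored
    "Allow: /debug/",
    "Disallow: /",
]

print(original_spec_allowed("/", record))   # False: "Disallow: /" matches every path
print(original_spec_allowed("/s", record))  # False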

However, the original robots.txt specification has been extended by various parties, and since the authors of the robots.txt in question did not target a specific bot, we don’t know which "extension" they had in mind.

Typically (though not necessarily, as it is not formally specified), Allow overrides rules specified in Disallow, and $ represents the end of the URL path.

Following this interpretation (which is, for example, the one Google uses), Allow: /$ would mean: you may crawl /, but you may not crawl /a, /b, and so on.

So crawling of URLs whose path starts with /s would not be allowed (neither according to the original spec, thanks to Disallow: /, nor according to Google’s extension).
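
For comparison, here is a small Python sketch of the Google-style interpretation described above, in which $ anchors a pattern to the end of the URL path, the longest matching pattern wins, and Allow is preferred over Disallow on a tie. This is an illustrative approximation, not Google's actual implementation, and the function names are made up.

import re

RULES = [
    ("allow", "/$"),
    ("allow", "/debug/"),
    ("allow", "/qa/"),
    ("allow", "/wiki/"),
    ("allow", "/cgi-bin/loginpage"),
    ("disallow", "/"),
]

def pattern_to_regex(pattern):
    # Escape the pattern, then restore the two robots.txt wildcards:
    # "*" matches any character sequence, "$" anchors the end of the path.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.compile("^" + regex)

def google_style_allowed(path, rules=RULES):
    matches = [(len(pat), kind) for kind, pat in rules
               if pattern_to_regex(pat).match(path)]
    if not matches:
        return True  # no rule applies -> crawling is allowed
    # Most specific (longest) pattern wins; Allow beats Disallow on a tie.
    _, kind = max(matches, key=lambda m: (m[0], m[1] == "allow"))
    return kind == "allow"

print(google_style_allowed("/"))   # True:  "Allow: /$" matches "/" exactly
print(google_style_allowed("/s"))  # False: only "Disallow: /" matches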



Source: https://stackoverflow.com/questions/29455403/what-does-the-dollar-sign-mean-in-robots-txt
