robots.txt URL format

Submitted by 落爺英雄遲暮 on 2019-12-06 01:16:51

Question


According to this page:

globbing and regular expression are not supported in either the User-agent or Disallow lines

However, I noticed that the Stack Overflow robots.txt includes characters like * and ? in the URLs. Are these supported or not?

Also, does it make any difference whether a URL includes a trailing slash, or are these two equivalent?

Disallow: /privacy
Disallow: /privacy/

Answer 1:


To answer your second question: the two are not equivalent. Rules match by prefix, so /privacy blocks anything that starts with /privacy, including something like /privacy_xyzzy. /privacy/, on the other hand, does not block that (nor does it block /privacy itself).
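
To see the prefix rule in action, here is a minimal sketch using Python's standard-library urllib.robotparser, which, like the original protocol, treats each Disallow value as a plain path prefix. The helper name allowed and the host example.com are illustrative assumptions, not part of the question.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str) -> bool:
    """Hypothetical helper: may a generic crawler ("*") fetch `path`?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from a string, no network access
    # example.com is a placeholder host; only the path is compared with the rules
    return parser.can_fetch("*", "http://example.com" + path)


no_slash = "User-agent: *\nDisallow: /privacy"
with_slash = "User-agent: *\nDisallow: /privacy/"

print(allowed(no_slash, "/privacy_xyzzy"))     # False: "/privacy" is a prefix of it
print(allowed(with_slash, "/privacy_xyzzy"))   # True: "/privacy/" is not a prefix of it
print(allowed(with_slash, "/privacy/policy"))  # False: blocked by "/privacy/"
print(allowed(with_slash, "/privacy"))         # True: the bare path is not covered
```

urllib.robotparser implements only this original prefix matching; it does not treat * or $ as wildcards.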

The original robots.txt standard did not support globbing or wildcards. However, many robots do. Google, Microsoft, and Yahoo agreed on an extended standard in 2008. See http://googlewebmastercentral.blogspot.com/2008/06/improving-on-robots-exclusion-protocol.html for details.

Most major robots that I know of support that "standard."
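
In that extended syntax only two characters are special: * matches any sequence of characters, and a trailing $ anchors the pattern to the end of the URL. Everything else, including ?, is literal, so a ? in a Disallow line matches a literal query-string question mark rather than acting as a glob. The sketch below translates one such rule into a Python regular expression. The function names are hypothetical, it assumes the URL path and query string are passed together as one string, and it only decides whether a single rule matches a single URL.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Hypothetical helper: compile a wildcard-style Disallow value into a regex."""
    anchored = rule_path.endswith("$")  # a trailing "$" means "end of URL"
    if anchored:
        rule_path = rule_path[:-1]
    # Escape each literal piece (so "?" and "." stay literal) and join the pieces with ".*" for "*"
    body = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    # \Z anchors at the very end; without it, re.match gives plain prefix matching
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(rule_path: str, url_path: str) -> bool:
    """True if the rule applies to url_path (path plus any query string)."""
    return rule_to_regex(rule_path).match(url_path) is not None


print(rule_matches("/privacy", "/privacy_xyzzy"))    # True: plain prefix
print(rule_matches("/*.pdf$", "/docs/a.pdf"))        # True
print(rule_matches("/*.pdf$", "/docs/a.pdf?x=1"))    # False: "$" requires the URL to end in .pdf
print(rule_matches("/search?", "/search?q=robots"))  # True: "?" is literal
print(rule_matches("/search?", "/searchq"))          # False
```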



Source: https://stackoverflow.com/questions/14538859/robots-txt-url-format
