Is there any advantage of using X-Robot-Tag instead of robots.txt?

大城市里の小女人 submitted on 2019-11-27 08:14:30

Question


It looks like there are two mainstream ways to tell crawlers what to index and what not to index: adding an X-Robot-Tag HTTP header, or providing a robots.txt file.

Is there any advantage to using the former?


Answer 1:


With robots.txt you cannot disallow indexing of your documents.

They have different purposes:

  • robots.txt can disallow crawling (with Disallow)
  • X-Robots-Tag ¹ can disallow indexing (with noindex)

(Each also offers other features, e.g., robots.txt can link to your Sitemap, and X-Robots-Tag can disallow following links with nofollow, among many others.)
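A minimal sketch of the two mechanisms side by side, assuming a small Python WSGI app; the paths /private/ and /drafts/ and the sitemap URL are hypothetical. robots.txt is served once for the whole site and governs crawling, while X-Robots-Tag is attached to individual responses and governs indexing (here also nofollow).

```python
from wsgiref.simple_server import make_server

# Hypothetical site-wide robots.txt: disallow *crawling* of /private/
# and advertise a sitemap.
ROBOTS_TXT = (
    b"User-agent: *\n"
    b"Disallow: /private/\n"
    b"Sitemap: https://example.com/sitemap.xml\n"
)


def app(environ, start_response):
    """Serve robots.txt and mark hypothetical /drafts/ pages as noindex."""
    path = environ.get("PATH_INFO", "/")
    if path == "/robots.txt":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [ROBOTS_TXT]

    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith("/drafts/"):
        # Per-response directives: don't index this page, don't follow its links.
        headers.append(("X-Robots-Tag", "noindex, nofollow"))
    start_response("200 OK", headers)
    return [b"<p>Hello</p>"]


if __name__ == "__main__":
    # Serve on localhost:8000 until interrupted.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

With this running, `curl -i http://127.0.0.1:8000/drafts/example` should show the X-Robots-Tag header in the response, while /robots.txt and /private/ pages are sent without it.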

Crawling means accessing the document. Indexing means providing a link to (and possibly metadata from or about) the document in an index. In the typical case, a bot indexes a document after having crawled it, but crawling is not a prerequisite for indexing.

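A bot that isn’t allowed to crawl a document may still index it (without ever accessing it). A bot that isn’t allowed to index a document may still crawl it. You can’t effectively disallow both: if robots.txt disallows crawling a document, the bot never fetches it, so it never sees a noindex sent with it.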
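The last point can be modeled with Python's standard urllib.robotparser. This is a simplified sketch of a rule-following bot, not any real crawler's logic: `fetch_headers` is a hypothetical stand-in for an HTTP request, and the URLs are made up. When robots.txt disallows the URL, the bot never fetches it, so the noindex header on that URL goes unseen.

```python
from urllib.robotparser import RobotFileParser


def simulate_bot(robots_lines, url, fetch_headers, user_agent="ExampleBot"):
    """Return what a simplified, rule-following bot learns about `url`.

    `fetch_headers(url)` is a hypothetical stand-in for an HTTP request that
    returns the response headers as a dict.
    """
    rp = RobotFileParser()
    rp.parse(robots_lines)
    if not rp.can_fetch(user_agent, url):
        # Crawling is disallowed: the page is never requested, so any
        # X-Robots-Tag on it is never seen. The bare URL may still be
        # indexed if other pages link to it.
        return {"crawled": False, "noindex_seen": False}

    header = fetch_headers(url).get("X-Robots-Tag", "")
    directives = {d.strip().lower() for d in header.split(",") if d.strip()}
    return {"crawled": True, "noindex_seen": "noindex" in directives}


def fake_fetch(url):
    # Every page on this hypothetical site is sent with "noindex".
    return {"X-Robots-Tag": "noindex"}


url = "https://example.com/secret/page.html"
print(simulate_bot(["User-agent: *", "Disallow: /secret/"], url, fake_fetch))
# {'crawled': False, 'noindex_seen': False}
print(simulate_bot(["User-agent: *", "Disallow:"], url, fake_fetch))
# {'crawled': True, 'noindex_seen': True}
```

Only in the second case, where crawling is allowed, does the bot ever learn that the page asked not to be indexed.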

¹ Note that the header is called X-Robots-Tag, not X-Robot-Tag. By the way, the metadata name robots (for the HTML meta element) is an alternative to the HTTP header.
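Since the robots meta element and the header carry the same kind of directives, a client can read both. This sketch, using only Python's standard html.parser, merges directives from an X-Robots-Tag header and `<meta name="robots">` elements. It is a simplification: it treats the headers dict as case-sensitive and ignores bot-specific forms such as `<meta name="googlebot">`. One practical difference between the two: the meta element only exists in HTML documents, whereas the header can accompany any response, such as a PDF.

```python
from html.parser import HTMLParser


def split_directives(value):
    """Split a value like 'noindex, nofollow' into a set of lowercase directives."""
    return {d.strip().lower() for d in value.split(",") if d.strip()}


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> elements."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values may be None.
        if tag != "meta":
            return
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            self.directives.update(split_directives(attributes.get("content", "")))


def robots_directives(headers, html):
    """Union of X-Robots-Tag header directives and robots meta directives."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return split_directives(headers.get("X-Robots-Tag", "")) | parser.directives


print(sorted(robots_directives({"X-Robots-Tag": "noindex"}, "<p>Hi</p>")))
# ['noindex']
print(sorted(robots_directives({}, '<meta name="robots" content="noindex, nofollow">')))
# ['nofollow', 'noindex']
```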



Source: https://stackoverflow.com/questions/35639040/is-there-any-advantage-of-using-x-robot-tag-instead-of-robots-txt
