Disallow dynamic URL in robots.txt


Question


Our URL is:

http://example.com/kitchen-knife/collection/maitre-universal-cutting-boards-rana-parsley-chopper-cheese-slicer-vegetables-knife-sharpening-stone-ham-stand-ham-stand-riviera-niza-knives-block-benin.html

I want to disallow crawling of the URL part after /collection, but the categories that appear before /collection are generated dynamically.

How would I disallow URLs in robots.txt after /collection?


Answer 1:


This is not possible in the original robots.txt specification.

But some (!) parsers extend the specification and define a wildcard character (typically *).

For those parsers, you could use:

Disallow: /*/collection

Parsers that understand * as a wildcard will stop crawling any URL whose path starts with /, followed by anything (which may span several path segments), followed by /collection, e.g.,

http://example.com/foo/collection/
http://example.com/foo/collection/bar
http://example.com/foo/bar/collection/
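For wildcard-aware parsers, the matching can be sketched by translating the pattern into a regular expression, roughly following Google's documented semantics (* matches any run of characters, $ anchors the end of the path). This is an illustrative matcher only, not any crawler's actual implementation:

```python
import re
from urllib.parse import urlparse

def blocked(pattern: str, url: str) -> bool:
    """Check a URL path against a robots.txt Disallow pattern,
    treating '*' as 'any run of characters' and '$' as an
    end-of-path anchor (Google-style wildcard semantics)."""
    regex = "".join(
        ".*" if ch == "*" else "$" if ch == "$" else re.escape(ch)
        for ch in pattern
    )
    # Matching is anchored at the start of the path, per robots.txt rules.
    return re.match(regex, urlparse(url).path) is not None

print(blocked("/*/collection", "http://example.com/foo/collection/"))     # True
print(blocked("/*/collection", "http://example.com/foo/collection/bar"))  # True
# A path with nothing between the leading "/" and "/collection" does
# not match, because "/*/collection" requires a "/" on each side of "*".
print(blocked("/*/collection", "http://example.com/collection/"))         # False
```

Note that the last check shows why the pattern fits the asker's URLs: their category segment always sits between the leading slash and /collection.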

Parsers that don’t understand * as a wildcard (i.e., those that follow the original specification) will stop crawling any URL whose path starts with the literal string /*/collection, e.g.,

http://example.com/*/collection/
http://example.com/*/collection/bar
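Python's standard-library robots.txt parser follows the original specification (no wildcard support), so it can be used to observe the literal interpretation directly. A small check, assuming an arbitrary user agent name mybot:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /*/collection",
])

# The stdlib parser does plain prefix matching: '*' in the rule is a
# literal character, so only paths beginning with "/*/collection" are blocked.
print(rp.can_fetch("mybot", "http://example.com/foo/collection/"))  # True (not blocked)
print(rp.can_fetch("mybot", "http://example.com/*/collection/"))    # False (blocked)
```

This illustrates the answer's point: a spec-following parser would not block the asker's real category URLs at all.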


Source: https://stackoverflow.com/questions/30410260/disallow-dynamic-url-in-robots-txt
