I was unable to find information about my case. I want to prevent the following type of URL from being indexed:
website.com/video-title/video-title/
(my website produces such duplicate URLs for my video articles)
Each video article's URL starts with the word "video".
So what I want to do is block all URLs of the form website.com/any-url/video-any-url
This way I will remove all the duplicate copies. Could somebody help me?
This is not possible in the original robots.txt specification.
But some parsers may support wildcards in the Disallow value anyway; Google, for example:
Googlebot (but not all search engines) respects some pattern matching.
So for Google’s bots, you could use the following line:
Disallow: /*/video
This should block any URL whose path starts with a slash, followed by anything, followed by "/video", for example:
/foo/video
/foo/videos
/foo/video.html
/foo/video/bar
/foo/bar/videos
/foo/bar/foo/bar/videos
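To make the matching rule concrete, here is a minimal sketch in Python of how this style of wildcard matching behaves: "*" stands for any run of characters, and the rule is matched against the beginning of the URL path (a prefix match). The helper name is made up for illustration, and this is a simplification rather than Google's actual implementation:

    import re

    def wildcard_blocks(rule, path):
        # Hypothetical helper: "*" matches any sequence of characters and
        # the rule is anchored at the start of the path (prefix match).
        regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
        return re.match(regex, path) is not None

    rule = "/*/video"
    for path in ["/foo/video",                  # blocked
                 "/foo/bar/videos",             # blocked
                 "/video-title/video-title/",   # doubled copy: blocked
                 "/video-title/"]:              # single article URL: not blocked
        print(path, wildcard_blocks(rule, path))

Assuming the article URLs look like the example in the question, the rule matches the duplicated /video-title/video-title/ path (the "video" in the second segment) while leaving the original /video-title/ page crawlable.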
Parsers that don't support this pattern matching would interpret the line literally, i.e., they would block URLs whose paths literally begin with:
/*/video
/*/videos
/*/video/foo
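To illustrate that literal reading, a parser following only the original specification does plain prefix matching, so "*" is just another character in the path. A rough sketch, not any particular crawler's code:

    def literal_blocks(rule, path):
        # Original-spec behaviour: the Disallow value is a literal path
        # prefix, so "*" has no special meaning.
        return path.startswith(rule)

    print(literal_blocks("/*/video", "/foo/video"))    # False: not blocked
    print(literal_blocks("/*/video", "/*/video/foo"))  # True: blocked

So for such parsers the rule would effectively block nothing on a normal site, since real paths rarely start with a literal "/*/".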
Source: https://stackoverflow.com/questions/21734781/robots-txt-restriction-of-category-urls