Question
I want my site to be indexed by search engines, except for a few sub-directories. The following are my robots.txt settings:
robots.txt in the root directory
User-agent: *
Allow: /
Separate robots.txt in the sub-directory (to be excluded)
User-agent: *
Disallow: /
Is this the correct way, or will the root-directory rule override the sub-directory rule?
Answer 1:
No, this is wrong.
A robots.txt in a sub-directory has no effect, because crawlers never request it there. Your robots.txt must be placed in the document root of your host.
If you want to disallow crawling of URLs whose paths begin with /foo, use this record in your robots.txt (http://example.com/robots.txt):
User-agent: *
Disallow: /foo
This allows crawling everything (so there is no need for Allow) except URLs like
- http://example.com/foo
- http://example.com/foo/
- http://example.com/foo.html
- http://example.com/foobar
- http://example.com/foo/bar
- …
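If you want to double-check which URLs such a record blocks, here is a minimal sketch using Python's standard-library urllib.robotparser against the example.com URLs above. Note that this parser follows the original first-match robots.txt spec, while major crawlers such as Googlebot apply the most specific matching rule, so treat it only as a rough local check.

import urllib.robotparser

# The single root robots.txt record from this answer.
robots_txt = """\
User-agent: *
Disallow: /foo
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# URLs whose path begins with /foo are blocked ...
for url in ("http://example.com/foo",
            "http://example.com/foo/",
            "http://example.com/foo.html",
            "http://example.com/foobar",
            "http://example.com/foo/bar"):
    print(url, rp.can_fetch("*", url))               # all False

# ... while everything else remains crawlable.
print(rp.can_fetch("*", "http://example.com/bar"))   # True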
Answer 2:
Yes, there is:
User-agent: *
Disallow: /
The above directive is useful if you are developing a new website and do not want search engines to index it while it is still incomplete. You can also find more advanced information here.
Answer 3:
You can manage them all with the single robots.txt that sits in the root directory. Make sure to place your Allow patterns before your Disallow patterns.
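As a rough illustration of why ordering can matter (the /private/ paths below are hypothetical, not from the question): parsers that follow the original first-match spec, such as Python's urllib.robotparser, stop at the first matching rule, so an Allow line placed after a broader Disallow would never take effect. Googlebot instead applies the most specific matching rule regardless of order, so this advice matters mainly for simpler parsers.

import urllib.robotparser

# Allow one page inside an otherwise-disallowed directory.
# The Allow line comes first so first-match parsers see it
# before the broader Disallow rule.
robots_txt = """\
User-agent: *
Allow: /private/public-page.html
Disallow: /private/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "http://example.com/private/public-page.html"))  # True
print(rp.can_fetch("*", "http://example.com/private/secret.html"))       # False

If the two rules were swapped, this parser would report both URLs as blocked, because the Disallow: /private/ line would match first.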
Source: https://stackoverflow.com/questions/28495972/robots-txt-allow-all-except-few-sub-directories