I want my site to be indexed in search engines except for a few sub-directories. The following are my robots.txt settings:
robots.txt in the root directory:
User-agent: *
Allow: /
Separate robots.txt in the sub-directory (to be excluded):
User-agent: *
Disallow: /
Is this the correct way, or will the root-directory rule override the sub-directory rule?
No, this is wrong.
You can’t have a robots.txt in a sub-directory. Your robots.txt must be placed in the document root of your host.
If you want to disallow crawling of URLs whose paths begin with /foo, use this record in your robots.txt (http://example.com/robots.txt):
User-agent: *
Disallow: /foo
This allows crawling of everything (so there is no need for Allow) except URLs like:
http://example.com/foo
http://example.com/foo/
http://example.com/foo.html
http://example.com/foobar
http://example.com/foo/bar
…
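If you want to sanity-check which URLs such a record blocks, one option (assuming Python is available) is the standard-library urllib.robotparser module. A minimal sketch that feeds it the record above and tests the example URLs, plus one extra URL (http://example.com/other, not from the list) that should stay crawlable:

from urllib import robotparser

# The record from above, parsed directly instead of fetched from a live site
rules = """
User-agent: *
Disallow: /foo
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

for url in (
    "http://example.com/foo",
    "http://example.com/foo/",
    "http://example.com/foo.html",
    "http://example.com/foobar",
    "http://example.com/foo/bar",
    "http://example.com/other",   # not in the list above; expected to stay crawlable
):
    # can_fetch() prints False for the /foo URLs and True for the rest
    print(url, rp.can_fetch("*", url))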
There is also this directive:
User-agent: *
Disallow: /
The above directive blocks crawling of the entire site. It is useful if you are developing a new website and do not want search engines to index it while it is incomplete.
You can manage all of these rules with a single robots.txt that sits in the root directory. Make sure to put your Allow patterns before your Disallow patterns.
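To illustrate that ordering advice, here is a small sketch using the same urllib.robotparser module with a hypothetical /private/ sub-directory (not from the question above). Python's parser applies rules in file order (first match wins), so the Allow line must come before the Disallow line for the exception to take effect; Google's parser instead prefers the most specific matching path, which happens to give the same result for these rules.

from urllib import robotparser

# Hypothetical layout: block /private/ but keep /private/help/ crawlable
rules = """
User-agent: *
Allow: /private/help/
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "http://example.com/private/secret.html"))    # False
print(rp.can_fetch("*", "http://example.com/private/help/faq.html"))  # True
print(rp.can_fetch("*", "http://example.com/index.html"))             # True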
Source: https://stackoverflow.com/questions/28495972/robots-txt-allow-all-except-few-sub-directories