robots.txt allow all except few sub-directories

Submitted by 点点圈 on 2019-12-05 07:52:28
unor

No, this is wrong.

You can’t have a robots.txt in a sub-directory. Your robots.txt must be placed in the document root of your host.

If you want to disallow crawling of URLs whose paths begin with /foo, use this record in your robots.txt (http://example.com/robots.txt):

User-agent: *
Disallow: /foo

This allows crawling of everything (so no Allow line is needed) except URLs such as

  • http://example.com/foo
  • http://example.com/foo/
  • http://example.com/foo.html
  • http://example.com/foobar
  • http://example.com/foo/bar
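You can verify this behaviour with Python's standard-library robots.txt parser. This is a minimal sketch; example.com and the paths are the illustrative ones from the list above:

```python
from urllib import robotparser

# Parse the two-line record from the answer above.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /foo",
])

# Every URL whose path begins with /foo is blocked...
print(rp.can_fetch("*", "http://example.com/foo"))      # False
print(rp.can_fetch("*", "http://example.com/foobar"))   # False
print(rp.can_fetch("*", "http://example.com/foo/bar"))  # False

# ...while all other paths remain crawlable.
print(rp.can_fetch("*", "http://example.com/"))         # True
print(rp.can_fetch("*", "http://example.com/bar"))      # True
```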
Ganga

Yes, there are:

User-agent: *
Disallow: /

The above directive is useful if you are developing a new website and do not want search engines to index your incomplete site.

You can manage these rules with a robots.txt file that sits in the document root. Make sure to place your Allow patterns before your Disallow patterns.
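Placing Allow before Disallow matters for parsers that apply the first matching rule, such as Python's standard-library one. A minimal sketch, assuming an illustrative /private/ directory with one page you still want crawled:

```python
from urllib import robotparser

# Allow one page inside an otherwise disallowed directory.
# The Allow line comes first, so it wins under first-match rules.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /private/help",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "http://example.com/private/help"))   # True
print(rp.can_fetch("*", "http://example.com/private/other"))  # False
print(rp.can_fetch("*", "http://example.com/index.html"))     # True
```

If the Disallow line came first, a first-match parser would block /private/help as well, which is why the ordering advice above is worth following.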
