robots.txt allow root only, disallow everything else?

Submitted by 冷眼眸甩不掉的悲伤 on 2019-12-03 08:12:46

Question


I can't seem to get this to work but it seems really basic.

I want the domain root to be crawled

http://www.example.com

But nothing else should be crawled, and all subdirectories are dynamic:

http://www.example.com/*

I tried

User-agent: *
Allow: /
Disallow: /*/

but the Google webmaster test tool says all subdirectories are allowed.

Anyone have a solution for this? Thanks :)


Answer 1:


According to the Backus-Naur Form (BNF) parsing definitions in Google's robots.txt documentation, the order of the Allow and Disallow directives doesn't matter: Google applies the most specific (longest) matching rule, not the first one listed. So changing the order won't help you.

Instead, use the $ operator to mark the end of the path.

Test this robots.txt. I'm confident it will work for you (I've also verified it in Google Search Console):

user-agent: *
Allow: /$
Disallow: /

This allows http://www.example.com and http://www.example.com/ to be crawled and blocks everything else.
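To make the precedence rule concrete, here is a minimal Python sketch of Google-style rule evaluation. It is an illustration only, not Googlebot's actual implementation: the names `pattern_to_regex`, `is_allowed`, and `RULES` are hypothetical, and it assumes the documented behaviour that the longest matching pattern wins and that Allow wins a tie. Paths are assumed to be already normalized, so a bare http://www.example.com is passed in as "/".

```python
import re

def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path: str) -> bool:
    """Assumed rule: longest matching pattern wins, Allow wins ties.

    `rules` is a list of (directive, pattern) pairs in any order.
    A path that matches no rule is treated as allowed.
    """
    best = None
    for directive, pattern in rules:
        if not pattern:
            continue  # this sketch simply skips empty patterns
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

# The rules from this answer; the listing order is irrelevant here.
RULES = [("Allow", "/$"), ("Disallow", "/")]

for path in ["/", "/index.html", "/some/page"]:
    print(path, "->", "allowed" if is_allowed(RULES, path) else "blocked")
```

Reversing `RULES` gives the same results in this sketch, which is the point about directive order made above.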

Note: the Allow: /$ directive satisfies your particular use case, but if your home page is also reachable at a URL such as /index.html or /default.php, those URLs will not be crawled.

Side note: I'm only really familiar with Googlebot and bingbot behavior. Other engines you are targeting may or may not have their own rules about the order in which directives are listed. So if you want to be "extra" sure, you can always swap the positions of the Allow and Disallow directives. I only listed them in this order to debunk some of the comments.




Answer 2:


When you look at the Google robots.txt specifications, you can see that:

Google, Bing, Yahoo, and Ask support a limited form of "wildcards" for path values. These are:

  1. * designates 0 or more instances of any valid character
  2. $ designates the end of the URL

See https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt?hl=en#example-path-matches
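As a rough illustration of these two wildcards, the Python sketch below turns a robots.txt pattern into a regular expression and tries it against a few paths. The helper name `robots_match` is hypothetical, and the sketch only tests whether a single pattern matches; it does not decide which rule wins when several match.

```python
import re

def robots_match(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches from the start of `path`."""
    # Escape everything, then restore the two supported wildcards.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"  # a trailing '$' anchors the end of the path
    return re.match(regex, path) is not None

examples = [
    ("/$", "/"),            # the root itself
    ("/$", "/page"),        # anything longer than the root
    ("/", "/page"),         # a plain prefix match
    ("/*/", "/page"),       # pattern from the question: needs a second slash
    ("/*/", "/dir/page"),
]
for pattern, path in examples:
    print(f"{pattern!r} vs {path!r}: {robots_match(pattern, path)}")
```

In this sketch the question's `Disallow: /*/` pattern does not match a single-segment URL such as /page, so only `Allow: /` applies to it.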

Then, as eywu said, the solution is:

user-agent: *
Allow: /$
Disallow: /


Source: https://stackoverflow.com/questions/7226432/robots-txt-allow-root-only-disallow-everything-else
