Multiple User Agents in Robots.txt

慢半拍i 2020-12-11 16:19

In my robots.txt file I have the following sections:

User-Agent: Bot1
Disallow: /A

User-Agent: Bot2
Disallow: /B

User-Agent: *
Disallow: /C

Will Bot1 and Bot2 also obey the User-Agent: * section and be disallowed from /C, or will each bot only obey its own section?

2 Answers
  • 2020-12-11 17:05

    tl;dr: No, Bot1 and Bot2 will happily crawl paths starting with /C.

    Each bot only ever complies with at most a single record (block).
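
    You can see this with Python's standard-library urllib.robotparser, which also picks a single record per user agent; the host example.com is just a placeholder:

    from urllib.robotparser import RobotFileParser

    # The exact rules from the question; parse() takes a list of lines,
    # so no network access is needed.
    rules = """\
    User-Agent: Bot1
    Disallow: /A

    User-Agent: Bot2
    Disallow: /B

    User-Agent: *
    Disallow: /C
    """.splitlines()

    rp = RobotFileParser()
    rp.parse(rules)

    print(rp.can_fetch("Bot1", "https://example.com/A"))  # False: Bot1's own record
    print(rp.can_fetch("Bot1", "https://example.com/C"))  # True: Bot1 ignores the * record
    print(rp.can_fetch("Bot3", "https://example.com/C"))  # False: Bot3 falls back to *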

    Original spec

    In the original specification it says:

    If the value is '*', the record describes the default access policy for any robot that has not matched any of the other records.

    Expired RFC draft

    The original spec, including some additions (like Allow), became an RFC draft, but it was never accepted/published. In section 3.2.1, The User-agent line, it says:

    The robot must obey the first record in /robots.txt that contains a User-Agent line whose value contains the name token of the robot as a substring. The name comparisons are case-insensitive. If no such record exists, it should obey the first record with a User-agent line with a "*" value, if present. If no record satisfied either condition, or no records are present at all, access is unlimited.

    So the draft confirms the interpretation of the original spec.
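
    As a minimal sketch of that rule (the function name and the (value, rules) record shape are mine, not from the draft):

    def select_record(records, robot_token):
        # records: list of (useragent_value, rules) pairs in file order.
        # Per the draft: obey the first record whose User-Agent value
        # contains the robot's name token as a substring, case-insensitively;
        # otherwise the first "*" record; otherwise access is unlimited.
        token = robot_token.lower()
        for agent, rules in records:
            if agent != "*" and token in agent.lower():
                return rules
        for agent, rules in records:
            if agent == "*":
                return rules
        return None  # no record applies: access is unlimited

    With the records from the question, select_record(records, "Bot1") returns only the rules of the Bot1 record; the * record is never consulted for Bot1 or Bot2.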

    Implementations

    Google, for example, documents behavior that follows the spec:

    Each section in the robots.txt file is separate and does not build upon previous sections. For example:

    User-agent: *
    Disallow: /folder1/
    
    User-agent: Googlebot
    Disallow: /folder2/
    

    In this example only the URLs matching /folder2/ would be disallowed for Googlebot.
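
    The same stdlib parser confirms this (again with a placeholder host):

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.parse("""\
    User-agent: *
    Disallow: /folder1/

    User-agent: Googlebot
    Disallow: /folder2/
    """.splitlines())

    print(rp.can_fetch("Googlebot", "https://example.com/folder1/page"))  # True
    print(rp.can_fetch("Googlebot", "https://example.com/folder2/page"))  # False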

  • 2020-12-11 17:19

    If the bots obey the robots.txt file, then yes, the statement will be visible to them, so they will not be able to crawl /C.

    The wildcard (*) in the User-Agent line means all user agents.

    However, bear in mind that not all bots obey robots.txt.
