Regexp for robots.txt

只愿长相守 submitted on 2019-12-01 16:03:52

Question


I am trying to set up my robots.txt, but I am not sure about the regexps.

I've got four different pages all available in three different languages. Instead of listing each page times 3, I figured I could use a regexp.

nav.aspx
page.aspx/changelang (might have a query string attached such as "?toLang=fr".)
mypage.aspx?id and
login.aspx/logoff (=12346?... etc - different each time)

All four exist in 3 different languages, e.g.:

www.example.com/es/nav.aspx
www.example.com/it/nav.aspx
www.example.com/fr/nav.aspx

Now, my question is: Is the following regexp correct?

User-Agent: *
Disallow: /*nav\.aspx$
Disallow: /*page.aspx/changelang
Disallow: /*mypage\.aspx?id
Disallow: /*login\.aspx\/logoff

Thanks


Answer 1:


Regular expressions are not allowed in robots.txt, but Googlebot (and some other robots) can understand some simple pattern matching:

Your robots.txt should look like this:

User-agent: *
Disallow: /*nav.aspx$
Disallow: /*page.aspx/changelang
Disallow: /*mypage.aspx?id
Disallow: /*login.aspx/logoff

User-agent directive is valid with lower case a. You don't have to escape . or `/'.

You can read more about this here: Block or remove pages using a robots.txt file
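If you want to sanity-check which URLs these rules would block, the Google-style matching (where `*` matches any sequence of characters and a trailing `$` anchors the end of the URL) can be approximated with a short script. This is a minimal sketch, not an official implementation; the helper names `robots_pattern_to_regex` and `is_disallowed` are made up for illustration:

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a Google-style robots.txt pattern into a compiled regex.
    '*' matches any character sequence; a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then splice '.*' back in for each '*'
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(path, disallow_rules):
    """Return True if any Disallow pattern matches the URL path."""
    return any(robots_pattern_to_regex(r).match(path) for r in disallow_rules)

rules = [
    "/*nav.aspx$",
    "/*page.aspx/changelang",
    "/*mypage.aspx?id",
    "/*login.aspx/logoff",
]

print(is_disallowed("/es/nav.aspx", rules))                      # True
print(is_disallowed("/fr/page.aspx/changelang?toLang=fr", rules))# True
print(is_disallowed("/es/other.aspx", rules))                    # False
```

Note the `$` only matters on the first rule: `/es/nav.aspx?x=1` would not match `/*nav.aspx$`, while the unanchored rules also block URLs with anything appended after the pattern.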



Source: https://stackoverflow.com/questions/6306763/regexp-for-robots-txt
