Robots.txt: allow only major SE

不思量自难忘° 2020-12-31 01:11

Is there a way to configure the robots.txt so that the site accepts visits ONLY from Google, Yahoo! and MSN spiders?

4 Answers
  •  盖世英雄少女心
    2020-12-31 01:55

    Why?

    Anyone doing evil (e.g., gathering email addresses to spam) will just ignore robots.txt. So you're only going to be blocking legitimate search engines, as robots.txt compliance is voluntary.

    But if you insist on doing it anyway, that's what the User-agent: line in robots.txt is for.

    User-agent: googlebot
    Disallow: 
    
    User-agent: *
    Disallow: /
    

    Add a similar User-agent block for each of the other search engines you'd like traffic from, of course. Robotstxt.org has a partial list of crawler names.
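
If you want to check a rule set like this before deploying it, here is a minimal sketch using Python's standard urllib.robotparser. The tokens Slurp (Yahoo!) and msnbot (MSN) are the names those crawlers have historically announced, but treat them as assumptions and confirm the current tokens in each engine's own documentation. The is_allowed helper and the sample path are hypothetical, added only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: one allow-all group per wanted crawler,
# followed by a catch-all group that blocks everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, path: str = "/index.html") -> bool:
    """Return True if the rules above let `user_agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Print the verdict for the three wanted crawlers and one unlisted bot.
    for agent in ("Googlebot", "Slurp", "msnbot", "SomeOtherBot"):
        print(agent, is_allowed(agent))
```

With these rules the three named crawlers should be reported as allowed and any unlisted agent as blocked. This only shows what a compliant crawler would conclude. As noted above, a crawler that ignores robots.txt is not stopped by it.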
