Block all bots/crawlers/spiders for a special directory with htaccess

Submitted by 老子叫甜甜 on 2019-11-26 18:59:56

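You need to have mod_rewrite enabled. Place this in the .htaccess file in that folder. If it is placed elsewhere (e.g. in the parent folder), the RewriteRule pattern needs to be modified slightly to include that folder's name (for example, ^foldername/ instead of .*, where foldername is a placeholder).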

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [R=403,L]
  1. I have listed only a few bots; add any others yourself (letter case does not matter, because of the [NC] flag).
  2. This rule responds to such requests with the HTTP status code 403 Forbidden. You can change it to another response code if you really want to, but 403 is the most appropriate here considering your requirements.
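
To see which requests the RewriteCond above would catch, the match can be mimicked outside Apache. This is only a rough sketch: it uses Python's re module rather than Apache's own regex engine, and the function name is_blocked and the sample User-Agent strings are made up for illustration.

```python
import re

# Same alternation as in the RewriteCond; re.IGNORECASE plays the role of [NC].
BLOCKED_BOTS = re.compile(r"(googlebot|bingbot|Baiduspider)", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header would match the pattern.

    The pattern is unanchored, so it matches anywhere in the header value.
    """
    return BLOCKED_BOTS.search(user_agent) is not None


if __name__ == "__main__":
    # Hypothetical header values, for illustration only.
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (compatible; BINGBOT/2.0)",
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0",
    ]
    for ua in samples:
        print("403" if is_blocked(ua) else "ok ", ua)
```

Because the match is a plain substring search, any client can avoid it by sending a different User-Agent, so the rule only stops bots that identify themselves honestly.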

Why use .htaccess or mod_rewrite for a job that robots.txt is specifically meant for? Here is the robots.txt snippet you will need to block a specific set of directories.

User-agent: *
Disallow: /subdir1/
Disallow: /subdir2/
Disallow: /subdir3/

This will block all search bots that honor robots.txt from crawling the directories /subdir1/, /subdir2/ and /subdir3/.
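
If you want to check how a compliant crawler would read these rules, Python's standard urllib.robotparser module can evaluate them. The sketch below feeds the snippet in as a string instead of fetching it from a server; the page paths and the bot name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt snippet from above, inlined so that no server is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /subdir1/
Disallow: /subdir2/
Disallow: /subdir3/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    # Hypothetical paths; "Googlebot" stands in for any crawler name,
    # since the rules apply to every user agent (*).
    for path in ("/subdir1/page.html", "/subdir3/", "/index.html"):
        allowed = parser.can_fetch("Googlebot", path)
        print(path, "allowed" if allowed else "disallowed")
```

This only shows what a crawler that follows the rules would do; unlike the .htaccess approach, robots.txt does not make the server refuse the request.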

For more explanation see here: http://www.robotstxt.org/orig.html

I know the topic is old, but for people who land here as I did: take a look at the great 5G Blacklist 2013.
It is a great help, and not only for WordPress but for all other sites too. Works awesome, imho.
Another one worth looking at is Linux Reviews' anti-spam through .htaccess.
