Question
On this web page:
http://www.alvolante.it/news/pompe_benzina_%E2%80%9Ctruccate%E2%80%9D_autostrada-308391044
there is this image:
http://immagini.alvolante.it/sites/default/files/imagecache/anteprima_100/images/rifornimento_benzina.jpg
Why is this image indexed when the robots.txt contains "Disallow: /sites/"?
You can see that it is indexed from this search:
http://www.google.it/images?q=rifornimento+benzina&um=1&ie=UTF-8&source=og&sa=N&hl=it&tab=wi&biw=1280&bih=712
Answer 1:
Because of the different domain names (actually a domain and a subdomain): the page is from http://www.alvolante.it and the image is from http://immagini.alvolante.it.
A robots.txt file applies only to the host that serves it, and this one exists only on the www host. If the same file were also served at http://immagini.alvolante.it/robots.txt, Google would not have indexed the image.
Try to access http://immagini.alvolante.it/sites and http://www.alvolante.it/sites and you will see different pages.
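To illustrate the per-host behaviour, here is a minimal sketch using Python's standard urllib.robotparser. The ROBOTS_BY_HOST table and the is_allowed helper are hypothetical names made up for this example, and the empty rule set for immagini.alvolante.it is an assumption (it stands in for "no blocking robots.txt on that host"). Python's parser is not Google's, so treat this as an approximation of the idea only.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, keyed by host name.
# Assumption: the image host serves no blocking rules at all.
ROBOTS_BY_HOST = {
    "www.alvolante.it": "User-agent: *\nDisallow: /sites/\n",
    "immagini.alvolante.it": "",
}


def is_allowed(url, agent="Googlebot-Image"):
    """Check a URL against the rules of its own host only."""
    host = urlsplit(url).netloc
    parser = RobotFileParser()
    # A host without an entry is treated as having no rules.
    parser.parse(ROBOTS_BY_HOST.get(host, "").splitlines())
    return parser.can_fetch(agent, url)


image = ("http://immagini.alvolante.it/sites/default/files/"
         "imagecache/anteprima_100/images/rifornimento_benzina.jpg")
same_path_on_www = image.replace("immagini.", "www.", 1)

print(is_allowed(image))             # rules of the image host apply
print(is_allowed(same_path_on_www))  # rules of the www host apply
```

The same path is blocked on www.alvolante.it but not on immagini.alvolante.it, because each host is checked only against its own rules.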
Answer 2:
You can test your robots.txt with Google Webmaster Tools:
http://www.google.com/webmasters/
Answer 3:
Have you disallowed all bots, or is this rule just for the Googlebot? If it's the latter, you need to ensure that you also include the rule for the 'Googlebot-Image' user agent.
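As a rough sketch of how rules are scoped to a user agent, the snippet below parses a hypothetical robots.txt that names only Googlebot-Image and checks two agents against it. RULES, allowed_for, and the "ExampleBot" agent are invented for illustration. Python's urllib.robotparser matches agent names by simple substring, which may differ from how Google resolves its own crawlers, so this only shows the general idea.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a group for the image crawler only.
RULES = """\
User-agent: Googlebot-Image
Disallow: /sites/
"""


def allowed_for(agent, path="/sites/default/files/example.jpg"):
    """Return True if `agent` may fetch `path` under RULES."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, path)


print(allowed_for("Googlebot-Image"))  # the named group applies
print(allowed_for("ExampleBot"))       # no group applies to this agent
```

With these rules, only the agent named in the group is blocked from /sites/; an agent that matches no group is left unrestricted.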
Source: https://stackoverflow.com/questions/3862702/why-google-index-this