robots.txt

Multiple Sitemap: entries in robots.txt?

无人久伴 submitted on 2019-12-03 06:30:32
Question: I have been searching with Google but can't find an answer to this question. A robots.txt file can contain the following line:

    Sitemap: http://www.mysite.com/sitemapindex.xml

Is it possible to specify multiple sitemap index files in robots.txt and have the search engines recognize that and crawl ALL of the sitemaps referenced in each sitemap index file? For example, will this work:

    Sitemap: http://www.mysite.com/sitemapindex1.xml
    Sitemap: http://www.mysite.com/sitemapindex2

Robots.txt: allow only major SE

纵饮孤独 submitted on 2019-12-03 04:39:53
Is there a way to configure robots.txt so that the site accepts visits ONLY from the Google, Yahoo! and MSN spiders?

    User-agent: *
    Disallow: /
    User-agent: Googlebot
    Allow: /
    User-agent: Slurp
    Allow: /
    User-Agent: msnbot
    Disallow:

Slurp is Yahoo's robot.

Why? Anyone doing evil (e.g., gathering email addresses to spam) will just ignore robots.txt, because compliance is voluntary. So you're only going to be blocking legitimate search engines. But if you insist on doing it anyway, that's what the User-agent: line in robots.txt is for.

    User-agent: googlebot
    Disallow:
    User-agent: *
    Disallow:
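One way to check an allow-list policy like the one discussed above is to feed it to Python's standard-library `urllib.robotparser` and ask what each crawler may fetch. This is only a sketch: the `ROBOTS_TXT` contents, the `build_parser` helper, and the `SomeOtherBot` agent are assumptions made for illustration, and the final `Disallow: /` group is my own completion of the policy the question asks for, not something taken from the excerpt. Real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allow-list policy: three named crawlers may fetch everything,
# every other user agent is denied. An empty "Disallow:" means "allow all".
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for agent in ("Googlebot", "Slurp", "msnbot", "SomeOtherBot"):
    verdict = parser.can_fetch(agent, "http://www.example.com/page.html")
    print(agent, "allowed" if verdict else "blocked")
```

As the answer above points out, this only describes what well-behaved crawlers will do; nothing in robots.txt stops a client that chooses to ignore it.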

Sitemap for a site with a large number of dynamic subdomains

与世无争的帅哥 submitted on 2019-12-03 02:52:20
I'm running a site that allows users to create subdomains. I'd like to submit these user subdomains to search engines via sitemaps. However, according to the sitemaps protocol (and Google Webmaster Tools), a single sitemap can include URLs from a single host only. What is the best approach? At the moment I have the following structure:

- A sitemap index located at example.com/sitemap-index.xml that lists sitemaps for each subdomain (but located at the same host).
- Each subdomain has its own sitemap located at example.com/sitemap-subdomain.xml (this way the sitemap index includes URLs from a single

Serving sitemap.xml and robots.txt with Spring MVC

孤者浪人 submitted on 2019-12-03 02:25:54
Question: What is the best way to serve sitemap.xml and robots.txt with Spring MVC? I want to serve these files through a Controller in the cleanest way.

Answer 1: I'm relying on JAXB to generate the sitemap.xml for me. My controller looks something like the below, and I have some database tables to keep track of the links that I want to appear in the sitemap:

SitemapController.java

    @Controller
    public class SitemapController {
        @RequestMapping(value = "/sitemap.xml", method = RequestMethod.GET)
        @ResponseBody

robots.txt allow root only, disallow everything else?

a 夏天 submitted on 2019-12-02 23:25:15
I can't seem to get this to work, but it seems really basic. I want the domain root to be crawled:

    http://www.example.com

But nothing else should be crawled, and all subdirectories are dynamic:

    http://www.example.com/*

I tried

    User-agent: *
    Allow: /
    Disallow: /*/

but the Google webmaster test tool says all subdirectories are allowed. Does anyone have a solution for this? Thanks :)

According to the Backus-Naur Form (BNF) parsing definitions in Google's robots.txt documentation, the order of the Allow and Disallow directives doesn't matter. So changing the order really won't help you. Instead you should use
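The point that directive order doesn't matter can be illustrated with a small model of rule precedence. The sketch below assumes the rule Google documents for its own crawler: among all matching patterns the longest one wins, and a tie goes to Allow. The function names `pattern_to_regex` and `is_allowed` are hypothetical, and this is a simplified model, not Google's actual matcher or testing tool.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Assumed precedence: longest matching pattern wins; on equal length, allow wins."""
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing in this model
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# The two directives from the question, in both possible orders.
as_written = [("allow", "/"), ("disallow", "/*/")]
swapped = list(reversed(as_written))

for path in ("/", "/page.html", "/dir/", "/dir/page.html"):
    same = is_allowed(as_written, path) == is_allowed(swapped, path)
    print(path, "same verdict in both orders:", same)
```

Because the verdict depends only on which matching pattern is longest, listing Allow before or after Disallow cannot change the outcome under this model.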

Multiple Sitemap: entries in robots.txt?

一曲冷凌霜 submitted on 2019-12-02 23:22:44
I have been searching with Google but can't find an answer to this question. A robots.txt file can contain the following line:

    Sitemap: http://www.mysite.com/sitemapindex.xml

Is it possible to specify multiple sitemap index files in robots.txt and have the search engines recognize that and crawl ALL of the sitemaps referenced in each sitemap index file? For example, will this work:

    Sitemap: http://www.mysite.com/sitemapindex1.xml
    Sitemap: http://www.mysite.com/sitemapindex2.xml
    Sitemap: http://www.mysite.com/sitemapindex3.xml

It is possible to write them, but it is up to the
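To see what a consumer of such a file has to do, here is a minimal sketch that collects every `Sitemap:` line from robots.txt text. The `sitemap_urls` helper and the `EXAMPLE` text are assumptions for illustration; the URLs are the ones from the question. It shows only that several entries can be read from one file, not how any particular search engine treats them.

```python
def sitemap_urls(robots_txt):
    """Return every URL named by a Sitemap: line, in file order."""
    urls = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        # Split on the first colon only, so the "http://" in the URL is preserved.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


EXAMPLE = """\
User-agent: *
Disallow:

Sitemap: http://www.mysite.com/sitemapindex1.xml
Sitemap: http://www.mysite.com/sitemapindex2.xml
Sitemap: http://www.mysite.com/sitemapindex3.xml
"""

for url in sitemap_urls(EXAMPLE):
    print(url)
```

The `Sitemap:` field is independent of the `User-agent:` groups, which is why the helper ignores which group a line appears under.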

django serving robots.txt efficiently

隐身守侯 submitted on 2019-12-02 21:50:34
Here is my current method of serving robots.txt:

    url(r'^robots\.txt/$', TemplateView.as_view(template_name='robots.txt', content_type='text/plain')),

I don't think that this is the best way. I think it would be better if it were just a pure static resource and served statically. But the way my Django app is structured, the static root and all subsequent static files are located in

    http://my.domain.com/static/stuff-here

Any thoughts? I'm an amateur at Django, but

    TemplateView.as_view(template_name='robots.txt', content_type='text/plain')

looks a lot more resource-consuming than just a static

Wordpress remove Robots Meta Tag noindex

Deadly submitted on 2019-12-02 20:14:15
Question: We're experiencing a strange issue with a WordPress site's meta robots tag. All pages have the following meta tag and we can't seem to remove it:

    <meta name="robots" content="noindex,follow"/>

We have unchecked "Discourage search engines from indexing this site" in Settings > Reading > Search Engine Visibility, but it does nothing. We are using the Yoast SEO plugin, but even when this is disabled the tag still remains. In fact, we have tried disabling all plugins to check nothing was interfering with

Disallow dynamic URL in robots.txt

你离开我真会死。 submitted on 2019-12-02 17:26:26
Question: Our URL is:

    http://example.com/kitchen-knife/collection/maitre-universal-cutting-boards-rana-parsley-chopper-cheese-slicer-vegetables-knife-sharpening-stone-ham-stand-ham-stand-riviera-niza-knives-block-benin.html

I want to disallow crawling of URLs after collection, but before collection there are categories that are generated dynamically. How would I disallow URLs in robots.txt after /collection?

Answer 1: This is not possible in the original robots.txt specification. But some (!) parsers

Serving sitemap.xml and robots.txt with Spring MVC

半城伤御伤魂 submitted on 2019-12-02 15:56:37
What is the best way to serve sitemap.xml and robots.txt with Spring MVC? I want to serve these files through a Controller in the cleanest way.

I'm relying on JAXB to generate the sitemap.xml for me. My controller looks something like the below, and I have some database tables to keep track of the links that I want to appear in the sitemap:

SitemapController.java

    @Controller
    public class SitemapController {
        @RequestMapping(value = "/sitemap.xml", method = RequestMethod.GET)
        @ResponseBody
        public XmlUrlSet main() {
            XmlUrlSet xmlUrlSet = new XmlUrlSet();
            create(xmlUrlSet, "", XmlUrl.Priority.HIGH);