robots.txt

Is it possible to list multiple user-agents in one line?

╄→гoц情女王★ submitted on 2019-11-27 06:20:45
Question: Is it possible in robots.txt to give one instruction to multiple bots without having to repeat it for each one? Example:

    User-agent: googlebot yahoobot microsoftbot
    Disallow: /boringstuff/

Answer 1: It's actually pretty hard to give a definitive answer to this, as there isn't a very well-defined standard for robots.txt, and a lot of the documentation out there is vague or contradictory. The description of the format understood by Google's bots is quite comprehensive, and includes this slightly …
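For comparison, the form the original 1994 robots.txt convention does describe is one User-agent line per bot, with the bots sharing a single rule block. A sketch of the example above written that way:

    User-agent: googlebot
    User-agent: yahoobot
    User-agent: microsoftbot
    Disallow: /boringstuff/

Each record applies to every User-agent named in it, so the Disallow line covers all three bots without being repeated.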

Static files in Flask - robot.txt, sitemap.xml (mod_wsgi)

时光毁灭记忆、已成空白 submitted on 2019-11-27 02:27:56
Question: Is there any clever solution to store static files in Flask's application root directory? robots.txt and sitemap.xml are expected to be found in /, so my idea was to create routes for them:

    @app.route('/sitemap.xml', methods=['GET'])
    def sitemap():
        response = make_response(open('sitemap.xml').read())
        response.headers["Content-type"] = "text/plain"
        return response

There must be something more convenient :)

Answer 1: The best way is to set static_url_path to the root URL:

    from flask import Flask

    app = Flask(__name__, static_folder='static', static_url_path='')

@vonPetrushev is right, in production you'll want …
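A sketch of the other approach commonly suggested for this question: serve the two files explicitly with Flask's send_from_directory, assuming robots.txt and sitemap.xml are kept in the app's static folder:

    from flask import Flask, request, send_from_directory

    app = Flask(__name__)

    @app.route('/robots.txt')
    @app.route('/sitemap.xml')
    def static_from_root():
        # request.path is e.g. '/robots.txt'; drop the leading slash and let
        # Flask serve the matching file from the static folder.
        return send_from_directory(app.static_folder, request.path[1:])

Either way the files come out of the static folder; the static_url_path='' variant just avoids writing a route per file.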

HTTP header to detect a preload request by Google Chrome

ⅰ亾dé卋堺 submitted on 2019-11-26 23:28:22
Question: Google Chrome 17 introduced a new feature which preloads a webpage to improve rendering speed upon actually making the request (hitting enter in the omnibox). Two questions: Is there an HTTP header to detect such a request on the server side, and if one exists, what is the proper response in order to prevent such preloading (to avoid unintended requests which might have unwanted effects)? Does Google Chrome check robots.txt before making preload requests? Is there a robots.txt …
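As an aside, and not from the original thread: recent Chrome versions mark speculative prefetch/prerender requests with a purpose-style request header (Sec-Purpose, or the older Purpose: prefetch), while the Chrome 17-era omnibox prerender did not reliably send a distinguishing header. A hedged Flask-style sketch of refusing requests that carry such a header:

    from flask import Flask, request, abort

    app = Flask(__name__)

    @app.before_request
    def decline_speculative_requests():
        # Assumption: the browser labels prefetch/prerender requests with a
        # Sec-Purpose (or legacy Purpose) header; older prerender traffic may
        # carry no such marker and will pass through untouched.
        purpose = request.headers.get('Sec-Purpose') or request.headers.get('Purpose') or ''
        if 'prefetch' in purpose or 'prerender' in purpose:
            abort(503)  # refuse to serve the speculative request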

order of directives in robots.txt, do they overwrite each other or complement each other?

时光怂恿深爱的人放手 submitted on 2019-11-26 21:59:17
Question: Given this robots.txt:

    User-agent: Googlebot
    Disallow: /privatedir/

    User-agent: *
    Disallow: /

what is disallowed for Googlebot: /privatedir/, or the whole website (/)?

Answer 1: According to the original robots.txt specification: a bot must follow the first record that matches its user-agent name. If such a record doesn't exist, it must follow the record with User-agent: * (this line may not appear in more than one record). If such a record doesn't exist either, it doesn't have to follow any record. So a bot never follows …
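One way to sanity-check that reading is with Python's standard-library robots.txt parser on the same rules (a quick sketch, not part of the original answer):

    from urllib.robotparser import RobotFileParser

    rules = [
        "User-agent: Googlebot",
        "Disallow: /privatedir/",
        "",
        "User-agent: *",
        "Disallow: /",
    ]

    rp = RobotFileParser()
    rp.parse(rules)

    # Googlebot matches the first record, so only /privatedir/ is off limits for it:
    print(rp.can_fetch("Googlebot", "/somepage.html"))       # True
    print(rp.can_fetch("Googlebot", "/privatedir/x.html"))   # False
    # Any other bot falls through to the 'User-agent: *' record and is blocked everywhere:
    print(rp.can_fetch("SomeOtherBot", "/somepage.html"))    # False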

How do I disallow specific page from robots.txt

冷暖自知 submitted on 2019-11-26 21:45:46
Question: I am creating two pages on my site that are very similar but serve different purposes. One is to thank users for leaving a comment and the other is to encourage users to subscribe. I don't want the duplicate content, but I do want the pages to be available. Can I set the sitemap to hide one? Would I do this in the robots.txt file? The disallow looks like this:

    Disallow: /wp-admin

How would I customize this for a specific page like http://sweatingthebigstuff.com/thank-you-for-commenting?

Answer 1: …
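For the specific page mentioned in the question, a minimal sketch of the kind of rule that would block it (not the original answer; note that Disallow matches by path prefix, so it also covers anything nested under that path):

    User-agent: *
    Disallow: /thank-you-for-commenting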

Can a relative sitemap url be used in a robots.txt?

ぐ巨炮叔叔 submitted on 2019-11-26 18:54:20
Question: In robots.txt, can I write the following relative URL for the sitemap file?

    sitemap: /sitemap.ashx

Or do I have to use the complete (absolute) URL for the sitemap file, like:

    sitemap: http://subdomain.domain.com/sitemap.ashx

Why I wonder: I own a new blog service, www.domain.com, that allows users to blog on accountname.domain.com. I use wildcards, so all subdomains (accounts) point to "blog.domain.com". On blog.domain.com I put the robots.txt to let search engines find the sitemap. But, due …
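For reference, the sitemaps.org protocol describes the Sitemap directive in robots.txt as taking the full URL of the sitemap file, so the absolute form is the safe choice, e.g. (reusing the question's hostnames for illustration):

    Sitemap: http://accountname.domain.com/sitemap.ashx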

Robots.txt: Is this wildcard rule valid?

允我心安 submitted on 2019-11-26 14:39:45
Question: Simple question. I want to add:

    Disallow */*details-print/

Basically, blocking rules in the form of /foo/bar/dynamic-details-print (foo and bar in this example can also be totally dynamic). I thought this would be simple, but then on www.robotstxt.org there is this message: Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines …
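Worth noting (not from the original excerpt): the robotstxt.org statement describes the original 1994 convention, but major crawlers such as Googlebot and Bingbot document '*' and '$' as pattern-matching extensions in Disallow paths, so a rule along these lines is generally honored by them:

    User-agent: *
    Disallow: /*details-print/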
