robots.txt

How to add a route to a dynamic robots.txt in ASP.NET MVC?

Submitted by 最后都变了 on 2019-12-30 08:52:52
Question: I have a robots.txt that is not static but generated dynamically. My problem is creating a route from root/robots.txt to my controller action.

This works:

    routes.MapRoute(
        name: "Robots",
        url: "robots",
        defaults: new { controller = "Home", action = "Robots" });

This doesn't work:

    routes.MapRoute(
        name: "Robots",
        url: "robots.txt", /* this is the only thing I've changed */
        defaults: new { controller = "Home", action = "Robots" });

The ".txt" apparently causes ASP to barf.

Answer 1: You need to add
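The answer is truncated here. A common fix for this problem, offered as a sketch and not necessarily what the original answer said, is to make IIS hand the .txt request to the managed pipeline (for example with a web.config handler mapping for path "robots.txt" using System.Web.Handlers.TransferRequestHandler, or runAllManagedModulesForAllRequests="true") and then serve the file from the action:

    // Sketch of a dynamic robots.txt action (assumed names; the
    // "robots.txt" URL must reach MVC routing, e.g. via a
    // TransferRequestHandler mapping in web.config, before this
    // action is ever hit).
    using System.Text;
    using System.Web.Mvc;

    public class HomeController : Controller
    {
        public ActionResult Robots()
        {
            var sb = new StringBuilder();
            sb.AppendLine("User-agent: *");
            sb.AppendLine("Disallow: /admin/"); // rules built dynamically
            return Content(sb.ToString(), "text/plain", Encoding.UTF8);
        }
    }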

Robots.txt: Allow a subfolder but not the parent

Submitted by 对着背影说爱祢 on 2019-12-30 05:38:08
Question: Can anybody please explain the correct robots.txt command for the following scenario? I would like to allow access to /directory/subdirectory/.. but I would also like to restrict access to /directory/, notwithstanding the above exception.

Answer 1: Be aware that there is no real official standard and that any web crawler may happily ignore your robots.txt. According to a Google Groups post, the following works at least with GoogleBot:

    User-agent: Googlebot
    Disallow: /directory/
    Allow: /directory
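The excerpt cuts off mid-directive; filling in the path from the question (my completion, not verbatim from the answer), the rule set would be:

    User-agent: Googlebot
    Allow: /directory/subdirectory/
    Disallow: /directory/

Googlebot applies the most specific (longest) matching rule, so the Allow wins for anything under /directory/subdirectory/; for crawlers that honor rules in file order instead, putting the Allow line first keeps the behavior consistent.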

Facebook and Crawl-delay in Robots.txt?

Submitted by 牧云@^-^@ on 2019-12-30 04:07:25
Question: Do Facebook's web-crawling bots respect the Crawl-delay: directive in robots.txt files?

Answer 1: We don't have a crawler. We have a scraper that scrapes meta data on pages that have Like buttons / are shared on FB.

Answer 2: No, it doesn't respect robots.txt. Contrary to other answers here, facebookexternalhit behaves like the meanest of crawlers. Whether it got the URLs it requests from crawling or from Like buttons doesn't matter so much when it goes through every one of those at an insane rate. We
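The second answer is cut off; since Crawl-delay is ignored, the practical recourse it is heading toward is throttling on your own server. A rough sketch (apart from the user-agent string, every name and threshold here is an assumption):

    // Crude per-minute budget for facebookexternalhit; over budget,
    // answer 429 so the scraper backs off. The static counters are
    // not thread-safe; a real version would use Interlocked or a cache.
    using System;
    using System.Web.Mvc;

    public class FacebookThrottleAttribute : ActionFilterAttribute
    {
        private static int _hits;
        private static DateTime _windowStart = DateTime.UtcNow;

        public override void OnActionExecuting(ActionExecutingContext filterContext)
        {
            var ua = filterContext.HttpContext.Request.UserAgent ?? "";
            if (ua.IndexOf("facebookexternalhit", StringComparison.OrdinalIgnoreCase) < 0)
                return;

            if ((DateTime.UtcNow - _windowStart).TotalMinutes >= 1)
            {
                _windowStart = DateTime.UtcNow;
                _hits = 0;
            }
            if (++_hits > 60) // assumed budget: 60 requests per minute
                filterContext.Result = new HttpStatusCodeResult(429, "Too Many Requests");
        }
    }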

Is the User-Agent line in robots.txt an exact match or a substring match?

Submitted by 瘦欲@ on 2019-12-29 08:49:14
Question: When a crawler reads the User-agent line of a robots.txt file, does it attempt to match it exactly against its own user agent, or does it match it as a substring of its user agent? Everything I have read fails to answer this question explicitly. According to another StackOverflow thread it is an exact match. However, the RFC draft makes me believe that it is a substring match. For example, User-agent: Google will match "Googlebot" and "Googlebot-News". Here is the relevant quotation from
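To make the difference concrete (this example is mine, not the post's): Google documents that its crawlers obey the group whose User-agent token is the most specific match for their name, so with

    User-agent: Googlebot-News
    Disallow: /archive/

    User-agent: Googlebot
    Disallow: /private/

Googlebot-News follows only the first group and plain Googlebot only the second, which is neither a strict exact match on the full user-agent string nor a naive substring match against all groups.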

How may I prevent search engines from crawling a subdomain on my website?

Submitted by 拥有回忆 on 2019-12-25 06:06:26
Question: I have cPanel installed on my website. I went to the Domains section in cPanel and clicked on Subdomains. I assigned the subdomain name (e.g. personal.mywebsite.com). It also wanted me to assign a document root folder, so I assigned mywebsite.com/personal. If I create robots.txt in my website root (e.g. website.com):

    User-agent: *
    Disallow: /personal/

Can it also block personal.mywebsite.com? What should I do? Thanks.

Answer 1: When you want to block URLs on personal.example.com, visit http://personal
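The answer is truncated; the underlying rule is that robots.txt is per host, so a file on mywebsite.com can never block personal.mywebsite.com. A sketch of the fix under this cPanel layout (paths assumed from the question): place a second robots.txt in the subdomain's document root, mywebsite.com/personal/, containing

    User-agent: *
    Disallow: /

Crawlers fetch it as http://personal.mywebsite.com/robots.txt and stop crawling the subdomain; the same file is also reachable at mywebsite.com/personal/robots.txt, but in that location it has no effect on the main domain's crawling.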

How should a robots.txt file be properly written for subdomains?

Submitted by China☆狼群 on 2019-12-25 05:21:54
Question: Can someone explain how I should write a robots.txt file if I want all crawlers to index the root and some specific subdomains?

    User-agent: *
    Allow: /
    Allow: /subdomain1/
    Allow: /subdomain2/

Is this right? And where should I put it? In the root (public_html) folder or in each subdomain folder?

Answer 1: There is no way to specify rules for different subdomains within a single robots.txt file. A given robots.txt file will only control crawling of the subdomain it was requested from. If you want to
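The answer breaks off; its logical continuation is that you need one robots.txt per subdomain, each placed in that subdomain's own document root. Under the cPanel-style layout from the question (folder names assumed), that would mean public_html/robots.txt for the root site plus, e.g., public_html/subdomain1/robots.txt, each with its own rules such as:

    User-agent: *
    Disallow:

An empty Disallow allows everything, which also makes the Allow: /subdomain1/ lines in the question unnecessary; paths in a robots.txt are always relative to the host it was fetched from, never to sibling subdomains.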

Implementing "Report this content" and detecting spammer- or robot-triggered events

Submitted by 一世执手 on 2019-12-25 04:56:23
Question: I'm creating a forum for a website and plan on implementing a "Report this content" function. In all honesty, I'm not sure how useful (lit. necessary) the feature will be, since a user account (created by an admin) will be required for posting, but the solution interests me. So, in short, this is the scenario: all users will have read-only access to all (non-restricted) content on the forum. For unidentified users, a reply button and a "report this content" button will be present. The
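The question is cut off, but for the robot-detection half a cheap first line of defense is a honeypot field: an input hidden from humans with CSS that form-filling bots tend to populate. A sketch in ASP.NET MVC to match the stack used elsewhere on this page (the post itself names no stack; all names here are invented):

    // Report endpoint with a honeypot parameter. The form renders an
    // extra input named "website" hidden via CSS; a human leaves it
    // blank, a naive bot fills it in, and we drop the report silently.
    using System.Web.Mvc;

    public class ForumController : Controller
    {
        [HttpPost]
        public ActionResult Report(int postId, string reason, string website)
        {
            if (!string.IsNullOrEmpty(website))
                return new HttpStatusCodeResult(204); // bot: pretend success

            // TODO: queue { postId, reason } for moderator review.
            return RedirectToAction("ReportReceived");
        }
    }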

Remove pages from Google (dynamic URL) - robots.txt

Submitted by 落爺英雄遲暮 on 2019-12-24 18:06:12
Question: I have a few links on Google of the form domain.com/results.php?name=a&address=b. The results page/parameters have now been renamed and I need to remove the existing links from Google etc. I tried

    User-agent: *
    Disallow: /results.php

in robots.txt, and then in Google Webmaster Tools I added the URL to be removed: domain.com/results.php. It says it was removed successfully; however, when I look at Google and type domain.com, the existing URLs with parameters are all still there. What am I doing wrong? There are
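A likely explanation (mine, not from the truncated thread): the removal tool removes the exact URL you submit, and domain.com/results.php without parameters is not the same URL as domain.com/results.php?name=a&address=b, so the parameterized pages stay indexed. Note also that Disallow only stops crawling; for the pages to be deindexed they must remain crawlable and serve a noindex signal, e.g.

    <meta name="robots" content="noindex">

(or the equivalent X-Robots-Tag: noindex response header), with the Disallow added only after the URLs have dropped out of the index.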

Google is ignoring my robots.txt [closed]

Submitted by 不羁岁月 on 2019-12-23 19:24:01
Question: (Closed as off-topic on Stack Overflow; no longer accepting answers.) Here is the content of my robots.txt file:

    User-agent: *
    Disallow: /images/
    Disallow: /upload/
    Disallow: /admin/

As you can see, I explicitly disallowed all robots from indexing the folders images, upload, and admin. The problem is that one of my clients sent a request for removing the content from the images folder
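The question is cut off, but the usual resolution (stated here as general background, not the thread's answer) is that robots.txt is not being ignored so much as misunderstood: Disallow only blocks crawling, and URLs Google already knows about can stay in the index, sometimes precisely because the block prevents Google from re-fetching them and seeing that they are gone. To actually get the images removed, use the URL removal tool in Search Console and, for a durable signal, serve a noindex on the image responses via the HTTP header

    X-Robots-Tag: noindex

which, again, only works while the files remain crawlable, so the Disallow: /images/ line has to come out first.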