robots.txt

How to create a robots.txt file to hide a view page from search engines in CodeIgniter

China☆狼群 submitted on 2019-12-02 15:16:19
Question: How do I create a robots.txt file in a CodeIgniter project to hide a view page, and where should I place it? Currently I have created the file at /public_html/folder/robots.txt (where I place my .htaccess file), with this content:

User-agent: *
Disallow: /application/views/myviewpage.php

Is there any way to test this?

Answer 1: The robots.txt file MUST be placed in the document root of the host. It won't work in other locations. If your host is example.com, it needs to be accessible at http://example.com/robots.txt.
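For the "is there any way to test this?" part, one quick check is Python's standard-library urllib.robotparser, which fetches and evaluates a live robots.txt. A minimal sketch, with example.com standing in for the real host:

    from urllib.robotparser import RobotFileParser

    # Fetch and parse the live robots.txt (example.com is a placeholder).
    rp = RobotFileParser()
    rp.set_url("http://example.com/robots.txt")
    rp.read()

    # False means compliant crawlers are blocked from this URL.
    print(rp.can_fetch("*", "http://example.com/application/views/myviewpage.php"))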

How to configure robots.txt to allow everything?

我是研究僧i submitted on 2019-12-02 14:59:37
My robots.txt in Google Webmaster Tools shows the following values:

User-agent: *
Allow: /

What does it mean? I don't have enough knowledge about it, so I'm looking for your help. I want to allow all robots to crawl my website; is this the right configuration?

That file will allow all crawlers access:

User-agent: *
Allow: /

This basically allows all user agents (the *) to reach all parts of the site (the /). If you want to allow every bot to crawl everything, this is the best way to specify it in your robots.txt:

User-agent: *
Disallow:

Note that the Disallow field has an empty value, which means that nothing is disallowed, so every URL may be crawled.
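A quick way to confirm that the two spellings behave identically is to feed each to Python's standard-library parser; a small sketch (the bot name and URL are arbitrary placeholders):

    from urllib.robotparser import RobotFileParser

    for lines in (["User-agent: *", "Allow: /"],
                  ["User-agent: *", "Disallow:"]):
        rp = RobotFileParser()
        rp.parse(lines)  # parse the rules directly, no network fetch
        # Both variants leave every path crawlable for every agent.
        print(rp.can_fetch("SomeBot", "http://example.com/any/path"))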

Robots.txt restriction of category URLs

别说谁变了你拦得住时间么 submitted on 2019-12-02 13:33:21
I was unable to find information about my case. I want to restrict the following type of URL from being indexed:

website.com/video-title/video-title/

(my website produces such doubled URL copies of my video articles). Each video article starts with the word "video" at the beginning of its URL, so what I want is to restrict all URLs of the form website.com/any-url/video-any-url. That way I would remove all the doubled copies. Could somebody help me?

This is not possible in the original robots.txt specification, but some parsers support wildcards in Disallow anyway. Google, for example, treats * as matching any sequence of characters, so for wildcard-aware bots the following blocks the doubled copies while leaving the originals (whose paths begin directly with /video) crawlable:

User-agent: *
Disallow: /*/video
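Python's standard-library urllib.robotparser follows the original specification and, as far as I know, treats * literally, so checking a wildcard rule yourself takes a few lines. A rough sketch of Google-style matching (the helper name and sample paths are invented for illustration):

    import re

    def wildcard_rule_matches(pattern: str, path: str) -> bool:
        # Google-style semantics: '*' matches any run of characters,
        # and a trailing '$' anchors the pattern to the end of the path.
        anchored = pattern.endswith("$")
        body = pattern[:-1] if anchored else pattern
        regex = "".join(".*" if c == "*" else re.escape(c) for c in body)
        return re.match(regex + ("$" if anchored else ""), path) is not None

    print(wildcard_rule_matches("/*/video", "/video-title/video-title/"))  # True: doubled copy is blocked
    print(wildcard_rule_matches("/*/video", "/video-title/"))              # False: original stays crawlable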

Will this robots.txt only allow Googlebot to index my site?

白昼怎懂夜的黑 submitted on 2019-12-02 09:50:37
Will this robots.txt file only allow Googlebot to index my site's index.php file? Caveat: I have an .htaccess redirect so that people who type http://www.example.com/index.php are redirected to simply http://www.example.com/. This is my robots.txt file content:

User-agent: Googlebot
Allow: /index.php
Disallow: /

User-agent: *
Disallow: /

Thanks in advance!

Not really. Only "good" bots follow the robots.txt instructions (not all robots and spiders bother to read or follow robots.txt). That might not even include all the main search engines' bots, but it definitely means that some web crawlers will simply ignore your rules and crawl whatever they can reach.
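For compliant bots, though, the file does what the question intends; a quick sketch with Python's urllib.robotparser (example.com is the question's own placeholder):

    from urllib.robotparser import RobotFileParser

    rules = ["User-agent: Googlebot",
             "Allow: /index.php",
             "Disallow: /",
             "",
             "User-agent: *",
             "Disallow: /"]

    rp = RobotFileParser()
    rp.parse(rules)
    print(rp.can_fetch("Googlebot", "http://www.example.com/index.php"))  # True
    print(rp.can_fetch("Googlebot", "http://www.example.com/other"))      # False
    print(rp.can_fetch("OtherBot", "http://www.example.com/index.php"))   # False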

Disallow directory contents, but Allow directory page in robots.txt

允我心安 submitted on 2019-12-02 04:19:06
Question: Will this work for disallowing pages under a directory while still allowing the directory URL itself?

Allow: /special-offers/$
Disallow: /special-offers/

That is, to allow www.mysite.com/special-offers/ but block www.mysite.com/special-offers/page1, www.mysite.com/special-offers/page2.html, etc.

Answer 1: Having looked at Google's very own robots.txt file, they do exactly what I was asking about. At lines 136-137 they have:

Disallow: /places/
Allow: /places/$

So they are blocking anything under /places/ while the $ anchor keeps /places/ itself allowed.
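When an Allow rule and a Disallow rule both match a path, Google resolves the conflict in favor of the most specific (longest) pattern, which is why that pair works. A toy sketch of the resolution (rule list and function names are invented for illustration):

    RULES = [("allow", "/special-offers/$"),
             ("disallow", "/special-offers/")]

    def rule_matches(pattern, path):
        if pattern.endswith("$"):
            return path == pattern[:-1]   # '$' anchors an exact match
        return path.startswith(pattern)   # plain rules are prefix matches

    def is_allowed(path, rules=RULES):
        hits = [(len(p), kind == "allow") for kind, p in rules
                if rule_matches(p, path)]
        # No matching rule: crawlable by default. Otherwise the longest
        # pattern wins, and Allow wins an exact tie in length.
        return max(hits)[1] if hits else True

    print(is_allowed("/special-offers/"))       # True: the anchored Allow is longer
    print(is_allowed("/special-offers/page1"))  # False: only the Disallow matches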

robots.txt - is this working?

笑着哭i submitted on 2019-12-02 02:51:19
Question: I just ran into a robots.txt that looks like this:

User-agent: *
Disallow: /foobar

User-agent: badbot
Disallow: *

After disallowing only a few folders for all, does the specific badbot rule even apply? Note: this question is merely about understanding the above ruleset. I know robots.txt is not a proper security mechanism, and I'm neither using nor advocating it.

Answer 1 (by unor): Each bot only ever complies with at most a single record (block). A block starts with one or more User-agent lines, typically followed by Disallow lines (at least one is required). Blocks are separated by blank lines. A bot follows the block whose User-agent line matches it most specifically, and falls back to the User-agent: * block only if no other block matches. So yes: a bot identifying as badbot would follow only its own block and ignore the rules given for *.
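One way to watch the block selection happen is Python's urllib.robotparser. Note that Disallow: * is nonstandard (the conventional way to block everything is Disallow: /), so this sketch rewrites it in the standard form:

    from urllib.robotparser import RobotFileParser

    rules = ["User-agent: *",
             "Disallow: /foobar",
             "",
             "User-agent: badbot",
             "Disallow: /"]

    rp = RobotFileParser()
    rp.parse(rules)
    # badbot matches its own block, so everything is off limits for it...
    print(rp.can_fetch("badbot", "/anything"))   # False
    # ...while other bots fall back to the * block and lose only /foobar.
    print(rp.can_fetch("nicebot", "/anything"))  # True
    print(rp.can_fetch("nicebot", "/foobar"))    # False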

May disallowing the entire website in robots.txt have consequences after removal?

拜拜、爱过 submitted on 2019-12-02 02:37:27
I've published a website and, due to a misunderstanding beyond my control, I had to block all the pages before indexing. Some of these pages had already been linked on social networks, so to avoid a bad user experience I decided to put the following in robots.txt:

User-agent: *
Disallow: *

I've received a "critical problem" alert in Webmaster Tools and I'm a bit worried about it. In your experience, would it be sufficient to restore the original robots.txt (whenever that becomes possible)? Can the current situation leave consequences (penalizations or similar) on the website if it lasts only for a limited time?

Stop abusive bots from crawling?

帅比萌擦擦* submitted on 2019-12-01 23:58:20
Question: Is this a good idea? http://browsers.garykeith.com/stream.asp?RobotsTXT What does abusive crawling mean? How is that bad for my site?

Answer 1: Not really. Most "bad bots" ignore the robots.txt file anyway. Abusive crawling usually means scraping: these bots show up to harvest email addresses or, more commonly, content. As for how you can stop them? That's really tricky and often not wise. Anti-crawl techniques tend to be less than perfect and to cause problems for regular human visitors. Sadly, like "shrinkage" in retail, it's a cost of doing business on the web. A user-agent (which includes both browsers and bots) can identify itself however it likes, so blocking on the User-Agent header alone is easily evaded.
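If you do decide to push back anyway, the usual server-side approach combines a User-Agent blocklist with per-IP rate limiting. A minimal Python sketch (the agent names, limits, and function are all invented for illustration, and determined scrapers will evade both checks):

    import time
    from collections import defaultdict, deque

    BLOCKED_AGENTS = ("EmailCollector", "WebCopier")  # hypothetical blocklist
    MAX_REQUESTS, WINDOW = 30, 60.0                   # 30 requests per minute per IP

    _hits = defaultdict(deque)

    def should_block(ip: str, user_agent: str) -> bool:
        # Cheap check first: known-bad User-Agent substrings.
        if any(bad.lower() in user_agent.lower() for bad in BLOCKED_AGENTS):
            return True
        # Sliding-window rate limit: count this IP's hits in the last WINDOW seconds.
        now = time.time()
        q = _hits[ip]
        q.append(now)
        while q and now - q[0] > WINDOW:
            q.popleft()
        return len(q) > MAX_REQUESTS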