I am creating two pages on my site that are very similar but serve different purposes. One is to thank users for leaving a comment and the other is to encourage users to subscribe.
I don't want the duplicate content, but I do want both pages to be available. Can I set the sitemap to hide one, or would I do this in the robots.txt file?
The disallow looks like this:
Disallow: /wp-admin
How would I customize that for a specific page, like:
Disallow: /thank-you-for-commenting
in robots.txt
Take a look at last.fm's robots.txt file for inspiration.
robots.txt rules are matched against the URL path as a prefix (major crawlers such as Googlebot also support the * and $ wildcards), so to avoid blocking more pages than you intend, you may need to add a $ to the end of the page name:
Disallow: /thank-you-for-commenting$
If you don't, you'll also disallow pages such as /thank-you-for-commenting-on-this-too
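If you want to sanity-check a pattern before relying on it, here is a minimal Python sketch of that matching behaviour (prefix match, with * as a wildcard and a trailing $ as an end anchor, the way Googlebot treats it). It is not a full robots.txt parser, and rule_to_regex is just a made-up helper name for illustration:

import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a Disallow path into a regex: prefix match,
    '*' matches anything, a trailing '$' anchors the end of the path."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything literally, then turn escaped '*' back into '.*'
    pattern = re.escape(rule).replace(r"\*", ".*")
    return re.compile("^" + pattern + ("$" if anchored else ""))

for rule in ("/thank-you-for-commenting", "/thank-you-for-commenting$"):
    rx = rule_to_regex(rule)
    for path in ("/thank-you-for-commenting",
                 "/thank-you-for-commenting-on-this-too"):
        print(rule, path, "blocked" if rx.match(path) else "allowed")

Running this shows the unanchored rule blocking both paths, while the $-anchored rule blocks only the exact page.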
You can also disallow a specific page, including its file extension, in the robots.txt file. If you have test pages, for instance, you can list their paths to keep robots from crawling them.
For example:
Disallow: /index_test.php
Disallow: /products/test_product.html
Disallow: /products/
The first rule, Disallow: /index_test.php, blocks bots from crawling the test page in the root folder.
The second, Disallow: /products/test_product.html, blocks test_product.html under the 'products' folder.
Finally, Disallow: /products/ blocks the whole folder from being crawled.
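If you want to verify rules like these (no wildcards involved), Python's standard urllib.robotparser can parse them directly; the paths below are just the examples from above:

import urllib.robotparser

# Rules have to sit under a User-agent group to apply.
rules = [
    "User-agent: *",
    "Disallow: /index_test.php",
    "Disallow: /products/test_product.html",
    "Disallow: /products/",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

for path in ("/index_test.php",
             "/products/test_product.html",
             "/products/another_product.html",
             "/index.php"):
    print(path, "allowed" if rp.can_fetch("*", path) else "blocked")

The first three paths come back blocked (the last of them because of the folder-wide rule), while /index.php stays allowed.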
This is very simple: for any page you want to disallow, just give the root-relative URL of the file or folder. Just put this into your robots.txt file:
Disallow: /thank-you-for-commenting
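Once the file is live you can check it from the outside too. A quick sketch with urllib.robotparser, where example.com stands in for your own domain; remember this is a prefix match, so longer URLs starting with /thank-you-for-commenting are blocked as well:

import urllib.robotparser

# example.com is a placeholder; point this at your real site.
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetches and parses the published robots.txt

print(rp.can_fetch("*", "https://example.com/thank-you-for-commenting"))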
Source: https://stackoverflow.com/questions/3486458/how-do-i-disallow-specific-page-from-robots-txt