Robots.txt deny, for a #! URL

刺人心 2020-12-21 15:01

I am trying to add a deny rule to a robots.txt file, to deny access to a single page.

The website URLs work as follows:

  • http://example.com/#!/homepage
2 Answers
  • 2020-12-21 15:41

    You can't (per se). Search engines generally don't run JavaScript, so they ignore the fragment identifier. You can only disallow the URLs that are actually requested from the server, which don't include the fragment.

    Google will map hashbangs onto different URIs. You can work out what those are (and you should have done so already, because that is the whole point of using hashbangs) and list them in robots.txt.

    Hashbangs, however, are problematic at best, so I'd scrap them in favour of the History API, which lets you use sane URIs (see the sketch below).
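
    For illustration only, here is a minimal sketch of hashbang-free navigation with the History API (browser-side; the `render` function and the `data-internal` link marker are hypothetical placeholders, not part of the original answer):

    // A minimal client-side routing sketch using the History API instead of hashbangs.
    // `render` is a hypothetical stand-in for real view-rendering logic.
    function render(path: string): void {
      document.body.dataset.route = path; // placeholder for drawing the view
    }

    // Intercept internal link clicks and push clean URLs like /homepage instead of /#!/homepage.
    document.addEventListener("click", (event) => {
      const target = event.target as HTMLElement | null;
      const link = target?.closest<HTMLAnchorElement>("a[data-internal]");
      if (link) {
        event.preventDefault();
        const path = new URL(link.href).pathname;
        history.pushState({ path }, "", path);
        render(path);
      }
    });

    // Keep the back/forward buttons working.
    window.addEventListener("popstate", (event) => {
      const state = event.state as { path?: string } | null;
      render(state?.path ?? location.pathname);
    });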

  • 2020-12-21 15:44

    You can actually do this in several ways, but here are the two simplest.

    You have to exclude the URLs that Googlebot is actually going to fetch, which are not the AJAX hashbang URLs but their translated ?_escaped_fragment_=key=value equivalents.
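
    As a rough sketch of that translation (Google's AJAX crawling scheme, now deprecated; the function name and example URL are just for illustration, so verify against your own crawl logs):

    // Sketch: Googlebot requests the #! URL with "#!" replaced by "?_escaped_fragment_="
    // (special characters in the fragment get percent-escaped; none occur in this example).
    function toEscapedFragmentUrl(hashBangUrl: string): string {
      const url = new URL(hashBangUrl);
      if (!url.hash.startsWith("#!")) return hashBangUrl; // not a hashbang URL
      const fragment = url.hash.slice(2);                 // e.g. "/super-secret"
      const separator = url.search ? "&" : "?";
      return `${url.origin}${url.pathname}${url.search}${separator}_escaped_fragment_=${fragment}`;
    }

    // toEscapedFragmentUrl("http://example.com/#!/super-secret")
    //   -> "http://example.com/?_escaped_fragment_=/super-secret"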

    In your robots.txt file specify:

    Disallow: /?_escaped_fragment_=/super-secret
    Disallow: /index.php?_escaped_fragment_=/super-secret
    

    When in doubt, you should always check with Google Webmaster Tools » "Fetch as Googlebot".

    If the page has already been indexed by Googlebot, using a robots.txt file won't remove it from the index. You'll either have to use the Google Webmaster Tools URL removal tool after applying the robots.txt rules, or you can add a noindex directive to the page via a <meta> tag or an X-Robots-Tag HTTP header.

    It would look something like:

    <meta name="ROBOTS" content="NOINDEX, NOFOLLOW" />
    

    or

    X-Robots-Tag: noindex
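
    If the page is served from something like Node, here is a minimal sketch of sending that header (the /super-secret path just reuses the example above; adapt it to your own stack):

    // Minimal Node.js sketch: answer the (hypothetical) /super-secret page with an
    // X-Robots-Tag: noindex header so crawlers drop it from the index.
    import { createServer } from "node:http";

    createServer((req, res) => {
      if (req.url?.startsWith("/super-secret")) {
        res.setHeader("X-Robots-Tag", "noindex");
      }
      res.end("ok"); // stand-in for serving the real page
    }).listen(8080);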
    