Robots.txt deny, for a #! URL

刺人心 2020-12-21 15:01

I am trying to add a deny rule to a robots.txt file, to deny access to a single page.

The website URLs work as follows:

  • http://example.com/#!/homepage
2 Answers
  • 2020-12-21 15:41

    You can't (per se). Search engines generally don't run JavaScript, so they ignore the fragment identifier. You can only disallow the URLs that are actually requested from the server, which don't include the fragment.

    Google will map hashbangs onto different URIs. You can work out what those are (and you should have done so already, because that is the whole point of using hashbangs) and list them in robots.txt.

    Hashbangs, however, are problematic at best, so I'd scrap them in favour of the History API, which lets you use sane URIs (see the sketch below).
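
    For illustration only, here is a minimal sketch of hashbang-free navigation with the History API (browser-side; the `render` function and the `data-internal` link marker are hypothetical placeholders, not part of the original answer):

    // A minimal client-side routing sketch using the History API instead of hashbangs.
    // `render` is a hypothetical stand-in for real view-rendering logic.
    function render(path: string): void {
      document.body.dataset.route = path; // placeholder for drawing the view
    }

    // Intercept internal link clicks and push clean URLs like /homepage instead of /#!/homepage.
    document.addEventListener("click", (event) => {
      const target = event.target as HTMLElement | null;
      const link = target?.closest<HTMLAnchorElement>("a[data-internal]");
      if (link) {
        event.preventDefault();
        const path = new URL(link.href).pathname;
        history.pushState({ path }, "", path);
        render(path);
      }
    });

    // Keep the back/forward buttons working.
    window.addEventListener("popstate", (event) => {
      const state = event.state as { path?: string } | null;
      render(state?.path ?? location.pathname);
    });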

  • 2020-12-21 15:44

    You can actually do this in several ways, but here are the two simplest.

    You have to exclude the URLs that Googlebot is actually going to fetch, which are not the AJAX hashbang URLs but their translated ?_escaped_fragment_=key=value equivalents.
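
    As a rough sketch of that translation (Google's AJAX crawling scheme, now deprecated; the function name and example URL are just for illustration, so verify against your own crawl logs):

    // Sketch: Googlebot requests the #! URL with "#!" replaced by "?_escaped_fragment_="
    // (special characters in the fragment get percent-escaped; none occur in this example).
    function toEscapedFragmentUrl(hashBangUrl: string): string {
      const url = new URL(hashBangUrl);
      if (!url.hash.startsWith("#!")) return hashBangUrl; // not a hashbang URL
      const fragment = url.hash.slice(2);                 // e.g. "/super-secret"
      const separator = url.search ? "&" : "?";
      return `${url.origin}${url.pathname}${url.search}${separator}_escaped_fragment_=${fragment}`;
    }

    // toEscapedFragmentUrl("http://example.com/#!/super-secret")
    //   -> "http://example.com/?_escaped_fragment_=/super-secret"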

    In your robots.txt file specify:

    Disallow: /?_escaped_fragment_=/super-secret
    Disallow: /index.php?_escaped_fragment_=/super-secret
    

    When in doubt, you should always check with Google Webmaster Tools » "Fetch as Googlebot".

    If the page has already been indexed by Googlebot, using a robots.txt file won't remove it from the index. You'll either have to use the Google Webmaster Tools URL removal tool after applying the robots.txt rules, or you can add a noindex directive to the page via a <meta> tag or an X-Robots-Tag HTTP header.

    It would look something like:

    <meta name="ROBOTS" content="NOINDEX, NOFOLLOW" />
    

    or

    X-Robots-Tag: noindex
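
    If the page is served from something like Node, here is a minimal sketch of sending that header (the /super-secret path just reuses the example above; adapt it to your own stack):

    // Minimal Node.js sketch: answer the (hypothetical) /super-secret page with an
    // X-Robots-Tag: noindex header so crawlers drop it from the index.
    import { createServer } from "node:http";

    createServer((req, res) => {
      if (req.url?.startsWith("/super-secret")) {
        res.setHeader("X-Robots-Tag", "noindex");
      }
      res.end("ok"); // stand-in for serving the real page
    }).listen(8080);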
    