UTF-8 encoding in page addresses, issues with search engine crawlers

Submitted by 痞子三分冷 on 2019-12-23 21:23:02

Question


We maintain a website that uses the letters æ, ø, and å in some of its page addresses. Apart from some IE issues early on, this has worked fine until now. Over the last couple of weeks, however, search engine crawlers, especially Bing, seem to be encoding these letters over and over.

So we get 404-errors as the crawler is trying to access the address /butikk/m%C3%83%C6%92%C3%86%E2%80%99%C3%83%E2%80%A0%C3%A2%E2%82%AC%E2%84%A2%C3%83%C6%92%C3%A2%E2%82%AC%C5%A1%C3%83%E2%80%9A%C3%82%C2%A3%C3%83%C6%92%C3%86%E2%80%99%C3%83%C2%A2%C3%A2%E2%80%9A%C2%AC%C3%82%C2%A0%C3%83%C6%92%C3%82%C2%A2%C3%83%C2%A2%C3%A2%E2%82%AC%C5%A1%C3%82%C2%AC%C3%83%C2%A2%C3%A2%E2%82%AC%C5%BE%C3%82%C2%A2%C3%83%C6%92%C3%86%E2%80%99%C3%83%E2%80%A0%C3%A2%E2%82%AC%E2%84%A2%C3%83%C6%92%C3%A2%E2%82%AC%C5%A1%C3%83%E2%80%9A%C3%82%C2%A2%C3%83%C6%92%C3%86%E2%80%99%C3%83%C2%A2%C3%A2%E2%80%9A%C2%AC%C3%85%C2%A1%C3%83%C6%92%C3%A2%E2%82%AC%C5%A1%C3%83%E2%80%9A%C3%82%C2%B8bler, instead of /butikk/møbler. Using /butikk/m%c3%b8bler would also have gotten you to the right page. And as we are using Play Framework, we also get a site error as our controllers can be no longer than 250 characters, but that is not the real issue here.
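One way such an address can come about is sketched below in Python. This is a guess at the mechanism, not a confirmed diagnosis: it assumes that somewhere along the way the UTF-8 bytes of "ø" are misread as Windows-1252 text and then encoded as UTF-8 again, once per round. The helper name `garble` is made up for illustration.

```python
from urllib.parse import quote


def garble(text: str, rounds: int = 1) -> str:
    """Simulate the assumed bug: UTF-8 bytes misread as Windows-1252, repeated `rounds` times."""
    for _ in range(rounds):
        text = text.encode("utf-8").decode("cp1252")
    return text


path = "/butikk/møbler"

correct = quote(path)           # percent-encodes the UTF-8 bytes of the path
once = quote(garble(path, 1))   # one round of mis-decoding
twice = quote(garble(path, 2))  # two rounds of mis-decoding

print(correct)  # /butikk/m%C3%B8bler
print(once)     # /butikk/m%C3%83%C2%B8bler
print(twice)    # /butikk/m%C3%83%C6%92%C3%82%C2%B8bler
```

After two rounds the encoded letter already starts with %C3%83%C6%92, the same prefix as the failing address above, and every further round roughly doubles its length.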

Initially, the site had no sitemap. We added one with UTF-8 encoded addresses, hoping it would lead the bots to the right pages, but so far it has made no difference.

Has anybody had a similar issue and solved it, or does anyone have suggestions for what we can do to make Bing Bot use the right addresses? Any help would be appreciated.

Added info: Looking at Bing Webmaster Tools, I can see that Bing has indexed both the right address and a version with "Ã¸" instead of "ø". So my issue can hopefully be solved by removing the faulty address from the index.
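If the faulty addresses keep being requested, the server could also try to recover the intended address and redirect to it. The Python sketch below is only a heuristic and rests on an assumption: that the damage comes from UTF-8 bytes being misread as Windows-1252, possibly several times. The function name `repair_path` and the `max_rounds` limit are made up for illustration.

```python
from urllib.parse import quote, unquote


def repair_path(path: str, max_rounds: int = 20) -> str:
    """Undo repeated UTF-8 -> Windows-1252 mis-decoding in a percent-encoded path."""
    text = unquote(path)  # percent-decode, interpreting the bytes as UTF-8
    for _ in range(max_rounds):
        try:
            # Reverse one round: get the original bytes back, then read them as UTF-8.
            candidate = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # the text no longer looks like mis-decoded UTF-8
        if candidate == text:
            break  # pure ASCII, nothing left to undo
        text = candidate
    return quote(text)


print(repair_path("/butikk/m%C3%83%C2%B8bler"))  # /butikk/m%C3%B8bler
print(repair_path("/butikk/m%C3%B8bler"))        # /butikk/m%C3%B8bler (already correct)
```

A legitimate address that really contains a sequence such as "Ã¸" would be rewritten too, so a check like this is probably best applied only to requests that would otherwise return a 404.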


Answer 1:


The best suggestion would be to leave special characters out of your filenames, links, and addresses. I had a similar issue a few years back with links containing ä, ö, and ü, which was resolved by simply removing the special characters and replacing them with plain ASCII characters.
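A minimal Python sketch of such a replacement is shown here. The mapping table and the function name `ascii_slug` are assumptions for illustration; which replacement to use for each letter (for example "o" or "oe" for "ø") is a choice the site has to make.

```python
import unicodedata

# Hypothetical mapping for letters that have no Unicode decomposition.
SPECIAL = {"æ": "ae", "ø": "o", "å": "a", "Æ": "Ae", "Ø": "O", "Å": "A"}


def ascii_slug(segment: str) -> str:
    """Replace non-ASCII letters in one address segment with ASCII look-alikes."""
    replaced = "".join(SPECIAL.get(ch, ch) for ch in segment)
    # NFKD splits letters such as "ä" into "a" plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", replaced)
    # Drop whatever is still outside ASCII (the combining marks).
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(ascii_slug("møbler"))  # mobler
print(ascii_slug("färg"))    # farg
```

Characters that are neither in the table nor decomposable are silently dropped, so the table would need an entry for every such letter the site actually uses.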



Source: https://stackoverflow.com/questions/18953759/utf-8-encoding-in-page-addresses-issues-with-search-engine-crawlers
