Converting UTF8 text for use in a URL

Submitted by 半腔热情 on 2019-12-07 07:27:10

Question


I'm developing an international site which uses UTF-8 to display non-English characters. I'm also using friendly URLs which contain the item name. Obviously I can't use the non-English characters in the URL.

Is there some sort of common practice for this conversion? I'm not sure which English characters I should be replacing them with. Some are quite obvious (like è to e), but there are other characters I am not familiar with (such as ß).


Answer 1:


I normally use iconv() with the 'ASCII//TRANSLIT' option. This takes input like:

último año

and produces output like this (the exact result depends on the iconv implementation and the current locale):

'ultimo a~no

Then I use preg_replace() to replace white spaces with dashes:

'ultimo-a~no

... and remove unwanted chars, e.g.

[^a-z0-9-]

It's probably useless with Arabic or Chinese but it works fine with Spanish, French or German.
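
The steps above could be combined into a single function. This is only a minimal sketch: the name slugify() is hypothetical (it does not come from the answer), and the lowercasing, dash-collapsing, and failure fallback are assumptions added for illustration. Because the output of 'ASCII//TRANSLIT' varies between iconv implementations, the usage line sticks to plain ASCII input.

```php
<?php
// Hypothetical helper; slugify() is an illustrative name, not a PHP built-in.
function slugify(string $text): string
{
    // Transliterate to ASCII. The result depends on the iconv implementation
    // and locale, and iconv() returns false when the conversion fails.
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $text);
    if ($ascii === false) {
        // Assumption: fall back to the raw input and let the filter below
        // strip whatever is not plain ASCII.
        $ascii = $text;
    }

    $ascii = strtolower($ascii);

    // Replace runs of whitespace with a single dash.
    $ascii = preg_replace('/\s+/', '-', $ascii);

    // Remove everything that is not a lowercase letter, a digit, or a dash.
    $ascii = preg_replace('/[^a-z0-9-]/', '', $ascii);

    // Collapse repeated dashes and trim dashes from both ends.
    return trim(preg_replace('/-+/', '-', $ascii), '-');
}

echo slugify('Hello World 2019'), "\n"; // hello-world-2019
```

Two different names can collapse to the same slug after transliteration, so a real site would probably also keep a numeric ID in the URL.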




Answer 2:


You can use UTF-8 encoded data in URL paths. You just need to additionally apply percent-encoding to it (see rawurlencode):

// ß (U+00DF) = 0xC39F (UTF-8)
$str = "\xC3\x9F";
echo '<a href="http://en.wikipedia.org/wiki/'.rawurlencode($str).'">'.$str.'</a>';

This will echo a link to http://en.wikipedia.org/wiki/ß. Modern browsers will display the character ß itself in the location bar instead of the percent-encoded representation of that character in UTF-8 (%C3%9F).

If you don't want to use UTF-8 but only ASCII characters, I suggest using transliteration, as Álvaro G. Vicario suggested.
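
One detail worth illustrating: rawurlencode() also encodes "/", so a path with several segments has to be encoded one segment at a time. The sketch below assumes a hypothetical helper named encode_path(); it is not part of PHP or of the answer above.

```php
<?php
// Hypothetical helper: percent-encode each path segment separately so the
// "/" separators survive (rawurlencode() alone would turn them into %2F).
function encode_path(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

// "Stra\xC3\x9Fe" is "Straße" written out as UTF-8 bytes.
echo encode_path("wiki/Stra\xC3\x9Fe"), "\n"; // wiki/Stra%C3%9Fe
```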




Answer 3:


"Obviously I can't use the non english characters in the URL."

In fact, you can. The Wikipedia software (built in PHP) supports this, e.g. en.wikipedia.org/wiki/☃.

Note that you need to encode the URL appropriately, as shown in the other answers.




Answer 4:


Use rawurlencode to encode your name for the URL, and rawurldecode to convert the name in the URL back to the original string. These two functions convert strings to and from their percent-encoded form as described in RFC 3986 (before PHP 5.3.0, rawurlencode followed RFC 1738 and also encoded the tilde).
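
A short round trip shows the two functions working together. This is just an illustrative sketch; the sample string is the one from the first answer, written as UTF-8 byte escapes so the result does not depend on the encoding of the source file.

```php
<?php
// "último año" spelled out as UTF-8 bytes.
$name = "\xC3\xBAltimo a\xC3\xB1o";

// Encode for use as a URL path segment; spaces become %20.
$encoded = rawurlencode($name);

// Decode what arrives in the request back to the original string.
$decoded = rawurldecode($encoded);

echo $encoded, "\n";          // %C3%BAltimo%20a%C3%B1o
var_dump($decoded === $name); // bool(true)
```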




Answer 5:


Last time I tried (about a week ago), UTF-8 characters (specifically Japanese) worked fine in URLs without any additional encoding. They even looked right in the address bar of every browser I tested (Safari, Chrome and Firefox, all on Mac), as well as in whatever browser my girlfriend was using on Windows. Aside from most Windows installations I've run across showing squares for Japanese characters because they lack the required fonts, it seems to work fine there as well.

The URL I tried is: http://www.webghoul.de.private-void.net/cache/black-f-with-あい-50.png (WMD does not seem to like it)

Proof by screenshot: http://heavymetal.theredhead.nl/~kris/stackoverflow/screenshot-utf8-url.png

So although it might not actually be allowed by the spec, from what I've seen it works well across the board, except maybe in editors that like the spec a lot ;-)

I wouldn't actually recommend using these types of characters in URLs, but I also wouldn't make it a first priority to "fix".



Source: https://stackoverflow.com/questions/2419075/converting-utf8-text-for-use-in-a-url