How to support UTF-8 (Japanese, Arabic, Spanish, …) URLs in PHP

既然无缘 2020-12-15 14:32

For a web application, we need to link to some user-generated content. A user types in a title for, e.g., a product, and we generate an SEO-friendly URL for that product:
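A minimal sketch of such a slug generator follows. It assumes the title is valid UTF-8 and that PHP's PCRE supports the `/u` modifier (the default). The helper name `unicode_slug()` and the `/product/` prefix are hypothetical. Case folding (for example with `mb_strtolower()`) is left out so the sketch needs only core PHP.

```php
<?php
// Hypothetical helper: turn a UTF-8 title into a percent-encoded path segment.
function unicode_slug(string $title): string
{
    // Replace every run of characters that are not letters, combining marks
    // or digits (in any script) with a single hyphen; /u works on code points.
    // preg_replace() returns null on invalid UTF-8, which becomes ''.
    $slug = (string) preg_replace('/[^\p{L}\p{M}\p{N}]+/u', '-', $title);
    $slug = trim($slug, '-');
    // Percent-encode the UTF-8 bytes so the URL path contains only ASCII.
    return rawurlencode($slug);
}

echo '/product/' . unicode_slug('أبجد هوز'), "\n";
// prints /product/%D8%A3%D8%A8%D8%AC%D8%AF-%D9%87%D9%88%D8%B2
```

Keeping the letters and only encoding the bytes preserves the readable title in browsers that decode the path for display.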

4 Answers
  • 2020-12-15 15:16

    You're in trouble, I'm afraid. How the URL is encoded is at the discretion of the browser. I've encountered the same problem when trying to support URLs with Norwegian special characters, and it's simply not possible to handle consistently.

    You may be able to redirect a browser to the UTF-8 URL, but it might still send its next request encoded as ISO-8859-1. It gets even worse in some cases, where browsers (Firefox, for instance) mix ISO-8859-1 and UTF-8 encoding in the same URL (this happens particularly with GET parameters).
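If you accept non-ASCII URLs anyway, one way to cope with the mixed encodings described above is to check whether a decoded path segment is valid UTF-8 and, if it is not, assume it arrived as ISO-8859-1. This is a sketch under that assumption, which is only a guess and can be wrong for other legacy encodings. `normalize_segment()` is a hypothetical name, and the `iconv` extension is assumed to be available.

```php
<?php
// Hypothetical helper: decode one percent-encoded path segment and return it
// as UTF-8. Assumption: bytes that are not valid UTF-8 were sent as
// ISO-8859-1 (Latin-1), which is only a guess about the client.
function normalize_segment(string $raw): string
{
    $decoded = rawurldecode($raw);
    // With the /u modifier, preg_match() returns false for invalid UTF-8.
    if (preg_match('//u', $decoded) === 1) {
        return $decoded;
    }
    // Every byte is a valid ISO-8859-1 character, so this conversion succeeds.
    $converted = iconv('ISO-8859-1', 'UTF-8', $decoded);
    return $converted === false ? '' : $converted;
}

// The UTF-8 and the Latin-1 spelling of "blåbær" normalise to the same string.
var_dump(normalize_segment('bl%C3%A5b%C3%A6r') === normalize_segment('bl%E5b%E6r'));
// bool(true)
```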

    My suggestion is simply: don't do it. Use either English (better SEO, too!) or spell it phonetically.
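A rough sketch of the "spell it phonetically" route uses `iconv`'s `//TRANSLIT` mode to approximate accented Latin letters in ASCII. The result depends on the platform's iconv and installed locales (the `en_US.UTF-8` / `C.UTF-8` locale names are assumptions). Scripts such as Arabic or Japanese are usually not romanised at all and get stripped, so a real implementation would need a fallback such as a numeric ID. `ascii_slug()` is a hypothetical name.

```php
<?php
// Hypothetical helper: approximate a title in plain ASCII for the URL.
function ascii_slug(string $title): string
{
    $ascii = false;
    if (function_exists('iconv')) {
        // glibc picks its transliteration tables from LC_CTYPE, so switch
        // temporarily and restore the previous setting afterwards.
        $previous = setlocale(LC_CTYPE, '0');
        setlocale(LC_CTYPE, 'en_US.UTF-8', 'C.UTF-8');
        $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $title);
        if (is_string($previous)) {
            setlocale(LC_CTYPE, $previous);
        }
    }
    if ($ascii === false) {
        $ascii = $title; // Fallback: non-ASCII bytes are removed below.
    }
    // Collapse every run of characters other than a-z/0-9 into one hyphen.
    $slug = (string) preg_replace('/[^a-z0-9]+/', '-', strtolower($ascii));
    return trim($slug, '-');
}

echo ascii_slug('Blåbærsyltetøy på brød'), "\n";
```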

  • 2020-12-15 15:16

    You should URL-encode the Arabic (or other Unicode) text:

    urlencode('كلام-عربي')
    

    It's also very important to declare the charset in the page's <head> tag; otherwise the browser may decode the page with the wrong encoding and the link may not work:

    <meta charset="utf-8">
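PHP has two encoders, and they differ only in how they treat spaces (and `~`). For a hyphenated slug like `'كلام-عربي'` they produce the same result. For a title containing spaces, `rawurlencode()` (RFC 3986 style) is generally the better fit for a path segment, because the `+` that `urlencode()` emits means "space" only in form-encoded query strings. A small comparison:

```php
<?php
// Compare the two PHP encoders on text that contains a space.
$text = 'كلام عربي';

// urlencode(): form encoding, so the space becomes "+".
echo urlencode($text), "\n";
// prints %D9%83%D9%84%D8%A7%D9%85+%D8%B9%D8%B1%D8%A8%D9%8A

// rawurlencode(): RFC 3986 encoding, so the space becomes "%20".
echo rawurlencode($text), "\n";
// prints %D9%83%D9%84%D8%A7%D9%85%20%D8%B9%D8%B1%D8%A8%D9%8A

// In a path, "+" is a literal plus sign: rawurldecode() keeps it.
var_dump(rawurldecode('a+b')); // string(3) "a+b"
```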
    
  • 2020-12-15 15:21

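    You might need to use IDNA encoding on the non-ASCII portion of the URL. Note that IDNA (Punycode) applies only to the host name; non-ASCII characters in the path are percent-encoded instead.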

    http://en.wikipedia.org/wiki/Internationalized_domain_name
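IDNA converts only the host name. Here is a minimal sketch, assuming PHP's intl extension provides `idn_to_ascii()`. The helper name `ascii_host()` is hypothetical, and it returns null when intl is missing or the URL has no host.

```php
<?php
// Hypothetical helper: Punycode-encode only the host part of a URL.
function ascii_host(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || !function_exists('idn_to_ascii')) {
        return null;
    }
    // UTS #46 processing is the variant recommended for current PHP.
    $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    return $ascii === false ? null : $ascii;
}

var_dump(ascii_host('http://bücher.example/product/x'));
```

The path (`/product/x` here) is untouched; it still needs percent-encoding if it contains non-ASCII characters.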

  • 2020-12-15 15:22

    Although a URI itself may contain only US-ASCII characters, you can use Unicode characters in the URI path if you encode them as UTF-8 and then convert the resulting bytes into US-ASCII characters using percent-encoding. As RFC 3986 (section 2.5) puts it:

    A system that internally provides identifiers in the form of a different character encoding, such as EBCDIC, will generally perform character translation of textual identifiers to UTF-8 [STD63] (or some other superset of the US-ASCII character encoding) at an internal interface, thereby providing more meaningful identifiers than those resulting from simply percent-encoding the original octets.

    So you can do something like this (assuming the PHP source file, and therefore $title, is UTF-8 encoded):

    $title = 'أبجد هوز';
    $path = '/product/'.rawurlencode($title);
    echo $path;  // "/product/%D8%A3%D8%A8%D8%AC%D8%AF%20%D9%87%D9%88%D8%B2"
    

    Although the URI path is actually percent-encoded, most modern browsers display the Unicode characters that the sequence represents when it is valid UTF-8.
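On the receiving side, the server has to reverse the encoding to look the product up. A minimal sketch, assuming the `/product/` prefix from the example and a router that passes the raw request URI; `title_from_path()` is a hypothetical name:

```php
<?php
// Hypothetical helper: map a request URI back to the UTF-8 title.
function title_from_path(string $requestUri): ?string
{
    $prefix = '/product/';
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path) || strpos($path, $prefix) !== 0) {
        return null; // not a product URL
    }
    $title = rawurldecode(substr($path, strlen($prefix)));
    // Reject byte sequences that are not valid UTF-8.
    return preg_match('//u', $title) === 1 ? $title : null;
}

var_dump(title_from_path('/product/%D8%A3%D8%A8%D8%AC%D8%AF%20%D9%87%D9%88%D8%B2?ref=1'));
```

`rawurldecode()` is used rather than `urldecode()` so that a literal `+` in the path is not turned into a space.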
