How to make Django slugify work properly with Unicode strings?

猫巷女王i 2020-11-28 19:58

What can I do to prevent the slugify filter from stripping out non-ASCII alphanumeric characters? (I'm using Django 1.0.2.)

cnprog.com has Chinese characters in its URLs.

8 answers
  •  暗喜 (original poster)
    2020-11-28 20:14

    I'm afraid Django's definition of a slug means ASCII, though the Django docs don't explicitly state this. Here is the source of the slugify filter in defaultfilters; you can see that the value is converted to ASCII, with the 'ignore' option silently dropping any character that cannot be encoded:

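    def slugify(value):
        import unicodedata
        # Decompose accented characters, then drop every non-ASCII character.
        value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
        # Remove anything that is not a word character, whitespace, or hyphen.
        value = unicode(re.sub('[^\w\s-]', '', value).strip().lower())
        # Collapse runs of hyphens and whitespace into a single hyphen.
        return mark_safe(re.sub('[-\s]+', '-', value))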

    Based on that, I'd guess that cnprog.com is not using the official slugify function. You may wish to adapt the Django snippet above if you want different behaviour.
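One way to adapt the snippet is to skip the ASCII encoding step and let the regular expression decide what to keep. The following is only a minimal sketch in Python 3, not Django's own code; `unicode_slugify` is a hypothetical name, and the choice of NFKC normalization is an assumption made here so that accented letters stay intact.

```python
import re
import unicodedata


def unicode_slugify(value):
    """Slugify while keeping non-ASCII letters and digits (hypothetical helper)."""
    # NFKC folds compatibility forms (e.g. full-width letters) without
    # splitting accented characters into base letter + combining mark.
    value = unicodedata.normalize("NFKC", str(value))
    # In Python 3, \w in a str pattern matches Unicode word characters,
    # so CJK and accented letters survive this step.
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", value)


print(unicode_slugify("你好 世界"))
```

Later Django releases (1.9 onward) accept an `allow_unicode=True` argument on `django.utils.text.slugify`, which may be preferable to a hand-rolled helper if upgrading is an option.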

    Having said that, the RFC for URLs (RFC 1738) does state that non-US-ASCII characters (or, more specifically, anything other than the alphanumerics and $-_.+!*'()) should be encoded using the %hex notation. If you look at the actual raw GET request that your browser sends (say, using Firebug), you'll see that the Chinese characters are in fact encoded before being sent; the browser just makes them look pretty in the display. I suspect this is why slugify insists on ASCII only, FWIW.
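The percent-encoding described above can be reproduced with Python's standard library. This is a small illustration, assuming Python 3 and a UTF-8 encoded URL path; the sample slug is made up for the example.

```python
from urllib.parse import quote, unquote

slug = "你好-世界"  # sample slug containing Chinese characters

# quote() encodes the string as UTF-8 and writes each non-ASCII byte as %XX.
encoded = quote(slug)
print(encoded)

# unquote() reverses the process, which is roughly what the browser does
# when it shows the readable form in the address bar.
decoded = unquote(encoded)
print(decoded)
```

So a slug that keeps Unicode characters still travels over the wire as plain ASCII; only its displayed form differs.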
