String slugification in Python

粉色の甜心

I am looking for the best way to "slugify" a string (see what a "slug" is). My current solution is based on this recipe.

I have changed it a little bit to:

10 Answers

借酒劲吻你

2020-12-01 00:00

Unidecode is good; however, be careful: unidecode is GPL. If this license doesn't fit then use this one

我在风中等你

2020-12-01 00:03
The problem is with the ASCII normalization line:
```
slug = unicodedata.normalize('NFKD', s)
```
This is Unicode normalization, which does not decompose many characters into ASCII equivalents. For example, the non-ASCII characters in the following strings would simply be stripped:
```
Mørdag -> mrdag
Æther -> ther
```
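To see why those characters vanish, here is a minimal standard-library sketch. It assumes the normalized string is afterwards encoded to ASCII with errors ignored (as the Django function quoted in another answer does); `ascii_fold` is a hypothetical helper name, not part of any library.

```python
import unicodedata

def ascii_fold(s):
    # NFKD splits characters that have a decomposition (e.g. "é" -> "e" + combining accent);
    # encoding with errors="ignore" then drops everything that is still non-ASCII.
    return unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")

# "é" has a decomposition, while "ø" and "Æ" have none, so they are dropped entirely.
for ch in ("é", "ø", "Æ"):
    print(repr(ch), "decomposition:", repr(unicodedata.decomposition(ch)))

print(ascii_fold("café"))    # accent removed, base letter kept
print(ascii_fold("Mørdag"))  # "ø" is lost
print(ascii_fold("Æther"))   # "Æ" is lost
```

Accented letters survive as their base letter, but letters without a Unicode decomposition are removed outright, which is the behavior described above.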
A better approach is the unidecode module, which tries to transliterate strings to ASCII. If you replace the line above with:
```
import unidecode
slug = unidecode.unidecode(s)
```
You get better results for the above strings and for many Greek and Russian characters too:
```
Mørdag -> mordag
Æther -> aether
```

死守一世寂寞

2020-12-01 00:04

Install unidecode from here for Unicode support:

```
pip install unidecode
```

```
# -*- coding: utf-8 -*-
import re
import unidecode

def slugify(text):
    # Transliterate to ASCII, then lowercase
    text = unidecode.unidecode(text).lower()
    # Replace each run of non-alphanumeric characters (including "_") with one hyphen
    return re.sub(r'[\W_]+', '-', text)

text = u"My custom хелло ворлд"
print(slugify(text))

# Output: my-custom-khello-vorld
```
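One detail worth noting: replacing every run of non-alphanumeric characters with a hyphen leaves a leading or trailing hyphen when the input starts or ends with punctuation or whitespace. The sketch below is one possible variation, not the answer author's code. It trims those hyphens and treats unidecode as optional, falling back to NFKD folding when the package is not installed; `to_ascii` and `slugify_trimmed` are hypothetical names chosen for illustration.

```python
import re
import unicodedata

try:
    # Third-party package; optional in this sketch
    from unidecode import unidecode as to_ascii
except ImportError:
    def to_ascii(text):
        # Fallback: characters with no ASCII decomposition are dropped, not transliterated
        return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

def slugify_trimmed(text):
    text = to_ascii(text).lower()
    # Collapse runs of non-alphanumerics into "-", then trim hyphens at both ends
    return re.sub(r"[\W_]+", "-", text).strip("-")

print(slugify_trimmed("  Hello, World!  "))  # hello-world
```

With the fallback, non-Latin text such as Cyrillic is removed rather than transliterated, so the two code paths only agree on input that is already ASCII or accented Latin.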

陌清茗

2020-12-01 00:07

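```
# Imports as found in older Django releases; allow_lazy and
# django.utils.six were removed in later versions.
import re
import unicodedata

from django.utils import six
from django.utils.functional import allow_lazy
from django.utils.safestring import mark_safe

def slugify(value):
    """
    Converts to lowercase, removes characters that are not alphanumerics,
    underscores, hyphens or whitespace, and converts spaces to hyphens.
    Also strips leading and trailing whitespace.
    """
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return mark_safe(re.sub(r'[-\s]+', '-', value))
slugify = allow_lazy(slugify, six.text_type)
```

This is the slugify function from django.utils.text. It should be sufficient for your requirement.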
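Outside a Django project, the same steps can be sketched with only the standard library. This is an adaptation, not Django's own code: it drops `mark_safe` and `allow_lazy`, so it returns a plain `str` and does not accept lazy strings. The name `slugify_plain` is hypothetical.

```python
import re
import unicodedata

def slugify_plain(value):
    # Decompose accented characters, then drop anything that is still non-ASCII
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Remove everything except word characters, whitespace and hyphens
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen
    return re.sub(r"[-\s]+", "-", value)

print(slugify_plain("  Crème brûlée, s'il vous plaît!  "))
```

Like the Django version it mirrors, this keeps underscores and silently drops characters that have no ASCII decomposition.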
