String slugification in Python

粉色の甜心 2020-11-30 23:10

I am looking for the best way to "slugify" a string, that is, to turn it into a URL-friendly "slug". My current solution is based on this recipe.

I have changed it a little bit to:

10 answers
  • 2020-12-01 00:00

    Unidecode is good; however, be careful: unidecode is licensed under the GPL. If that license doesn't fit your project, use this one instead.

  • 2020-12-01 00:03

    The problem is with the ASCII normalization line:

    slug = unicodedata.normalize('NFKD', s)
    

    This step is Unicode normalization, and NFKD does not decompose many characters into ASCII equivalents. Characters such as ø and Æ have no decomposition at all, so they are simply stripped when the result is encoded to ASCII:

    Mørdag -> mrdag
    Æther -> ther
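
    The effect can be reproduced with the standard library alone. The sketch below is only an illustration: the helper name nfkd_to_ascii is made up here, and it assumes the normalized string is then encoded to ASCII with errors ignored. An accented letter like é survives as its base letter, while ø and Æ vanish.

```python
import unicodedata

def nfkd_to_ascii(s):
    # NFKD splits "é" into "e" plus a combining accent; "ø" and "Æ" stay whole.
    decomposed = unicodedata.normalize("NFKD", s)
    # Dropping everything outside ASCII removes the accent, but also ø and Æ.
    return decomposed.encode("ascii", "ignore").decode("ascii")

for word in ("café", "Mørdag", "Æther"):
    print(word, "->", nfkd_to_ascii(word))
```

    Lowercasing the results gives the mrdag and ther shown above.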
    

    A better way is to use the unidecode module, which tries to transliterate strings to ASCII. If you replace the line above with:

    import unidecode
    slug = unidecode.unidecode(s)
    

    You get better results for the above strings and for many Greek and Russian characters too:

    Mørdag -> mordag
    Æther -> aether
    
  • 2020-12-01 00:04

    Install unidecode for Unicode support:

    pip install unidecode

    import re
    import unidecode

    def slugify(text):
        # Transliterate to ASCII, then lowercase.
        text = unidecode.unidecode(text).lower()
        # Replace each run of non-alphanumeric characters with one hyphen.
        return re.sub(r'[\W_]+', '-', text)

    text = "My custom хелло ворлд"
    print(slugify(text))
    

    >>> my-custom-khello-vorld
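
    One thing the function above leaves in place is a hyphen at either end when the input starts or ends with punctuation or spaces. A minimal sketch of a variant that trims them follows. The names slugify_trimmed and to_ascii are made up for this example, and the standard-library fallback used when unidecode is not installed is an assumption, not part of the answer.

```python
import re
import unicodedata

try:
    # Third-party and optional: gives real transliteration when available.
    from unidecode import unidecode as to_ascii
except ImportError:
    def to_ascii(text):
        # Fallback: NFKD, then drop whatever is still not ASCII.
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")

def slugify_trimmed(text):
    text = to_ascii(text).lower()
    # Collapse separator runs into one hyphen, then trim hyphens at the edges.
    return re.sub(r"[\W_]+", "-", text).strip("-")
```

    With this variant, slugify_trimmed("  Hello, World!  ") gives hello-world, where the untrimmed version would give -hello-world-.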

  • 2020-12-01 00:07
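    import re
    import unicodedata

    # Import paths as in the older Django releases this snippet comes from.
    from django.utils import six
    from django.utils.functional import allow_lazy
    from django.utils.safestring import mark_safe

    def slugify(value):
        """
        Converts to lowercase, removes characters that are not alphanumerics,
        underscores, whitespace or hyphens, and converts runs of whitespace
        and hyphens to single hyphens. Also strips leading and trailing
        whitespace.
        """
        value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
        value = re.sub(r'[^\w\s-]', '', value).strip().lower()
        return mark_safe(re.sub(r'[-\s]+', '-', value))
    slugify = allow_lazy(slugify, six.text_type)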
    

    This is the slugify function from django.utils.text. It should be sufficient for your requirements.
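
    If pulling in Django only for this is not an option, the same steps can be sketched with the standard library. This is an adaptation, not Django's code: the name slugify_plain is made up here, and the mark_safe and allow_lazy wrappers are dropped on the assumption that a plain str result is enough.

```python
import re
import unicodedata

def slugify_plain(value):
    # Decompose accented characters, then drop anything that is not ASCII.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Keep only word characters, whitespace and hyphens; trim and lowercase.
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    # Collapse each run of whitespace and hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", value)
```

    Because it relies on NFKD, it shares the limitation described in the earlier answer: slugify_plain("Crème brûlée") gives creme-brulee, but slugify_plain("Mørdag") gives mrdag.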
