Is there a way to filter a django queryset based on string similarity (a la python difflib)?

陌清茗 2020-12-31 18:17

I have a need to match cold leads against a database of our clients.

The leads come from a third party provider in bulk (thousands of records) and sales is asking u

3 Answers
  •  滥情空心
    2020-12-31 18:42

    If you need to get there with Django and PostgreSQL but don't want to use the trigram similarity lookup introduced in Django 1.10 (https://docs.djangoproject.com/en/2.0/ref/contrib/postgres/lookups/#trigram-similarity), you can implement it using Levenshtein distance like this:

    The required extension is fuzzystrmatch.

    You need to add this Postgres extension to your database in psql:

    CREATE EXTENSION fuzzystrmatch;
    

    Let's define a custom function with which we can annotate the queryset. It takes one extra argument, the search_term, and calls the Postgres levenshtein function (see docs):

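    from django.db.models import Func, IntegerField, Value
    
    class Levenshtein(Func):
        # Calls levenshtein(text, text) from PostgreSQL's fuzzystrmatch extension.
        function = "levenshtein"
        output_field = IntegerField()
    
        def __init__(self, expression, search_term, **extras):
            # Wrap the search term in Value() so Django sends it as a query
            # parameter instead of pasting it into the SQL string, which
            # would be open to SQL injection.
            super().__init__(expression, Value(search_term), **extras)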
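    To see what the annotated lev_dist value means, here is a minimal pure-Python sketch of the Levenshtein distance, i.e. what levenshtein(source, target) returns with its default cost of 1 per insertion, deletion, or substitution. It is only an illustration for sanity-checking thresholds locally, not the fuzzystrmatch implementation, and the name levenshtein_distance is chosen here for the example.

```python
def levenshtein_distance(source: str, target: str) -> int:
    """Edit distance with cost 1 per insertion, deletion or substitution."""
    # previous[j] = distance between the part of `source` processed so far
    # and the first j characters of `target` (starts as j insertions).
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # current[0] = i: turning source[:i] into "" takes i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete s_char
                    current[j - 1] + 1,  # insert t_char
                    previous[j - 1] + (s_char != t_char),  # substitute, free on match
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein_distance("Kafka", "Kfaka"))  # prints 2
```

    For example, "Kafka" and "Kfaka" are two substitutions apart, so a row named "Kafka" is within a distance of 2 of the search term "Kfaka". This sketch compares characters exactly, so a difference in letter case counts as an edit; normalizing case on both sides is one way to avoid that.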
    

    Then, anywhere else in the project, we just import the Levenshtein class defined above, plus F to reference the Django field:

    from django.db.models import F
    
    Spot.objects.annotate(
        lev_dist=Levenshtein(F('name'), 'Kfaka')
    ).filter(
        lev_dist__lte=2
    )
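    Since the question mentions difflib: if the fuzzystrmatch extension isn't available (for example, on a non-PostgreSQL backend), a rough fallback is to pull the names into Python and compare them with difflib.get_close_matches. This is a minimal sketch, assuming plain lists of name strings (e.g. from values_list('name', flat=True)); match_leads and the 0.8 cutoff are illustrative choices, not part of Django. Note that difflib's ratio is a 0.0 to 1.0 similarity score, not an edit distance, and that this compares every lead against every client, so with thousands of records on each side it is much slower than letting the database filter.

```python
import difflib
from typing import Dict, Iterable, Optional


def match_leads(
    lead_names: Iterable[str],
    client_names: Iterable[str],
    cutoff: float = 0.8,
) -> Dict[str, Optional[str]]:
    """Map each lead name to its most similar client name, or None.

    Similarity is difflib's SequenceMatcher ratio (0.0 to 1.0);
    `cutoff` is the minimum ratio accepted as a match.
    """
    # Compare case-insensitively, but remember the original spelling.
    by_normalized = {name.strip().lower(): name for name in client_names}
    candidates = list(by_normalized)

    matches: Dict[str, Optional[str]] = {}
    for lead in lead_names:
        # n=1 keeps only the best candidate whose ratio is >= cutoff.
        best = difflib.get_close_matches(
            lead.strip().lower(), candidates, n=1, cutoff=cutoff
        )
        matches[lead] = by_normalized[best[0]] if best else None
    return matches


if __name__ == "__main__":
    leads = ["Acme Inc", "Globex"]
    clients = ["ACME Inc.", "Initech"]
    print(match_leads(leads, clients))
    # prints {'Acme Inc': 'ACME Inc.', 'Globex': None}
```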
    
