Fuzzy matching of product names

长发绾君心 2020-12-12 16:28

I need to automatically match product names (cameras, laptops, TVs, etc.) that come from different sources to a canonical name in the database.

For example \"

11 answers
  •  佛祖请我去吃肉
    2020-12-12 17:13

    This is a record linkage problem. The dedupe Python library provides a complete implementation, but even if you don't use Python, its documentation has a good overview of how to approach the problem.

    Briefly, within the standard paradigm, the task is broken into three stages:

    1. Compare the fields, in this case just the name. You can use one or more comparators for this, for example an edit distance such as the Levenshtein distance, or something like the cosine distance, which compares the number of common words.
    2. Turn the array of distance scores into a probability that a pair of records is truly about the same thing.
    3. Cluster those pairwise probability scores into groups of records that likely all refer to the same thing.
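
The three stages above can be sketched in plain Python. This is only an illustration under stated assumptions, not how the dedupe library works internally: the logistic weights (`BIAS`, `WEIGHTS`), the 0.5 threshold, the helper names, and the sample product names are all made up, and in practice the weights would be learned from labelled pairs. Clustering is done here with a simple union-find over pairs above the threshold, which is one of several possible choices.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cosine_distance(a: str, b: str) -> float:
    """1 minus the cosine similarity of the two names' word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def compare(a: str, b: str) -> list:
    """Stage 1: compare two names and return an array of distance scores."""
    a, b = a.lower(), b.lower()
    edit = levenshtein(a, b) / max(len(a), len(b), 1)  # normalised to [0, 1]
    return [edit, cosine_distance(a, b)]


# Hypothetical logistic-regression parameters, chosen by hand for this sketch.
BIAS, WEIGHTS = 3.0, (-4.0, -4.0)


def match_probability(scores) -> float:
    """Stage 2: turn the distance scores into a match probability."""
    z = BIAS + sum(w * s for w, s in zip(WEIGHTS, scores))
    return 1.0 / (1.0 + math.exp(-z))


def cluster(names, threshold=0.5):
    """Stage 3: group names whose pairwise probability reaches the threshold."""
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if match_probability(compare(names[i], names[j])) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


# Made-up product names, purely for illustration.
names = ["Acme ZoomCam 500", "acme zoomcam 500 camera", "Globex 40in TV"]
for group in cluster(names):
    print(group)
```

With these hand-picked weights the two "Acme" names end up in one group and the TV in another. Comparing every pair is quadratic in the number of names, so a real data set would also need some form of blocking to limit which pairs are compared.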
