Fuzzy matching of product names

长发绾君心 2020-12-12 16:28

I need to automatically match product names (cameras, laptops, TVs, etc.) that come from different sources to a canonical name in the database.

For example \"

11 answers
  •  轮回少年
    2020-12-12 17:14

    I don't have any experience with this type of problem, but I think a very naive implementation would be to tokenize the search term and search for entries that contain any of the tokens.

    "Canon PowerShot A20 IS", for example, tokenizes into:

    • Canon
    • PowerShot
    • A20
    • IS

    which would match each of the other items you want to show up in the results. Of course, this strategy will likely produce a whole lot of false matches as well.
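As a rough illustration of the token approach, here is a minimal Python sketch. The catalog entries and the function names `tokenize` and `token_matches` are made up for this example, and ranking by the number of shared tokens is an added assumption rather than part of the suggestion above.

```python
import re


def tokenize(name):
    """Lowercase a product name and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def token_matches(query, candidates):
    """Return candidates that share at least one token with the query,
    ordered by how many tokens they share (most shared first)."""
    query_tokens = tokenize(query)
    scored = []
    for candidate in candidates:
        shared = query_tokens & tokenize(candidate)
        if shared:
            scored.append((len(shared), candidate))
    # sorted() is stable, so ties keep their original catalog order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored]


# Hypothetical catalog entries, for illustration only
catalog = [
    "Canon PowerShot A20 IS Digital Camera",
    "Canon EOS 5D",
    "Sony Vaio Laptop",
]

results = token_matches("Canon PowerShot A20 IS", catalog)
print(results)
```

Note that "Canon EOS 5D" is returned too, because it shares the single token "canon". That is exactly the kind of false match this strategy tends to produce, so a minimum number of shared tokens would probably be needed in practice.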

    Another strategy would be to store "keywords" with each item, such as "camera", "canon", "digital camera", and search for items that have matching keywords. In addition, if you stored other attributes such as Maker, Brand, etc., you could search on each of these as well.
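The keyword idea could look something like the sketch below. The `Product` record, its fields, and the `search` function are hypothetical names chosen for illustration. A real system would more likely keep keywords and attributes in indexed database columns than in an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    # Hypothetical record layout: a canonical name, a maker, and free-form keywords
    name: str
    maker: str
    keywords: set = field(default_factory=set)


def search(products, keywords=(), maker=None):
    """Return products that match the maker (if given) and share
    at least one keyword with the query (if any keywords are given)."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for product in products:
        if maker is not None and product.maker.lower() != maker.lower():
            continue
        if wanted and not (wanted & {k.lower() for k in product.keywords}):
            continue
        hits.append(product)
    return hits


# Made-up sample data
products = [
    Product("Canon PowerShot A20 IS", "Canon", {"camera", "canon", "digital camera"}),
    Product("Canon PIXMA Printer", "Canon", {"printer", "canon"}),
    Product("Sony Cyber-shot", "Sony", {"camera", "sony", "digital camera"}),
]

found = [p.name for p in search(products, keywords=["camera"], maker="Canon")]
print(found)
```

Filtering on the maker first narrows the candidate set, so the keyword match only has to separate products from the same manufacturer.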
