Duplicate elimination of similar company names
I have a table of company names that contains many duplicates caused by human input errors: typos, disagreement over whether the subdivision should be part of the name, and so on. I want all of the following variants to be marked as a single company, "1c":

+------------------+
| company          |
+------------------+
| 1c               |
| 1c company       |
| 1c game studios  |
| 1c wireless      |
| 1c-avalon        |
| 1c-softclub      |
| 1c: maddox games |
| 1c:inoco         |
| 1cc games        |
+------------------+

I identified Levenshtein distance as a good way
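For reference, here is a minimal sketch of the Levenshtein (edit) distance mentioned in the question. It is written in Python because the question contains no code of its own, and it uses the standard two-row dynamic-programming recurrence. The names `levenshtein`, `names`, and `distances` are illustrative only. Comparing every entry against the canonical name "1c" is an assumption about how the distance would be applied, not something the question specifies.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where each insertion, deletion, or substitution costs 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


# The company names from the table above.
names = ["1c", "1c company", "1c game studios", "1c wireless", "1c-avalon",
         "1c-softclub", "1c: maddox games", "1c:inoco", "1cc games"]

# Hypothetical use: distance of each entry from the canonical name "1c".
distances = {name: levenshtein("1c", name) for name in names}

for name, d in distances.items():
    print(f"{name:<18} {d}")
```

Because "1c" is a prefix of every entry, each distance here equals the number of extra characters: 7 for "1c-avalon", 14 for "1c: maddox games". A single threshold on the raw distance would therefore only merge the variants with short suffixes.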