SQL Server: how to make a collation key from a string value

好久不见. Submitted on 2019-12-08 09:45:35

Question


I receive data files from a source I have no control over (the government). The records have a Company Name field that I need to associate with existing company records in my database. I'm concerned that some of the names will vary by minor differences such as 'Company X, Inc.' vs 'Company X Inc'.

My initial thought is to create a collation key field based on the name: apply ToLower() and then a regex to strip out all whitespace and special characters.
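For illustration, the key described above might look like the following minimal Python sketch. The function name `collation_key` is hypothetical, and the choice to keep only ASCII letters and digits is an assumption; the post does not say which characters count as "special".

```python
import re


def collation_key(name: str) -> str:
    """Lowercase the name, then drop everything except a-z and 0-9.

    Assumption: only ASCII letters and digits are kept, so whitespace,
    punctuation, and accented characters are all stripped.
    """
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Both spellings from the example collapse to the same key.
print(collation_key("Company X, Inc."))  # companyxinc
print(collation_key("Company X Inc"))    # companyxinc
```

A key like this only merges names that differ in case, spacing, or punctuation. A shortened name such as 'Company X' still produces a different key.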

Is there any better methodology to apply to this?


Answer 1:


That may work, but there may be false matches with no way to prevent them, because you have only an algorithmic solution. Your best bet is to create an alias table. Include every variation ever found for each company name, with a FK to the real company's ID. Include a row for the actual name as well.

AliasID CompanyID CompanyAlias
------- --------- ------------
1       1         Company X, Inc   <<--actual real company name
2       1         Company X Inc
3       1         Company X

If an exact name match is not found in this table when importing data, you can use your proposed algorithm or another one, or rely on human input, to find a match or create a new company. At that point, insert the new name into the alias table. If you later find that a match was wrong for some reason, you can alter the alias table to make the proper mapping. If you go with an algorithm only, you'd need to include exceptions, and your algorithm would grow large and slow. With this table and a good index, finding your matches should be fast.
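The import flow described above could be sketched roughly as follows. This is an in-memory illustration only: the dictionaries stand in for the indexed alias table in SQL Server, and the names `AliasTable`, `add`, `remap`, `resolve`, and `collation_key` are all hypothetical, not something the answer specifies.

```python
import re
from typing import Dict, Optional


def collation_key(name: str) -> str:
    # Fallback algorithm from the question: lowercase, keep only a-z and 0-9.
    return re.sub(r"[^a-z0-9]", "", name.lower())


class AliasTable:
    """In-memory stand-in for the alias table (CompanyAlias -> CompanyID)."""

    def __init__(self) -> None:
        self.by_alias: Dict[str, int] = {}  # exact alias text -> CompanyID
        self.by_key: Dict[str, int] = {}    # normalized key -> CompanyID

    def add(self, alias: str, company_id: int) -> None:
        """Record an alias; the first company seen for a key keeps that key."""
        self.by_alias[alias] = company_id
        self.by_key.setdefault(collation_key(alias), company_id)

    def remap(self, alias: str, company_id: int) -> None:
        """Correct a wrong match by pointing the alias at another company."""
        self.by_alias[alias] = company_id

    def resolve(self, name: str) -> Optional[int]:
        """Try an exact alias match first, then the normalized key.

        Returns None when neither matches, meaning the name needs human
        review or a new company record.
        """
        if name in self.by_alias:
            return self.by_alias[name]
        company_id = self.by_key.get(collation_key(name))
        if company_id is not None:
            # Remember the new variation so the next import hits it exactly.
            self.by_alias[name] = company_id
        return company_id


table = AliasTable()
table.add("Company X, Inc", 1)         # the actual company name
print(table.resolve("Company X Inc"))  # 1, matched through the key
print(table.resolve("Company X"))      # None, needs a human decision
```

Because every resolved variation is written back as its own alias row, a wrong algorithmic match can be corrected for that one name with `remap` without touching the matching algorithm.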



Source: https://stackoverflow.com/questions/2342717/sql-server-what-to-do-to-make-collation-key-from-a-string-value
