Question
I am trying to find the Levenshtein distance between the columns of two different tables, TableA and TableB. Basically, I need to compare each value in ColumnA of TableA with every value in ColumnB of TableB and compute the Levenshtein distance for each pair.
I have created a Levenshtein function as follows:
CREATE FUNCTION [Levenshtein]
(@value1 [NVARCHAR](MAX),
@value2 [NVARCHAR](MAX))
RETURNS [INT]
AS
EXTERNAL NAME [FastenshteinAssembly].[Fastenshtein.Levenshtein].[Distance]
GO
This basically calls a Levenshtein DLL I have on my machine. I tried creating a stored procedure for this operation, but I am unsure whether that is an optimized approach.
TableB contains millions of company names and TableA contains thousands of company names, so this is essentially an n*m operation.
What is the most efficient way to achieve this?
Thanks
Answer 1:
There is no optimized approach for doing this.
There may be some hacks that simplify the processing. For instance, you could create lookup tables on each side using n-grams and only compare names whose n-grams are close. Or you could use soundex() for the same purpose, or the first three characters of each name.
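As a rough illustration of the soundex() idea, the query below only computes the distance for pairs whose SOUNDEX codes agree. It is a sketch under assumptions not stated in the question: the function is assumed to live in the dbo schema (scalar functions must be called with a schema-qualified name), and the tables and columns are assumed to be named exactly TableA(ColumnA) and TableB(ColumnB).

```sql
-- Sketch only: dbo schema and the table/column names are assumptions.
-- Pairs are compared only when their SOUNDEX codes match, so most of
-- the n*m combinations never reach the CLR function.
SELECT a.ColumnA,
       b.ColumnB,
       dbo.Levenshtein(a.ColumnA, b.ColumnB) AS Distance
FROM TableA AS a
JOIN TableB AS b
  ON SOUNDEX(a.ColumnA) = SOUNDEX(b.ColumnB);
```

Swapping the join condition for LEFT(a.ColumnA, 3) = LEFT(b.ColumnB, 3) gives the "first three characters" variant. Either way this trades recall for speed: names that differ in their leading characters or sound code are never compared, so some close matches can be missed.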
However, if you need to match to all possibilities, then this is an expensive n*m operation in SQL Server.
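For reference, the exhaustive comparison might look like the sketch below, which keeps only the closest TableB name for each TableA name. The dbo schema and the names TableA(ColumnA) and TableB(ColumnB) are assumptions, not details confirmed in the question.

```sql
-- Sketch only: dbo schema and the table/column names are assumptions.
-- For every row of TableA, the function is evaluated against every row
-- of TableB, and only the pair with the smallest distance is kept.
SELECT a.ColumnA,
       m.ColumnB,
       m.Distance
FROM TableA AS a
CROSS APPLY (
    SELECT TOP (1)
           b.ColumnB,
           dbo.Levenshtein(a.ColumnA, b.ColumnB) AS Distance
    FROM TableB AS b
    ORDER BY Distance
) AS m;
```

This still calls the function n*m times; the query only controls how many rows are returned, not how many comparisons are made.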
Source: https://stackoverflow.com/questions/54959790/how-to-call-levenshtien-function-using-the-values-from-two-different-tables-in-t