Here's a puzzle...
I have two databases of the same 50000+ electronic products and I want to match products in one database to those in the other. However, the prod
Use a large set of training examples. For each possible pair in this example set:
Now, when you get a pair of strings and want to decide whether they refer to the same product, extract the features as you did for the training set and build the tuple of distances between the corresponding components of the two strings. Feed that tuple to the trained SVM, which classifies the pair as same or different.
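To make the feature tuple concrete, here is a minimal Python sketch of the extraction step for one pair of product names. The choice of components is my own assumption for illustration, not something fixed by the approach: a character-level distance over the whole name, a token-set distance, and a distance over "model number" tokens, which I guess at by looking for tokens that mix letters and digits. The function names (`tokens`, `model_tokens`, `pair_features`) are hypothetical, and each distance is just 1 minus a similarity score, so 0.0 means identical and 1.0 means nothing in common.

```python
import re
from difflib import SequenceMatcher


def tokens(name):
    """Lowercase a product name and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", name.lower())


def model_tokens(name):
    """Tokens containing both letters and digits.

    Assumption: such tokens (e.g. "w830") are model numbers.
    """
    return {t for t in tokens(name)
            if re.search(r"[a-z]", t) and re.search(r"\d", t)}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|. Two empty sets count as distance 0.0
    (assumption: no evidence is treated as no difference)."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def pair_features(name_a, name_b):
    """Return a tuple of distances in [0, 1] for one pair of names."""
    ta, tb = tokens(name_a), tokens(name_b)
    # Character-level distance on the normalized names.
    char_dist = 1.0 - SequenceMatcher(None, " ".join(ta), " ".join(tb)).ratio()
    # Distance between the sets of all tokens.
    token_dist = jaccard_distance(set(ta), set(tb))
    # Distance between the sets of guessed model-number tokens.
    model_dist = jaccard_distance(model_tokens(name_a), model_tokens(name_b))
    return (char_dist, token_dist, model_dist)


same = pair_features("Sony DSC-W830 Camera", "sony dsc w830 camera")
different = pair_features("Sony DSC-W830 Camera", "Canon EOS 2000D Camera")
```

Each labelled training pair would be turned into one such tuple plus a same/different label, and the identical function would be applied to any new pair before handing the tuple to the trained SVM. With real product data you would probably want more components (brand, capacity, colour and so on), but the shape of the tuple stays the same.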
The advantage of a learning approach like this is that you don't have to keep modifying hand-written rules over and over again; instead, the system learns what distinguishes matching from non-matching products from a large number of example pairs.
You could use the LibSVM package in WEKA to do this.