I have got a simple contacts database but I\'m having problems with users entering in duplicate data. I have implemented a simple data comparison but unfortunately the duplicate
I imagine that this problem is well understood but what occurs to me on first reading is:
Use your existing database to get a good first guess for the threshold, and correct as you accumulate experience.
You may prefer a fairly strong bias toward false positives, at least at first.