I'm trying to come up with a method of finding duplicate addresses, based on a similarity score. Consider these duplicate addresses:
addr_1 = '# 3 FAIRMONT LIN
To do this properly, you need to standardize your addresses according to USPS standards (your example addresses appear to be US-based). Many direct marketing service providers offer CASS (Coding Accuracy Support System) certified processing of postal addresses. CASS processing standardizes all of your addresses and appends the ZIP+4 code to each one. Any undeliverable addresses are flagged, which will further reduce your mailing costs, if that is your intent. Once all of your addresses are standardized, eliminating duplicates becomes largely a matter of exact matching on the standardized form.
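To illustrate why standardization makes deduplication easy, here is a minimal Python sketch. It is not CASS processing: the `normalize` function is a crude stand-in that only uppercases, strips punctuation, and applies a tiny, hypothetical suffix table (`SUFFIXES`), whereas a real CASS vendor would apply the full USPS rules and append ZIP+4. The function names and the sample addresses are made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny suffix table for illustration only.
# Real standardization relies on the complete USPS abbreviation tables.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR", "LANE": "LN"}


def normalize(address: str) -> str:
    """Crude stand-in for standardization: uppercase, drop punctuation,
    collapse whitespace, and abbreviate a few street suffixes."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", address.upper())
    tokens = [SUFFIXES.get(tok, tok) for tok in cleaned.split()]
    return " ".join(tokens)


def find_duplicates(addresses):
    """Group addresses by their normalized form and return only the
    groups that contain more than one original address."""
    groups = defaultdict(list)
    for addr in addresses:
        groups[normalize(addr)].append(addr)
    return [group for group in groups.values() if len(group) > 1]


# Made-up sample data, not taken from the question.
sample = [
    "12 Oak Street, Springfield",
    "12 OAK ST. SPRINGFIELD",
    "99 Elm Avenue, Springfield",
]
duplicates = find_duplicates(sample)
```

With properly standardized input, the grouping step is all that remains; the hard part is the standardization itself, which is why handing it to a CASS-certified service is usually the safer route.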