Say I have a list of movie names with misspellings and small variations like this:
"Pirates of the Caribbean: The Curse of the Black Pearl"
"Pirates o
I believe there are in fact two distinct problems.
The first is spell correction. Peter Norvig describes a compact spelling corrector written in Python here:
http://norvig.com/spell-correct.html
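To give a rough idea of how that kind of corrector works, here is a minimal sketch in the same spirit. It is not Norvig's code: it only looks at candidates one edit away, and the vocabulary (`COUNTS`) is an assumption built from a single known-good title, where in practice you would build it from your whole list of clean titles. The helper names (`words`, `edits1`, `correct`, `correct_title`) are just illustrative.

```python
import re
from collections import Counter


def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def edits1(word):
    """Return every string that is one edit away from `word`."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most frequent known word within one edit, else the word itself."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.get) if candidates else word


def correct_title(title, counts):
    """Correct each word of a title independently (output is lowercased)."""
    return " ".join(correct(w, counts) for w in words(title))


# Assumed vocabulary: word counts taken from titles known to be spelled correctly.
COUNTS = Counter(words("Pirates of the Caribbean: The Curse of the Black Pearl"))

print(correct_title("Piratse of the Caribbean: The Curse of the Blak Perl", COUNTS))
```

Words that are more than one edit away from anything in the vocabulary are left unchanged, so you may want to extend this to two edits for a noisier list.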
The second is more functional. Here is what I'd do after the spell correction: I would define a relation function.
related(sentence1, sentence2) holds if and only if sentence1 and sentence2 share rare words. By rare, I mean words other than very common ones (the, what, is, etc.). You can take a look at TF-IDF weighting to determine whether two documents are related based on their words. After googling a bit, I found this:
https://code.google.com/p/tfidf/
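As a rough illustration of the related() idea, here is a small sketch that uses only the standard library rather than the package linked above. It weights each word by log(N / df), so words that appear in every title (the, of) get weight zero, and then compares two titles by cosine similarity. The sample titles, the 0.5 threshold, and the helper names are all assumptions made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def idf_table(titles):
    """Map each word to log(N / df); words present in every title get 0."""
    n = len(titles)
    df = Counter(w for t in titles for w in set(tokenize(t)))
    return {w: math.log(n / d) for w, d in df.items()}


def tfidf_vector(text, idf):
    """Build a sparse TF-IDF vector; unknown words get weight 0."""
    tf = Counter(tokenize(text))
    return {w: count * idf.get(w, 0.0) for w, count in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def related(s1, s2, idf, threshold=0.5):
    """True when the two sentences share enough rare (high-IDF) words."""
    return cosine(tfidf_vector(s1, idf), tfidf_vector(s2, idf)) >= threshold


# Assumed sample data, for illustration only.
TITLES = [
    "Pirates of the Caribbean: The Curse of the Black Pearl",
    "Pirates of the Caribbean - Curse of the Black Pearl",
    "The Lord of the Rings: The Return of the King",
    "The Silence of the Lambs",
]
IDF = idf_table(TITLES)

print(related(TITLES[0], TITLES[1], IDF))  # the two Pirates variants
print(related(TITLES[2], TITLES[3], IDF))  # share only common words
```

The threshold is something you would have to tune on your own data; a higher value groups only near-identical titles, a lower one tolerates more variation.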