Here\'s the problem -- I have a few thousand small text snippets, anywhere from a few words to a few sentences - the largest snippet is about 2k on disk. I want to be able to c
These articles on semantic relatedness and semantic similarity may be helpful. And this SO question about Latent Semantic Analysis.
You could also look into Soundex for words that "sound alike" phonetically.