How to extract literal words from a consecutive string efficiently? [duplicate]
Possible Duplicate: How to split text without spaces into list of words?

I have a large amount of text from people's comments, parsed from HTML, that contains no delimiting characters. For example: thumbgreenappleactiveassignmentweeklymetaphor. Clearly the string contains 'thumb', 'green', 'apple', etc. I also have a large dictionary that I can query to check whether a candidate word is valid. So, what's the fastest way to extract these words?

I'm not really sure a naive algorithm would serve your purpose well, as