Regular expression to strip everything but words
Question: I'm helpless with regular expressions, so please help me with this problem. Basically, I am downloading web pages and RSS feeds and want to strip everything except plain words. No periods, commas, ifs, ands, or buts. I literally have a list of the most common words used in English and I want to strip those as well, but I think I know how to do that and don't need a regular expression for it, because it would be way too long. How do I strip everything from a chunk of text except words that are
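One possible way to approach this, as a rough sketch rather than a definitive answer: instead of stripping out everything that is not a word, match the words themselves and keep only those. The sketch below assumes a "word" is a run of ASCII letters, optionally with internal apostrophes, and that any HTML markup has already been removed from the text. `COMMON_WORDS`, `WORD_RE`, and `extract_words` are hypothetical names, and the tiny set stands in for the real list of common English words.

```python
import re

# Hypothetical stand-in for the real list of common English words.
COMMON_WORDS = {"the", "and", "if", "but", "a", "of", "to", "in"}

# Assumption: a word is a run of ASCII letters, optionally joined by
# internal apostrophes (so "don't" stays one word). Digits and
# punctuation never match, so they are dropped automatically.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def extract_words(text, common_words=COMMON_WORDS):
    """Return the lowercase words in text, skipping the common ones."""
    words = (match.lower() for match in WORD_RE.findall(text))
    return [word for word in words if word not in common_words]
```

Filtering against a set keeps the regular expression short, since the common words never have to be spelled out in the pattern itself. For example, `extract_words("If it rains, and the roof leaks...")` would give `["it", "rains", "roof", "leaks"]` with the placeholder set above.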