Given these 3 lists of data and a list of keywords:
good_data1 = [\'hello, world\', \'hey, world\']
good_data2 = [\'hey, man\', \'whats up\']
bad_data = [\'h
If you have many keywords, you might want to try a suffix tree [1]. Insert all the words from the three data lists, storing which list each word comes from in it's terminating node. Then you can perform queries on the tree for each keyword really, really fast.
Warning: suffix trees are very complicated to implement!
[1] http://en.wikipedia.org/wiki/Suffix_tree