I am working with some really large databases of newspaper articles, I have them in a MySQL database, and I can query them all.
I am now searching for ways to help m
You should use a metric such as tf-idf to get the tags out:
Various implementations of tf-idf are available; for Java and .NET, there's Lucene, for Python there's scikits.learn.
If you want to do better than this, use language models. That requires some knowledge of probability theory.