Your approach seems sensible and there are two ways you can improve the tagging.
Use a known list of keywords/phrases for your tagging and if the count of the instances of this word/phrase is greater than a threshold (probably based on the length of the article) then include the tag.
Use a part of speech tagging algorithm to help reduce the article into a sensible set of phrases and use a sensible method to extract tags out of this. Once you have the articles reduced using such an algorithm, you would be able to identify some good candidate words/phrases to use in your keyword/phrase list for method 1.