I'd recommend Goose: https://github.com/jiminoc/goose
It may not be as general-purpose as you need, but if you are scraping article content from popular sites, it may work out of the box. It also gives you a framework to build on if you want to extend its code to cover other sites.