I would recommend Aspose Total for this. A few years ago I did a project doing pretty much exactly what you are asking, and compared to using Office Interop across different versions of Office (prior to the change to the XML formats), Aspose was the most robust library. Based on what you are describing, you will probably have to do some OCR too. It's not cheap, but I found their APIs pretty solid, and it works on most versions of the file types you are asking about. You should be able to use the free trial to see if it will fit your project. I have no affiliation with Aspose other than having used their tools in a production environment.