My objective is to extract the text and images from a PDF file while parsing its structure. The scope for parsing the structure is not exhaustive; I only need to be able to
Unless its is Marked Content, PDF does not have a structure.... You have to 'guess' it which is what the various tools are doing. There is a good blog post explaining the issues at http://blog.idrsolutions.com/2010/09/the-easy-way-to-discover-if-a-pdf-file-contains-structured-content/