How to extract data from a PDF file while keeping track of its structure?

醉话见心 2020-12-12 22:52

My objective is to extract the text and images from a PDF file while parsing its structure. The scope for parsing the structure is not exhaustive; I only need to be able to

6 answers
  •  -上瘾入骨i
    2020-12-12 23:17

    Unless it uses Marked Content, a PDF does not have a structure. You have to 'guess' it, which is what the various tools do. There is a good blog post explaining the issues at http://blog.idrsolutions.com/2010/09/the-easy-way-to-discover-if-a-pdf-file-contains-structured-content/
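
    As a rough illustration of checking for structured content, here is a minimal Python sketch. It is not taken from the linked post. It assumes that searching the raw bytes for the `/StructTreeRoot` and `/MarkInfo` keys is good enough for a first look, and the function name `looks_tagged` is made up for this example. The check can give a false negative when the document catalog is stored in a compressed object stream, so a real PDF parser is more reliable.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Crude heuristic (hypothetical helper): does the raw PDF mention structure?"""
    # /StructTreeRoot in the document catalog points at the logical structure tree.
    has_tree = b"/StructTreeRoot" in pdf_bytes
    # /MarkInfo << /Marked true >> declares that the file follows Tagged PDF conventions.
    # This pattern only matches an inline dictionary, not an indirect reference.
    marked = re.search(rb"/MarkInfo\s*<<[^>]*/Marked\s+true", pdf_bytes) is not None
    return has_tree or marked


# Example usage (the file name is an assumption):
# with open("document.pdf", "rb") as f:
#     print(looks_tagged(f.read()))
```

    If this returns False, the file most likely has no declared structure, and headings, paragraphs and reading order have to be guessed from the layout, as described above.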
