How to extract data from a PDF file while keeping track of its structure?

醉话见心 2020-12-12 22:52

My objective is to extract the text and images from a PDF file while parsing its structure. The scope for parsing the structure is not exhaustive; I only need to be able to

6 answers
  •  抹茶落季
    2020-12-12 23:14

    PDF files can be parsed with tabula-py, or tabula-java.

    I wrote a full tutorial on how to use tabula-py in this article. You can also run Tabula in a web browser, as long as you have Java installed.
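
As a rough illustration of the tabula-py suggestion, here is a minimal sketch. It assumes tabula-py and a Java runtime are installed. Note that tabula-py targets tables, so it would likely cover only the tabular part of the question, not free text or images. The helper names `extract_tables` and `describe_tables` are hypothetical; only `tabula.read_pdf` comes from the library.

```python
from typing import Any, List


def extract_tables(pdf_path: str, pages: str = "all") -> List[Any]:
    """Return one pandas DataFrame per table that tabula-py detects."""
    # Imported lazily so this module still loads where tabula-py or Java is missing.
    import tabula

    # With multiple_tables=True, read_pdf returns a list of DataFrames.
    return tabula.read_pdf(pdf_path, pages=pages, multiple_tables=True)


def describe_tables(tables: List[Any]) -> List[str]:
    """Summarize each table by its position and its (rows, columns) shape."""
    return [
        f"table {i}: {t.shape[0]} rows x {t.shape[1]} columns"
        for i, t in enumerate(tables, start=1)
    ]


# Example usage (the file name is a placeholder):
# for line in describe_tables(extract_tables("report.pdf")):
#     print(line)
```

Keeping the tables in page order, as returned, is one simple way to preserve a little of the document's structure; anything beyond tables would need a different tool.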
