Tabula: extract tables by area coordinates

Posted by 醉酒当歌 on 2019-12-03 16:13:30

Tabula needs areas to be specified in PDF units (points), which are defined as 1/72 of an inch. If you are using Acrobat Reader DC, you can use the Measure tool and, when its readings are in inches, multiply them by 72.

Tabula needs the area to be specified as the top, left, bottom and right distances, in that order. To obtain them, measure from the top edge of the page to the top and bottom of the table, and from the left edge of the page to its left and right sides.
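As a minimal sketch of that conversion, assuming the four distances were measured in inches from the top and left edges of the page (the helper name area_from_inches is made up for illustration and is not part of tabula):

```python
POINTS_PER_INCH = 72  # PDF units: 1 pt = 1/72 inch


def area_from_inches(top, left, bottom, right):
    """Convert distances measured in inches to a tabula-style area in points.

    top and bottom are measured from the top edge of the page,
    left and right from the left edge.
    """
    if bottom <= top or right <= left:
        raise ValueError("bottom must exceed top and right must exceed left")
    return tuple(round(v * POINTS_PER_INCH, 2) for v in (top, left, bottom, right))


print(area_from_inches(1.5, 1.0, 5.0, 7.0))  # (108.0, 72.0, 360.0, 504.0)
```

The resulting tuple can be passed as the area argument shown in the answers below.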

I had the same problem: the code seemed to ignore the area argument. I fixed it by passing guess=False in the call, like so (note that I'm using revision 1.2.1):

import tabula

# file_folder and file_name together form the path to your PDF
df = tabula.read_pdf(file_folder + file_name,
                     guess=False, pages=1, stream=True, encoding="utf-8",
                     area=(200.8125, 64.6425, 352.2825, 496.1025),
                     columns=(65.3, 196.86, 294.96, 351.81, 388.21, 429.77))

Tabula understands coordinates expressed in points.

On Windows, you can measure the coordinates of your area with Adobe Acrobat DC or Acrobat Reader DC.

If you have Adobe Acrobat DC: Tools >> Edit PDF >> select your area and press Enter >> change Units to Points

Top               100       pt = A
Left              50        pt = B
Cropped page size 370 x 225 pt = C x D

If you have Adobe Acrobat DC or Acrobat Reader DC: Edit >> Preferences >> Units >> change Page Units to Points >> OK >> Tools >> Measure

Top         = A = 100
Left        = B = 50
Area width  = C = 370
Area height = D = 225

You then have to do this calculation (top, left, top + height, left + width):

area=[A,B,A+D,B+C]
area=[100,50,100+225,50+370]

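In code, where folder is the path to the PDF (read_pdf returns the tables as pandas DataFrames, which can then be written to Excel with to_excel):

from tabula import read_pdf
df = read_pdf(folder, area=[100, 50, 325, 420])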
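The A, B, C, D calculation above can be sketched as a small helper. The function name area_from_crop is hypothetical, chosen here only for illustration; it takes the values Acrobat reports and returns the list tabula expects:

```python
def area_from_crop(top, left, width, height):
    """Turn Acrobat's top/left offsets and crop size (all in points)
    into tabula's [top, left, bottom, right] area."""
    return [top, left, top + height, left + width]


# Values from the example above: A=100, B=50, C=370 (width), D=225 (height)
print(area_from_crop(100, 50, 370, 225))  # [100, 50, 325, 420]
```

Note the order: the height is added to the top offset and the width to the left offset, which is easy to swap by mistake.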

Reader only allows measurements if the PDF's creator has allowed them. I found this instead: https://graphicdesign.stackexchange.com/a/81666

Brief steps:

  1. Download SumatraPDF. It is also available as zip, no install needed.
  2. Open PDF with the Sumatra reader.
  3. Press 'm' to show the cursor position in the top-left corner.
  4. Use tabula with the options -p for the page and -a for the area (top,left,bottom,right).
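A rough sketch of step 4, assuming you noted two cursor positions as (x, y) pairs in points, measured from the top-left of the page: one at the table's top-left corner and one at its bottom-right. The helper name tabula_cli_args, the jar file name tabula.jar, and report.pdf are placeholders, not something taken from the answer:

```python
def tabula_cli_args(pdf_path, page, top_left, bottom_right, jar="tabula.jar"):
    """Build a tabula command line from two cursor positions.

    top_left and bottom_right are (x, y) pairs in points; tabula's -a
    option wants them reordered as top,left,bottom,right.
    """
    (x1, y1), (x2, y2) = top_left, bottom_right
    area = ",".join(f"{v:g}" for v in (y1, x1, y2, x2))
    return ["java", "-jar", jar, "-p", str(page), "-a", area, pdf_path]


args = tabula_cli_args("report.pdf", 1, (50, 100), (420, 325))
print(" ".join(args))  # java -jar tabula.jar -p 1 -a 100,50,325,420 report.pdf
```

The list can be handed to subprocess.run(args) once the jar path points at a real tabula build.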