Extract image data based on coordinates or Tesseract and write the content to a doc/docx Word file

Submitted by 房东的猫 on 2020-01-25 07:28:07

Question


I have an image. I want to extract its content into a docx file in readable form, keeping the same layout, using Python. I tried applying Tesseract to the image and converting it to a PDF with pytesseract, then converting the PDF to a Word file, but I am not able to maintain the layout and formatting.


Answer 1:


This question has been answered here before. You can use the pdf2image library to convert the PDF pages to images:

from pdf2image import convert_from_path

# Render each PDF page to a PIL image; 400 is the resolution in DPI (default 200)
pages = convert_from_path('sample.pdf', 400)

# Save the first page as a PNG
pages[0].save("sample.png")
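For the coordinate-based route mentioned in the question, one possible approach is to ask Tesseract for word-level bounding boxes and rebuild lines from them before writing to Word. The following is only a rough sketch, not a complete layout-preserving solution: the function names `group_words_into_lines` and `image_to_docx`, the 300 DPI value used to turn pixels into an indent, and the file names are assumptions made for illustration. It relies on `pytesseract.image_to_data` and on python-docx's `Document`, and it only approximates the horizontal position of each line.

```python
from collections import defaultdict


def group_words_into_lines(data, min_conf=0.0):
    """Group Tesseract word boxes into text lines, ordered top to bottom.

    `data` is the dict returned by pytesseract.image_to_data(...,
    output_type=pytesseract.Output.DICT). Returns a list of
    (left_px, text) tuples, one per detected line.
    """
    lines = defaultdict(list)
    for i, word in enumerate(data["text"]):
        # Skip empty boxes (non-word levels) and low-confidence words.
        if not word.strip() or float(data["conf"][i]) < min_conf:
            continue
        key = (
            data["page_num"][i],
            data["block_num"][i],
            data["par_num"][i],
            data["line_num"][i],
        )
        lines[key].append((data["left"][i], data["top"][i], word))

    result = []
    for words in lines.values():
        words.sort()  # left to right within the line
        left = words[0][0]
        top = min(w[1] for w in words)
        result.append((top, left, " ".join(w[2] for w in words)))
    result.sort()  # top to bottom on the page
    return [(left, text) for _, left, text in result]


def image_to_docx(image_path, docx_path, dpi=300):
    """OCR an image and write one paragraph per line to a .docx file.

    `dpi` is an assumed scan resolution used to convert pixels to inches.
    """
    # Imported here so the grouping helper above works without these packages.
    import pytesseract
    from PIL import Image
    from docx import Document
    from docx.shared import Inches

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    doc = Document()
    for left_px, text in group_words_into_lines(data):
        para = doc.add_paragraph(text)
        # Approximate the horizontal position with a left indent.
        para.paragraph_format.left_indent = Inches(left_px / dpi)
    doc.save(docx_path)
```

Calling `image_to_docx("sample.png", "sample.docx")` would then produce a Word file with one paragraph per OCR line. Tables, columns, and fonts are not reconstructed here, and wide images may need the indent scaled to the page width.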


Source: https://stackoverflow.com/questions/59309580/extract-image-data-based-on-coordinates-or-tessaract-and-writing-the-content-in
