Question
I have an image and want to extract its contents into a .docx file in readable form, keeping the same layout, using Python. I tried running Tesseract on the image through pytesseract to produce a PDF, then converting that PDF to a Word file, but I am not able to preserve the layout and formatting.
Answer 1:
This question has been answered here before. You can use the pdf2image library to convert the PDF pages to images:
from pdf2image import convert_from_path
pages = convert_from_path('sample.pdf', 400)  # 400 is the image resolution in DPI (default 200)
pages[0].save("sample.png")
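Going the other way, from OCR output to a .docx that roughly keeps the layout, one possible approach is to keep the word coordinates that Tesseract reports and use them when writing paragraphs. The sketch below is only an approximation, not a full layout-preserving converter. It assumes the input dict has the shape returned by pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT), and it assumes python-docx is installed for the writing step. The helper names group_words_into_lines and write_lines_to_docx, and the dpi default of 300, are made up for illustration.

```python
from collections import defaultdict


def group_words_into_lines(data, min_conf=0.0):
    """Group OCR word boxes into text lines, keeping each line's position.

    `data` is assumed to be a dict of parallel lists with the keys that
    pytesseract.image_to_data(..., output_type=Output.DICT) produces.
    Returns a list of (left, top, text) tuples in reading order.
    """
    lines = defaultdict(list)
    for i, word in enumerate(data["text"]):
        # Skip empty entries and low-confidence boxes (non-word rows have conf -1).
        if not word.strip() or float(data["conf"][i]) < min_conf:
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append((data["left"][i], data["top"][i], word))

    result = []
    for key in sorted(lines):
        words = sorted(lines[key])  # order words left to right
        left = words[0][0]
        top = min(w[1] for w in words)
        result.append((left, top, " ".join(w[2] for w in words)))
    return result


def write_lines_to_docx(lines, out_path, dpi=300):
    """Write each line as a paragraph, indented by its horizontal offset.

    Assumes python-docx is installed; `dpi` is the assumed scan resolution
    used to convert pixel offsets to points (72 points per inch).
    """
    from docx import Document
    from docx.shared import Pt

    doc = Document()
    for left, _top, text in lines:
        para = doc.add_paragraph(text)
        para.paragraph_format.left_indent = Pt(left * 72 / dpi)
    doc.save(out_path)
```

With pytesseract and Pillow available, the input dict could come from pytesseract.image_to_data(Image.open("sample.png"), output_type=pytesseract.Output.DICT). Indentation alone will not reproduce tables, columns, or fonts, so more complex pages would need extra handling.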
Source: https://stackoverflow.com/questions/59309580/extract-image-data-based-on-coordinates-or-tessaract-and-writing-the-content-in