How to find the x,y location of text in a PDF

一世执手 submitted on 2019-12-03 15:46:35

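The Docotic.Pdf library can do this. The C# sample below prints the position and content of each piece of text on the first page:

using System;
using BitMiracle.Docotic.Pdf;

// The second argument is the password for an encrypted file.
using (PdfDocument doc = new PdfDocument("your_pdf.pdf", "password_if_need"))
{
    // Each PdfTextData item carries the text and its position on the page.
    foreach (PdfTextData textData in doc.Pages[0].Canvas.GetTextData())
        Console.WriteLine(textData.Position + " " + textData.Text);
}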

In Acrobat, try running "Preflight..." and choosing PDF Analysis -> List page objects, grouped by type of object.

Once you locate the text objects in the results list, each one shows a position value (in points) in the Text Properties -> * Font section.
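Since the position is reported in points, it may help to convert it to physical units or to a top-left origin before comparing it with other tools. The sketch below is only an illustration: the `PdfUnits` class and its method names are made up for this example and are not part of Acrobat or any PDF library. It relies on two facts about PDF default user space: one point is 1/72 inch, and the origin is at the bottom-left corner of the page. Whether a given tool reports y from the bottom or from the top is something you would need to check for that tool.

```csharp
using System;

// Hypothetical helper for this example; not part of any PDF library.
public static class PdfUnits
{
    // In PDF default user space, 1 point is 1/72 inch.
    public const double PointsPerInch = 72.0;
    public const double MillimetersPerInch = 25.4;

    public static double PointsToInches(double points) =>
        points / PointsPerInch;

    public static double PointsToMillimeters(double points) =>
        points / PointsPerInch * MillimetersPerInch;

    // Converts a y value measured from the bottom edge of the page into one
    // measured from the top edge. Applying it twice returns the original value.
    public static double FlipY(double y, double pageHeightInPoints) =>
        pageHeightInPoints - y;
}

public static class UnitsDemo
{
    public static void Main()
    {
        // Assumed sample values: a text position on a US Letter page (612 x 792 pt).
        double x = 72.0, y = 720.0, pageHeight = 792.0;

        Console.WriteLine("x in inches: " + PdfUnits.PointsToInches(x));
        Console.WriteLine("x in mm: " + PdfUnits.PointsToMillimeters(x));
        Console.WriteLine("y from top, in points: " + PdfUnits.FlipY(y, pageHeight));
    }
}
```

With these sample values, a point 720 pt above the bottom edge of a 792 pt page lies 72 pt (one inch) below the top edge.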

TET, the Text Extraction Toolkit from the PDFlib family of products, can do that. TET has a command-line interface, and it's the most powerful text extraction tool I'm aware of. (It can even handle ligatures...)

Geometry
TET provides precise metrics for the text, such as the position on the page, glyph widths, and text direction. Specific areas on the page can be excluded or included in the text extraction, e.g. to ignore headers and footers or margins.
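To make the idea of excluding page areas concrete, here is a rough sketch of how header and footer bands could be filtered out once you already have text fragments with coordinates. It does not use TET's API; `TextFragment` and `RegionFilter` are hypothetical names invented for this example. It assumes y is measured in points from the bottom edge of the page and needs C# 9 or later for the record syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for one extracted piece of text; not a TET or Docotic.Pdf type.
public record TextFragment(string Text, double X, double Y);

public static class RegionFilter
{
    // Keeps fragments whose y lies strictly between the footer band and the
    // header band. Assumes y is measured in points from the bottom of the page.
    public static List<TextFragment> ExcludeHeaderAndFooter(
        IEnumerable<TextFragment> fragments,
        double pageHeight, double headerHeight, double footerHeight)
    {
        double top = pageHeight - headerHeight;
        return fragments.Where(f => f.Y > footerHeight && f.Y < top).ToList();
    }
}

public static class RegionDemo
{
    public static void Main()
    {
        // Made-up sample data for a 792 pt tall page.
        var fragments = new List<TextFragment>
        {
            new("Chapter 1", 72, 770),  // inside the 36 pt header band
            new("Body text", 72, 400),
            new("Page 3", 300, 20),     // inside the 36 pt footer band
        };

        // Only "Body text" survives the filter with these values.
        foreach (var f in RegionFilter.ExcludeHeaderAndFooter(fragments, 792, 36, 36))
            Console.WriteLine(f.Text);
    }
}
```

The same pattern extends to left and right margins by adding a check on X.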
