Extract PDF text by coordinates

半阙折子戏 2021-02-04 20:03

I'd like to know whether there is a PDF library for Microsoft .NET that can extract text from a region specified by coordinates.

For example (in pseudo-code):

<         


        
6 answers
  •  刺人心 (original poster)
    2021-02-04 20:43

    Well, thank you all for your efforts.

    I got it working with Apache PDFBox compiled for .NET with IKVM. This is the final code:

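    // Assumes the IKVM-compiled PDFBox 1.x assemblies are referenced.
    using org.apache.pdfbox.pdmodel;
    using org.apache.pdfbox.util;

    PDDocument doc = PDDocument.load(@"c:\invoice.pdf");

    // Register a named region: x, y, width, height.
    PDFTextStripperByArea stripper = new PDFTextStripperByArea();
    stripper.addRegion("testRegion", new java.awt.Rectangle(0, 10, 100, 100));
    // Extract the text of every registered region from the first page.
    stripper.extractRegions((PDPage)doc.getDocumentCatalog().getAllPages().get(0));

    string text = stripper.getTextForRegion("testRegion");
    doc.close();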
    

    And it works like a charm.

    Thank you anyway; I hope my own answer helps others. If you need further details, just leave a comment here and I'll update this answer.
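
A note on the rectangle passed to addRegion in the code above: as far as I know, PDFTextStripperByArea compares the region against text positions measured from the top-left corner of the page, whereas coordinates read from most PDF tools are measured from the bottom-left. If your coordinates come from such a tool, they probably need flipping first. Below is a minimal Java sketch of that conversion, assuming an unrotated page whose lower-left corner is at (0, 0). RegionCoords and toTopLeftRegion are hypothetical names for illustration, not part of PDFBox.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper for illustration; not part of PDFBox.
final class RegionCoords {

    /**
     * Converts a box given in PDF user space (origin bottom-left, y grows upward)
     * into a rectangle with a top-left origin (y grows downward).
     * Assumption: the page is unrotated and its lower-left corner is at (0, 0).
     */
    static Rectangle2D toTopLeftRegion(double x, double yBottom,
                                       double width, double height,
                                       double pageHeight) {
        // The top edge of the box sits at yBottom + height in PDF space;
        // its distance from the top of the page becomes the new y.
        double yTop = pageHeight - (yBottom + height);
        return new Rectangle2D.Double(x, yTop, width, height);
    }

    public static void main(String[] args) {
        // Example: a page 792 points tall, box with lower-left corner at (0, 682).
        Rectangle2D r = toTopLeftRegion(0, 682, 100, 100, 792);
        System.out.println(r.getX() + ", " + r.getY() + ", "
                + r.getWidth() + ", " + r.getHeight());
    }
}
```

On a page 792 points tall, a 100 by 100 box whose lower-left corner is at (0, 682) maps to x = 0, y = 10, which is the rectangle used in the answer. The result is a Rectangle2D, the type that addRegion accepts.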
