Extract PDF text by coordinates

半阙折子戏 2021-02-04 20:03

I'd like to know if there's a PDF library for Microsoft .NET that can extract text from a region given by coordinates.

For example (in pseudo-code):

6 answers

刺人心 (original poster)

2021-02-04 20:43
Well, thank you all for your effort.

I got it working with Apache PDFBox compiled to .NET via IKVM, and this is the final code:
```
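// Requires the IKVM-compiled PDFBox assemblies (PDFBox 1.x API).
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;

PDDocument doc = PDDocument.load(@"c:\invoice.pdf");

// Register a named region: x, y, width, height.
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.addRegion("testRegion", new java.awt.Rectangle(0, 10, 100, 100));
// Extract the registered regions from the first page only.
stripper.extractRegions((PDPage)doc.getDocumentCatalog().getAllPages().get(0));

string text = stripper.getTextForRegion("testRegion");
doc.close();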
```
And it works like a charm.

Thank you anyway, and I hope my own answer helps others. If you need further details, just leave a comment here and I'll update this answer.
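
One thing that may trip people up: as far as I know, the rectangle passed to `addRegion` uses Java-style coordinates (origin at the top-left, y growing downward), while coordinates measured in a PDF viewer or taken from the PDF itself usually have the origin at the bottom-left. Here is a minimal sketch of the conversion in plain Java, with no PDFBox dependency. The class name `RegionCoords`, the method `fromPdfCoords`, and the 792-point page height (US Letter) are my own assumptions for illustration, not part of PDFBox.

```java
import java.awt.Rectangle;

public class RegionCoords {

    /**
     * Converts a rectangle given in bottom-left-origin PDF coordinates
     * (x, lowerY = y of the rectangle's bottom edge) into a
     * top-left-origin rectangle, assuming an unrotated page.
     * Hypothetical helper, not a PDFBox API.
     */
    public static Rectangle fromPdfCoords(int x, int lowerY, int width,
                                          int height, int pageHeight) {
        // The top edge sits at lowerY + height in PDF space;
        // measure that distance from the top of the page instead.
        int topY = pageHeight - (lowerY + height);
        return new Rectangle(x, topY, width, height);
    }

    public static void main(String[] args) {
        // Assumed page height: 792 points (US Letter).
        Rectangle region = fromPdfCoords(0, 682, 100, 100, 792);
        System.out.println(region);
    }
}
```

With these inputs the helper yields x=0, y=10, width=100, height=100, the same region used in the code above. For a real document, take the page height from the page's media box rather than hard-coding it.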