iTextSharp - Reading PDF with 2 columns

Submitted by 試著忘記壹切 on 2019-12-13 05:03:46

Question


I'm having trouble reading a PDF that has a header and a footer, with 2 columns in its body.

I already have the column widths and height of the header but I need the code to read the pages with columns.

Can anyone provide a piece of code that reads a PDF with columns?

Thank you.


Answer 1:


It's very hard to achieve what you want if you don't know the position of the columns, but I assume that you have their coordinates because you say "I already have the column widths and height". In that case, your question isn't that different from this other question posted on StackOverflow: iTextSharp read from specific position

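Suppose that rect is a Rectangle corresponding to the position of a column, reader is your PdfReader, and i is the page number (page numbers start at 1). Then you need this code:

// Keep only the text chunks that fall inside rect.
RenderFilter[] filter = {new RegionTextRenderFilter(rect)};
// Order the surviving chunks by their position on the page.
ITextExtractionStrategy strategy = new FilteredTextRenderListener(
    new LocationTextExtractionStrategy(), filter);
// Extract the text of page i that lies inside the column.
String single_column = PdfTextExtractor.GetTextFromPage(reader, i, strategy);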

Now you have the text of a single column. Repeat this for every column on the page, with a new strategy object each time, because a strategy accumulates the text it has already seen.
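Putting the pieces together, here is a minimal sketch of looping over every page and every column. It assumes iTextSharp 5.x, and it assumes the two columns are equally wide and span the page from edge to edge, which may not match your layout; since you already have the real column widths, substitute them. The names TwoColumnReader, ColumnRects and ReadColumns are hypothetical and made up for this illustration.

```csharp
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class TwoColumnReader
{
    // Hypothetical helper: cuts the header and footer off the page and
    // splits what is left into two equal-width rectangles, left column first.
    // ASSUMPTION: equal-width columns; replace with your known column widths.
    public static Rectangle[] ColumnRects(Rectangle page, float headerHeight, float footerHeight)
    {
        float bottom = page.Bottom + footerHeight; // PDF y axis points upwards
        float top = page.Top - headerHeight;
        float middle = (page.Left + page.Right) / 2f;
        return new[]
        {
            new Rectangle(page.Left, bottom, middle, top),
            new Rectangle(middle, bottom, page.Right, top)
        };
    }

    // Reads every page, left column first, then right column.
    public static string ReadColumns(string path, float headerHeight, float footerHeight)
    {
        PdfReader reader = new PdfReader(path);
        try
        {
            StringBuilder text = new StringBuilder();
            for (int i = 1; i <= reader.NumberOfPages; i++) // pages are 1-based
            {
                Rectangle page = reader.GetPageSize(i);
                foreach (Rectangle rect in ColumnRects(page, headerHeight, footerHeight))
                {
                    // A fresh filter and strategy per column, so that text
                    // from one column never leaks into the next one.
                    RenderFilter[] filter = { new RegionTextRenderFilter(rect) };
                    ITextExtractionStrategy strategy = new FilteredTextRenderListener(
                        new LocationTextExtractionStrategy(), filter);
                    text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i, strategy));
                }
            }
            return text.ToString();
        }
        finally
        {
            reader.Close();
        }
    }
}
```

You would call it along the lines of TwoColumnReader.ReadColumns("input.pdf", 60f, 40f), where the file name and the header and footer heights (in PDF user units, i.e. points by default) are placeholders for your own values.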

Extra comment: In most cases the RegionTextRenderFilter will work just fine. In a few cases, though, the columns are created by simply inserting additional space characters into the lines, so a single text chunk can span both columns. Such chunks have to be split before they are filtered. This can be done, for example, by using the TextRenderInfoSplitter from this answer and wrapping the FilteredTextRenderListener in it. (This comment was provided by mkl.)



Source: https://stackoverflow.com/questions/24234512/itextsharp-reading-pdf-with-2-columns
