How to get text with a certain color from a PDF in C#

Submitted by 夙愿已清 on 2019-12-05 20:17:23

With this library, http://www.codeproject.com/KB/files/xpdf_csharp.aspx?msg=3154408, you have access to the style of every word (font, color, ...):

this.pdfDoc.Pages[4].WordList.ElementAt(143).ForeColor
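
To go from one word's color to "all text with a certain color", the same property can be filtered over the whole word list. This is only a rough sketch: `Word`, its `Text` property, and the string-typed `ForeColor` are hypothetical stand-ins so the example compiles on its own. Only the names `WordList` and `ForeColor` come from the snippet above, and the real property types may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the library's word object.
// Only the ForeColor name appears in the snippet above; Text and the
// string type used for the color are assumptions made for illustration.
public sealed class Word
{
    public string Text { get; set; }
    public string ForeColor { get; set; }
}

public static class ColorFilter
{
    // Returns the text of every word whose color matches the requested one.
    public static List<string> TextWithColor(IEnumerable<Word> wordList, string color)
    {
        return wordList
            .Where(w => string.Equals(w.ForeColor, color, StringComparison.OrdinalIgnoreCase))
            .Select(w => w.Text)
            .ToList();
    }
}
```

With the real library, the input would presumably be something like `this.pdfDoc.Pages[4].WordList`, and the comparison would use whatever type `ForeColor` actually has.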

iText's PdfTextExtractor (and all the code it rests on) DOES NOT track the current color. Ouch. It wouldn't be all that hard to add, so you could modify iText yourself:

  1. Add stroke and fill color members to the GraphicsState class (and update the various constructors appropriately).
  2. You'd need to add ContentOperator classes for 'g', 'G', 'rg', 'RG', 'K', and 'k' (and maybe CS, cs, SC, sc, SCN, scn), to modify the stroke and fill colors.
  3. Add methods to TextRenderInfo to get the current stroke and fill colors.
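
The three steps above can be sketched without touching iText at all. The code below is not iText code: `ColorState` and `ColorTracker` are hypothetical names, the input is assumed to be an already tokenized content stream, and only the 'g', 'G', 'rg', 'RG', 'k', 'K' and 'Tj' operators are handled. It also assumes the default text rendering mode, in which text is painted with the fill color.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Step 1 (stand-in): the color members a graphics state would need.
public sealed class ColorState
{
    public double[] Fill = { 0.0 };    // PDF default: black in DeviceGray
    public double[] Stroke = { 0.0 };
}

public static class ColorTracker
{
    // Step 2: operand count for each color operator.
    static readonly Dictionary<string, int> Arity = new Dictionary<string, int>
    {
        { "g", 1 }, { "G", 1 }, { "rg", 3 }, { "RG", 3 }, { "k", 4 }, { "K", 4 }
    };

    // Walks pre-tokenized content and reports each shown string
    // together with the fill color in effect at that point (step 3).
    public static List<string> Run(IEnumerable<string> tokens)
    {
        var state = new ColorState();
        var operands = new List<string>();
        var shown = new List<string>();

        foreach (var token in tokens)
        {
            if (Arity.TryGetValue(token, out int n))
            {
                if (operands.Count < n)
                    throw new FormatException("Too few operands for " + token);

                var color = new double[n];
                for (int i = 0; i < n; i++)
                    color[i] = double.Parse(operands[operands.Count - n + i],
                                            CultureInfo.InvariantCulture);

                // Lowercase operators set the fill color, uppercase the stroke color.
                if (char.IsLower(token[0])) state.Fill = color;
                else state.Stroke = color;
                operands.Clear();
            }
            else if (token == "Tj")
            {
                string text = operands.Count > 0 ? operands[operands.Count - 1] : "";
                string[] parts = Array.ConvertAll(
                    state.Fill, c => c.ToString(CultureInfo.InvariantCulture));
                shown.Add(text + " fill=[" + string.Join(" ", parts) + "]");
                operands.Clear();
            }
            else
            {
                operands.Add(token); // anything else is treated as an operand
            }
        }
        return shown;
    }
}
```

For example, `ColorTracker.Run(new[] { "1", "0", "0", "rg", "(Red)", "Tj", "0", "g", "(Black)", "Tj" })` returns `(Red) fill=[1 0 0]` and `(Black) fill=[0]`. A real patch would also have to save and restore the colors on 'q' and 'Q', and handle the color space operators (CS, cs, SC, sc, SCN, scn) mentioned in step 2.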

Try PDFlib TET: http://www.pdflib.com/products/tet/
It should be able to give you information about the text.

I took a different approach: I converted the PDF to an Excel file, where it was very easy to search for the coloured text.
