I am trying to extract text with all information from the pdf using pdfbox. I got all the information i want, except color. I tried different ways to get the fontcolor (incl
I found some code in one of my maintenance program.
I do not know it works for you or not, please try It.
Also check out this link http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/common/class-use/PDStream.html
It may help you
PDDocument doc = null;
try {
doc = PDDocument.load("C:/Path/To/Pdf/Sample.pdf");
PDFStreamEngine engine = new PDFStreamEngine(ResourceLoader.loadProperties("org/apache/pdfbox/resources/PageDrawer.properties"));
PDPage page = (PDPage)doc.getDocumentCatalog().getAllPages().get(0);
engine.processStream(page, page.findResources(), page.getContents().getStream());
PDGraphicsState graphicState = engine.getGraphicsState();
System.out.println(graphicState.getStrokingColor().getColorSpace().getName());
float colorSpaceValues[] = graphicState.getStrokingColor().getColorSpaceValue();
for (float c : colorSpaceValues) {
System.out.println(c * 255);
}
}
finally {
if (doc != null) {
doc.close();
}