How to extract fonts from PDDocument in PDFBox 2.0.2

Posted by 寵の児 on 2019-12-13 02:55:11

Question


I have seen how to do this in previous versions, as in the question below:

How to extract font styles of text contents using pdfbox?

But I think the getFonts() method has since been removed. I want to retrieve a map of texts to fonts (Map<String, PDFont>) in the new version of PDFBox, but I have no idea how.

Thanks

Kabeer


Answer 1:


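Iterate over the pages and read the fonts from each page's resources:

// PDFBox 2.x has no load(String) overload, so wrap the path in a java.io.File
PDDocument doc = PDDocument.load(new File("C:/mydoc3.pdf"));
for (int i = 0; i < doc.getNumberOfPages(); ++i)
{
    PDPage page = doc.getPage(i);
    PDResources res = page.getResources();
    for (COSName fontName : res.getFontNames())
    {
        PDFont font = res.getFont(fontName);
        // do stuff with the font
    }
}
doc.close();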
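
The loop above visits the same font once per page that uses it. As a minimal sketch, assuming PDFBox 2.x on the classpath, the fonts can instead be gathered into a map keyed by the name each font reports. The class name PageFontCollector and the method collectFonts are hypothetical names chosen for illustration, not part of PDFBox. Note that this only sees fonts listed in the page-level resources; fonts referenced only from nested resources (for example inside form XObjects) would not be found.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;

// Hypothetical helper class, not part of PDFBox.
public class PageFontCollector {

    // Returns the fonts found in the page-level resources, keyed by font name.
    // The first font seen under a given name wins.
    public static Map<String, PDFont> collectFonts(PDDocument doc) throws IOException {
        Map<String, PDFont> fonts = new LinkedHashMap<>();
        for (PDPage page : doc.getPages()) {
            PDResources res = page.getResources();
            if (res == null) {
                continue; // page without a resource dictionary
            }
            for (COSName fontName : res.getFontNames()) {
                PDFont font = res.getFont(fontName);
                if (font != null) {
                    fonts.putIfAbsent(font.getName(), font);
                }
            }
        }
        return fonts;
    }
}
```

Calling PageFontCollector.collectFonts(doc) on a loaded document and iterating over the returned map's keys would then list each font name once.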



Answer 2:


For PDFBox 2.x, the revised code for the answer you linked to is:

// PDFBox 2.x has no load(String) overload, so wrap the path in a java.io.File
PDDocument doc = PDDocument.load(new File("C:/mydoc3.pdf"));
for (PDPage page : doc.getPages()) {
    // get the names of the fonts in the resources dictionary
    Iterable<COSName> fontNames = page.getResources().getFontNames();
    // to get the font for each name, call
    // page.getResources().getFont(COSName name);
}
doc.close();



Answer 3:


This example extracts the fonts from a PDF file using PDFBox 2.0.6.

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
public class PDFFontExtractor {
    public static void main(String[] args)
    {
        // try-with-resources closes the document once the loop has finished
        try (PDDocument pddDocument = PDDocument.load(new File("C:\\Users\\Desktop\\sample1.pdf")))
        {
            for (int i = 0; i < pddDocument.getNumberOfPages(); ++i)
            {
                PDPage page = pddDocument.getPage(i);
                PDResources res = page.getResources();
                if (res == null)
                {
                    continue; // page has no resource dictionary
                }
                for (COSName fontName : res.getFontNames())
                {
                    PDFont font = res.getFont(fontName);
                    System.out.println("FONT :: " + font);
                }
            }
        }
        catch (IOException ex)
        {
            ex.printStackTrace();
        }
    }
}
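
The answers above list the fonts available to each page, but the question asks for a map from text to font. One possible approach, sketched here under the assumption of PDFBox 2.x, is to subclass PDFTextStripper and record the font of each text chunk as it is written. The class name TextFontMapper and its static method map are hypothetical names for illustration. The sketch simplifies in two ways: it takes the font of the first glyph in each chunk (a chunk could in principle mix fonts), and when the same text occurs more than once it keeps only the first font seen.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

// Hypothetical helper class, not part of PDFBox.
public class TextFontMapper extends PDFTextStripper {

    // Text chunk -> font of its first glyph, in order of first appearance.
    private final Map<String, PDFont> fontsByText = new LinkedHashMap<>();

    public TextFontMapper() throws IOException {
        super();
    }

    // Called by PDFTextStripper for each chunk of text it emits.
    @Override
    protected void writeString(String text, List<TextPosition> textPositions) throws IOException {
        if (!textPositions.isEmpty()) {
            // Simplification: assume the whole chunk uses the first glyph's font.
            fontsByText.putIfAbsent(text, textPositions.get(0).getFont());
        }
        super.writeString(text, textPositions);
    }

    // Runs text extraction over the document and returns the collected map.
    public static Map<String, PDFont> map(PDDocument doc) throws IOException {
        TextFontMapper mapper = new TextFontMapper();
        mapper.getText(doc); // drives the writeString callbacks
        return mapper.fontsByText;
    }
}
```

If one font per chunk is too coarse, the same callback could instead iterate over textPositions and group consecutive glyphs that share a font.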


Source: https://stackoverflow.com/questions/38369096/how-to-extract-fonts-from-pddocument-in-pdfbox-2-0-2
