PDFBox not supporting multiple languages

问题

I'm trying to generate a PDF report consisting sentences in multiple languages. For that I'm using Google NOTO fonts but google CJK fonts doesn't support some of the Latin special characters due to that my PDF box is failing to generate a report or sometime shows weird characters.

Any one has any appropriate solution ? I tried multiple things but unable to find a single TTF file that can support all Unicode. I also tried fallback to different font file but that will be too much work.

Languages I support:- Japanese, German, Spanish, Portuguese, English.

Note:- I don't want to use arialuni.ttf file due to Licensing issue.

Can anyone suggest anything.

回答1:

Here is the code that will be in release 2.0.14 in the examples subproject:

/**
 * Output a text without knowing which font is the right one. One use case is a worldwide
 * address list. Only LTR languages are supported, RTL (e.g. Hebrew, Arabic) are not 
 * supported so they would appear in the wrong direction.
 * Complex scripts (Thai, Arabic, some Indian languages) are also not supported, any output
 * will look weird. There is an (unfinished) effort here:
 * https://issues.apache.org/jira/browse/PDFBOX-4189
 * 
 * @author Tilman Hausherr
 */
public class EmbeddedMultipleFonts
{
    public static void main(String[] args) throws IOException
    {
        try (PDDocument document = new PDDocument())
        {
            PDPage page = new PDPage(PDRectangle.A4);
            document.addPage(page);

            PDFont font1 = PDType1Font.HELVETICA; // always have a simple font as first one
            TrueTypeCollection ttc2 = new TrueTypeCollection(new File("c:/windows/fonts/batang.ttc"));
            PDType0Font font2 = PDType0Font.load(document, ttc2.getFontByName("Batang"), true); // Korean
            TrueTypeCollection ttc3 = new TrueTypeCollection(new File("c:/windows/fonts/mingliu.ttc"));
            PDType0Font font3 = PDType0Font.load(document, ttc3.getFontByName("MingLiU"), true); // Chinese
            PDType0Font font4 = PDType0Font.load(document, new File("c:/windows/fonts/mangal.ttf")); // Indian
            PDType0Font font5 = PDType0Font.load(document, new File("c:/windows/fonts/ArialUni.ttf")); // Fallback

            try (PDPageContentStream cs = new PDPageContentStream(document, page))
            {
                cs.beginText();
                List<PDFont> fonts = new ArrayList<>();
                fonts.add(font1);
                fonts.add(font2);
                fonts.add(font3);
                fonts.add(font4);
                fonts.add(font5);
                cs.newLineAtOffset(20, 700);
                showTextMultiple(cs, "abc 한국 中国 भारत 日本 abc", fonts, 20);
                cs.endText();
            }

            document.save("example.pdf");
        }
    }

    static void showTextMultiple(PDPageContentStream cs, String text, List<PDFont> fonts, float size)
            throws IOException
    {
        try
        {
            // first try all at once
            fonts.get(0).encode(text);
            cs.setFont(fonts.get(0), size);
            cs.showText(text);
            return;
        }
        catch (IllegalArgumentException ex)
        {
            // do nothing
        }
        // now try separately
        int i = 0;
        while (i < text.length())
        {
            boolean found = false;
            for (PDFont font : fonts)
            {
                try
                {
                    String s = text.substring(i, i + 1);
                    font.encode(s);
                    // it works! Try more with this font
                    int j = i + 1;
                    for (; j < text.length(); ++j)
                    {
                        String s2 = text.substring(j, j + 1);

                        if (isWinAnsiEncoding(s2.codePointAt(0)) && font != fonts.get(0))
                        {
                            // Without this segment, the example would have a flaw:
                            // This code tries to keep the current font, so
                            // the second "abc" would appear in a different font
                            // than the first one, which would be weird.
                            // This segment assumes that the first font has WinAnsiEncoding.
                            // (all static PDType1Font Times / Helvetica / Courier fonts)
                            break;
                        }
                        try
                        {
                            font.encode(s2);
                        }
                        catch (IllegalArgumentException ex)
                        {
                            // it's over
                            break;
                        }
                    }
                    s = text.substring(i, j);
                    cs.setFont(font, size);
                    cs.showText(s);
                    i = j;
                    found = true;
                    break;
                }
                catch (IllegalArgumentException ex)
                {
                    // didn't work, will try next font
                }
            }
            if (!found)
            {
                throw new IllegalArgumentException("Could not show '" + text.substring(i, i + 1) +
                        "' with the fonts provided");
            }
        }
    }

    static boolean isWinAnsiEncoding(int unicode)
    {
        String name = GlyphList.getAdobeGlyphList().codePointToName(unicode);
        if (".notdef".equals(name))
        {
            return false;
        }
        return WinAnsiEncoding.INSTANCE.contains(name);
    }
}

Alternatives to arialuni can be found here: https://en.wikipedia.org/wiki/Open-source_Unicode_typefaces

来源：https://stackoverflow.com/questions/54248117/pdfbox-not-supporting-multiple-languages

标签

java

pdfbox

liferay-7