iTextSharp - diacritic chars

假装没事ソ submitted on 2019-12-13 07:55:26

Question


I am reading PDF documents with the iTextSharp library. The documents are in Czech, which uses diacritics (ř ě ž š č, etc.). How can I read these characters? Any ideas? Alternatively, is there a way to replace these characters with plain r e z s c? This is the code in my method. Thanks.

PdfReader reader = new PdfReader("M:/ShareDirs_KSP/RDM_Debtors/DMS_PROD/" + src);

// extract the text of page 1 only (the loop bounds cover a single page)
String text = "";
for (int page = 1; page <= 1; page++) {
    text += PdfTextExtractor.getTextFromPage(reader, page);
}

reader.close();

Answer 1:


I have written a small proof of concept that parses the file czech.pdf. This file contains several characters with diacritics. It was created in answer to the following question: Can't get Czech characters while generating a PDF

The text is stored in the file twice: once using a simple font, once using a composite font. In my proof of concept (named ParseCzech), I parse this PDF to a file encoded as UTF-8 (a Unicode encoding):

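// DEST is the output path constant of ParseCzech (its definition is not shown here)
public void parse(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    // try-with-resources closes the output stream even if extraction fails
    try (FileOutputStream fos = new FileOutputStream(DEST)) {
        for (int page = 1; page <= 1; page++) {
            // encode the extracted text explicitly as UTF-8 so the diacritics survive
            fos.write(PdfTextExtractor.getTextFromPage(reader, page).getBytes("UTF-8"));
        }
    } finally {
        reader.close();
    }
}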

The result is the file czech.txt:

As you can see from the screenshot, the text is extracted correctly. Make sure, however, that the viewer you use knows that the file is encoded as UTF-8; otherwise you may see strange characters instead of the actual text.
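To illustrate the point about the viewer's encoding, here is a minimal sketch in plain Java (no iText involved). The class name EncodingMismatch and its two helper methods are hypothetical and exist only for this illustration. It encodes a few Czech letters as UTF-8 and then decodes the same bytes twice: once as UTF-8, and once as ISO-8859-1, which is roughly what a viewer does when it guesses the wrong encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {

    // Encode text the same way the extraction code does (UTF-8).
    public static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode bytes with whatever charset the "viewer" assumes.
    public static String decode(byte[] bytes, Charset assumed) {
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        // r, e and z with carons, written as Unicode escapes
        String original = "\u0159\u011b\u017e";
        byte[] bytes = encodeUtf8(original);

        String right = decode(bytes, StandardCharsets.UTF_8);
        String wrong = decode(bytes, StandardCharsets.ISO_8859_1);

        // Each of these letters takes two bytes in UTF-8, so the
        // wrongly decoded string has twice as many characters.
        System.out.println("bytes written:      " + bytes.length);
        System.out.println("decoded as UTF-8:   " + right.equals(original));
        System.out.println("decoded as Latin-1: " + wrong.equals(original));
    }
}
```

The bytes on disk are identical in both cases; only the interpretation differs. If the output looks garbled, it is worth checking the viewer's encoding setting before suspecting the extraction code.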

Note that some PDFs do not allow text to be extracted correctly. This is explained in the following video: http://www.youtube.com/watch?v=wxGEEv7ibHE

Please share your PDF so that people on StackOverflow can check whether the extraction fails because of an error in your code or because the PDF itself doesn't allow the text to be extracted.
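On the second part of the question (replacing the characters with plain r e z s c): this is not something the text extraction does for you, but one common approach, once the text has been extracted correctly, is Unicode normalization from the Java standard library. The sketch below is an assumption about what you want rather than part of the proof of concept; the class name StripDiacritics and the method strip are hypothetical. It decomposes each accented letter into a base letter plus a combining mark (form NFD) and then deletes the combining marks.

```java
import java.text.Normalizer;

public class StripDiacritics {

    // Hypothetical helper: returns the input with combining marks removed.
    public static String strip(String input) {
        // NFD splits e.g. "r with caron" into 'r' followed by U+030C (combining caron)
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches Unicode combining marks; drop them and keep the base letters
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // The letters from the question (r, e, z, s, c with carons) as Unicode escapes
        String czech = "\u0159 \u011b \u017e \u0161 \u010d";
        System.out.println(strip(czech)); // prints "r e z s c"
    }
}
```

This only helps when the extracted text is already correct. If the PDF does not allow the text to be extracted properly, there are no correct characters to normalize. Letters that have no decomposition into a base letter plus a mark are left unchanged by this approach.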



Source: https://stackoverflow.com/questions/26670919/itextsharp-diacritic-chars
