Problem with PdfTextExtractor in itext!

Posted by 你离开我真会死。 on 2019-12-08 00:25:48

Question


First, excuse my bad English! I want to search a PDF document for a word like "Hello", so I have to read each page of the PDF with PdfTextExtractor. That part works: I can read all the words on a single page and save them in a string buffer. But when I put this code in a for loop (for example, to search pages 1 to 7), the words from earlier pages remain in the string buffer. I hope you understand my problem. Thanks, all. This is my code:

        PdfReader reader2 = new PdfReader(openFileDialog1.FileName);
        int pagen = reader2.NumberOfPages;
        reader2.Close();
        ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
        for (int i = 1; i < pagen; i++)
        {
            textBox1.Text = "";
            PdfReader reader = new PdfReader(openFileDialog1.FileName);

            String  s = PdfTextExtractor.GetTextFromPage(reader, i, its);
            //MessageBox.Show(s.Length.ToString());
            //PdfTextArray h = new PdfTextArray(s);

            //
            // s = "";
            s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
            textBox1.Text = s;
            reader.Close();

        }


Answer 1:


Unfortunately, SimpleTextExtractionStrategy can't be reset: each call keeps appending to the text the same instance has already collected. So you must move your "new SimpleTextExtractionStrategy()" inside the loop instead of reusing the same object.
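To make the effect easier to see without a PDF at hand, here is a minimal sketch that models the behavior with plain strings. The class AccumulatingStrategy and the helper GetTextFromPage are hypothetical stand-ins written for this illustration, not iTextSharp types. They only assume what the question describes: a strategy object that keeps everything it has been given and cannot be reset.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for an extraction strategy: it appends every piece
// of text to one internal buffer and offers no way to clear it.
class AccumulatingStrategy
{
    private readonly StringBuilder buffer = new StringBuilder();

    public void RenderText(string text) { buffer.Append(text); }
    public string GetResultantText() { return buffer.ToString(); }
}

static class StrategyDemo
{
    // Hypothetical helper modelled on a "get text from page" call: it feeds
    // the page's text to the strategy, then returns the strategy's whole buffer.
    static string GetTextFromPage(string[] pages, int pageNumber, AccumulatingStrategy strategy)
    {
        strategy.RenderText(pages[pageNumber - 1]);
        return strategy.GetResultantText();
    }

    // One strategy shared by all pages: later results include earlier pages.
    public static List<string> ExtractWithSharedStrategy(string[] pages)
    {
        var results = new List<string>();
        var strategy = new AccumulatingStrategy();
        for (int i = 1; i <= pages.Length; i++)
        {
            results.Add(GetTextFromPage(pages, i, strategy));
        }
        return results;
    }

    // A new strategy per page: each result holds only that page's text.
    public static List<string> ExtractWithFreshStrategy(string[] pages)
    {
        var results = new List<string>();
        for (int i = 1; i <= pages.Length; i++)
        {
            var strategy = new AccumulatingStrategy();
            results.Add(GetTextFromPage(pages, i, strategy));
        }
        return results;
    }

    static void Main()
    {
        string[] pages = { "Hello page one. ", "Page two." };
        Console.WriteLine(ExtractWithSharedStrategy(pages)[1]);
        Console.WriteLine(ExtractWithFreshStrategy(pages)[1]);
    }
}
```

With the shared strategy, the result for page 2 also contains the text of page 1, so a search for "Hello" would wrongly match page 2. With a fresh strategy per page, it contains only "Page two.".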




Answer 2:


There is another potential problem in the statement which controls your loop:

for (int i = 1; i < pagen; i++)

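If pagen = 1, the loop is not executed at all, and for any larger document the last page is never read, because page numbers run from 1 to pagen inclusive. It should read: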

for (int i = 1; i <= pagen; i++)
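As a quick check of the two conditions, the sketch below only counts which 1-based page numbers each loop would visit. The class and method names are made up for this illustration and no PDF library is involved.

```csharp
using System;
using System.Collections.Generic;

static class LoopBoundsDemo
{
    // Page numbers visited by the original condition "i < pageCount".
    public static List<int> VisitExclusive(int pageCount)
    {
        var visited = new List<int>();
        for (int i = 1; i < pageCount; i++) visited.Add(i);
        return visited;
    }

    // Page numbers visited by the corrected condition "i <= pageCount".
    public static List<int> VisitInclusive(int pageCount)
    {
        var visited = new List<int>();
        for (int i = 1; i <= pageCount; i++) visited.Add(i);
        return visited;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", VisitExclusive(7)));
        Console.WriteLine(string.Join(",", VisitInclusive(7)));
    }
}
```

For a 7-page document the first loop stops at page 6, while the second also reaches page 7. For a 1-page document the first loop visits nothing.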



Answer 3:


public string ReadPdfFile(object Filename, DataTable ReadLibray)
    {
     // Open the document once and reuse the same reader for every page.
     PdfReader reader = new PdfReader((string)Filename);
     string strText = string.Empty;

     for (int page = 1; page <= reader.NumberOfPages; page++)
     {
         // Create a fresh strategy per page so text from earlier pages is not carried over.
         ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
         String s = PdfTextExtractor.GetTextFromPage(reader, page, its);

         s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
         strText = strText + s;
      }
      // Close the reader after the last page has been read.
      reader.Close();
      return strText;
    }

This code is very helpful for reading a PDF with iText.



Source: https://stackoverflow.com/questions/3704518/problem-with-pdftextextractor-in-itext
