How can we extract text from a PDF using iTextSharp with spaces preserved?

南方客 2020-12-10 09:17

I am using the method below to extract PDF text line by line. The problem is that it does not preserve the spaces between words and figures. What could be the solution?

3 answers
  •  南方客
    南方客 (original poster)
    2020-12-10 09:52

    // Requires: using System; using System.Text;
    //           using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
    using (PdfReader reader = new PdfReader(path))
    {
        StringBuilder text = new StringBuilder();      // raw text of all pages
        StringBuilder textfinal = new StringBuilder(); // text rebuilt line by line
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            // Extract each page once and reuse the result.
            string page = PdfTextExtractor.GetTextFromPage(reader, i);
            text.Append(page);
            foreach (string line in page.Split('\n'))
            {
                // Split on spaces (the original split on '\n' again, which never yields words).
                string[] words = line.Split(' ');
                foreach (string wrd in words)
                {
                    // Per-word processing goes here.
                }
                textfinal.Append(line);
                textfinal.Append(Environment.NewLine);
            }
        }
    }
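
A PDF often stores positioned runs of glyphs and no literal space characters, so an extractor has to infer spaces from the horizontal gaps between runs. The following is only a minimal sketch of that idea in plain C#, not iTextSharp's actual implementation. The `Chunk` type, the `GapSpacing.JoinChunks` helper and the half-space-width threshold are all assumptions made for illustration. In iTextSharp, a rule like this would have to live in a custom `ITextExtractionStrategy`, which receives the positioned text through `RenderText`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for one positioned run of text on a single PDF line.
public sealed class Chunk
{
    public string Text;
    public float StartX;      // left edge of the run
    public float EndX;        // right edge of the run
    public float SpaceWidth;  // width of a space character in the run's font

    public Chunk(string text, float startX, float endX, float spaceWidth)
    {
        Text = text;
        StartX = startX;
        EndX = endX;
        SpaceWidth = spaceWidth;
    }
}

public static class GapSpacing
{
    // Joins chunks from left to right, inserting a space whenever the gap to
    // the previous chunk exceeds half a space width (an assumed threshold).
    public static string JoinChunks(List<Chunk> chunks)
    {
        var sorted = new List<Chunk>(chunks);
        sorted.Sort((a, b) => a.StartX.CompareTo(b.StartX));

        var sb = new StringBuilder();
        Chunk prev = null;
        foreach (Chunk c in sorted)
        {
            if (prev != null)
            {
                float gap = c.StartX - prev.EndX;
                bool alreadySpaced = prev.Text.EndsWith(" ") || c.Text.StartsWith(" ");
                if (gap > c.SpaceWidth / 2f && !alreadySpaced)
                {
                    sb.Append(' ');
                }
            }
            sb.Append(c.Text);
            prev = c;
        }
        return sb.ToString();
    }
}
```

For example, chunks `"Total"` (x 10 to 40), `":"` (x 40.2 to 42) and `"42.50"` (x 50 to 70), each with a space width of 3, join to `Total: 42.50`: the colon sits too close to the word to get a space, while the figure is far enough away to get one.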
    
