Read PDF in selenium: The constructor PDFParser(BufferedInputStream) is undefined

Posted by China☆狼群 on 2019-12-11 06:41:56

Question


I am getting the following error:

The constructor PDFParser(BufferedInputStream) is undefined

I am trying to read PDF contents using Selenium.

WebDriver driver=new FirefoxDriver();
driver.get("http://www.axmag.com/download/pdfurl-guide.pdf");
URL TestURL = new URL("http://www.axmag.com/download/pdfurl-guide.pdf");
BufferedInputStream TestFile = new BufferedInputStream(TestURL.openStream());
PDFParser TestPDF = new PDFParser(TestFile);
TestPDF.parse();
String TestText = new PDFTextStripper().getText(TestPDF.getPDDocument());
System.out.println(TestText);
Assert.assertTrue(TestText.contains("Open the setting.xml, you can see it is like this"));

Can anyone please help?


Answer 1:


I ran into the same problem. It is caused by using the Apache PDFBox 2.0.0 jar files: the PDFParser class in 2.0 has no PDFParser(BufferedInputStream) constructor, whereas 1.8 has a PDFParser(InputStream) constructor, which also accepts a BufferedInputStream. Remove the 2.0.0 jars from the build path and use Apache PDFBox 1.8.11 instead. That should resolve the error.

Here is my code as well, in case it helps:

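// PDFBox 1.8.x; getLatestFile is the path (or File) of the PDF to read
InputStream is = new FileInputStream(getLatestFile);
PDFParser parser = new PDFParser(is);
parser.parse();
// Keep one reference to the document so the same object is read and closed
PDDocument document = parser.getPDDocument();
String output = new PDFTextStripper().getText(document);
System.out.println(output);
document.close();
is.close();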



Answer 2:


For PDFBox 2.0.2 (this also works in 1.8.*), the simplest approach is to skip PDFParser entirely. You only need to call PDDocument.load() to open a PDF file:

WebDriver driver = new FirefoxDriver();
driver.get("http://www.axmag.com/download/pdfurl-guide.pdf");
URL url = new URL("http://www.axmag.com/download/pdfurl-guide.pdf");
BufferedInputStream bis = new BufferedInputStream(url.openStream());
PDDocument doc = PDDocument.load(bis);
String text = new PDFTextStripper().getText(doc);
doc.close();
bis.close();
System.out.println(text);
Assert.assertTrue(text.contains("Open the setting.xml, you can see it is like this"));
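
When the PDF comes from a URL, as in the code above, the server may return an HTML error page instead of a PDF, and the parser failure that follows can be confusing. One optional precaution is to read the response into memory and check for the "%PDF-" marker that a well-formed PDF starts with before handing the data to PDFBox. The following is only a sketch using the JDK alone; the class name PdfBytes and its methods are hypothetical helpers, not part of PDFBox or Selenium.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of PDFBox or Selenium.
final class PdfBytes {

    // A well-formed PDF file begins with "%PDF-" followed by a version number.
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfBytes() {
    }

    // Reads the whole stream into memory. The caller is still responsible
    // for closing the stream afterwards.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Returns true if the data starts with the PDF header marker.
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for downloaded content, so no network is needed.
        byte[] pdfLike = readAll(new ByteArrayInputStream(
                "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII)));
        byte[] htmlLike = readAll(new ByteArrayInputStream(
                "<html>Not Found</html>".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(looksLikePdf(pdfLike));
        System.out.println(looksLikePdf(htmlLike));
    }
}
```

In a test, the bytes returned by readAll(url.openStream()) could be checked with looksLikePdf and then wrapped in a ByteArrayInputStream and passed to PDDocument.load() as shown in this answer.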


Source: https://stackoverflow.com/questions/39233547/read-pdf-in-selenium-the-constructor-pdfparserbufferedinputstream-is-undefine
