I am using PdfBox in Java to extract text from PDF files. Some of the input files provided are not valid and PDFTextStripper halts on these files. Is there a clean way to ch
There is a very convenient and simple library for testing PDF content: https://github.com/codeborne/pdf-test
API is very simple:
import com.codeborne.pdftest.PDF;
import static com.codeborne.pdftest.PDF.*;
import static org.junit.Assert.assertThat;
public class PDFContainsTextTest {
@Test
public void canAssertThatPdfContainsText() {
PDF pdf = new PDF(new File("src/test/resources/50quickideas.pdf"));
assertThat(pdf, containsText("50 Quick Ideas to Improve your User Stories"));
}
}