How can I determine if a file is a PDF file?

暖寄归人 2020-12-24 11:57

I am using PDFBox in Java to extract text from PDF files. Some of the input files provided are not valid, and PDFTextStripper halts on these files. Is there a clean way to check whether a file is a valid PDF?
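One cheap pre-filter is to look at the first bytes of the file. This technique is not part of PDFBox and not taken from the answers. PDF files are expected to start with the header `%PDF-` followed by a version number (for example `%PDF-1.7`). The minimal sketch below uses only the JDK (Java 11+). The class and method names `PdfHeaderCheck` and `hasPdfHeader` are hypothetical. A matching header does not prove that the rest of the file is well-formed, so PDFTextStripper can still fail on a file that passes this check. The check is also strict: it rejects files with extra bytes before the header, which some PDF readers tolerate.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: a header-only check, not a full PDF validator.
public final class PdfHeaderCheck {
    // Assumption: a PDF begins exactly with these five ASCII bytes, e.g. "%PDF-1.7".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfHeaderCheck() {}

    /** Returns true if the file's first five bytes are "%PDF-". */
    public static boolean hasPdfHeader(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            // readNBytes returns fewer bytes for short files, so Arrays.equals is then false
            byte[] head = in.readNBytes(PDF_MAGIC.length);
            return Arrays.equals(head, PDF_MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Path.of(arg);
            System.out.println(p + ": " + (hasPdfHeader(p) ? "has a PDF header" : "no PDF header"));
        }
    }
}
```

Files that fail the check can be skipped before they reach PDFTextStripper. Files that pass still need the usual exception handling around PDFBox loading and extraction.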

臣服心动 (OP)

2020-12-24 12:27

There is a very convenient and simple library for testing PDF content: https://github.com/codeborne/pdf-test

The API is very simple:

import com.codeborne.pdftest.PDF;
import static com.codeborne.pdftest.PDF.*;
import static org.junit.Assert.assertThat;

import java.io.File;
import org.junit.Test;

public class PDFContainsTextTest {
  @Test
  public void canAssertThatPdfContainsText() {
    // Load the PDF and assert, via a Hamcrest matcher, that its text contains the phrase
    PDF pdf = new PDF(new File("src/test/resources/50quickideas.pdf"));
    assertThat(pdf, containsText("50 Quick Ideas to Improve your User Stories"));
  }
}