How can I determine if a file is a PDF file?

暖寄归人 2020-12-24 11:57

I am using PDFBox in Java to extract text from PDF files. Some of the input files provided are not valid PDFs, and PDFTextStripper halts on them. Is there a clean way to ch

13 Answers
  •  臣服心动
    2020-12-24 12:27

    There is a very convenient and simple library for testing PDF content: https://github.com/codeborne/pdf-test

    The API is very simple:

    import java.io.File;

    import org.junit.Test;

    import com.codeborne.pdftest.PDF;
    import static com.codeborne.pdftest.PDF.*;
    import static org.junit.Assert.assertThat;

    public class PDFContainsTextTest {
      @Test
      public void canAssertThatPdfContainsText() throws Exception {
        // Load the PDF under test.
        PDF pdf = new PDF(new File("src/test/resources/50quickideas.pdf"));
        // containsText is a matcher statically imported from PDF.
        assertThat(pdf, containsText("50 Quick Ideas to Improve your User Stories"));
      }
    }
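
The library above asserts on PDF content. For the narrower question of screening out non-PDF input before it reaches PDFTextStripper, a cheap pre-check is to look for the `%PDF-` signature that a PDF file starts with. The sketch below is not part of the original answer: the class name `PdfHeaderCheck` and the method `looksLikePdf` are hypothetical, and it uses only the Java standard library. It assumes the signature sits at byte 0, so a file with leading junk before the header would be rejected.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox or pdf-test.
public class PdfHeaderCheck {
    // Signature expected at the very start of a PDF file.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file begins with the "%PDF-" signature. */
    public static boolean looksLikePdf(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            byte[] head = new byte[PDF_MAGIC.length];
            int off = 0;
            // read() may return fewer bytes than requested, so loop until the buffer is full.
            while (off < head.length) {
                int n = in.read(head, off, head.length - off);
                if (n < 0) {
                    return false; // file is shorter than the signature
                }
                off += n;
            }
            return Arrays.equals(head, PDF_MAGIC);
        }
    }
}
```

A matching header only shows that the file claims to be a PDF. A truncated or corrupt file can still pass this check, so the later text extraction should still be wrapped in a try/catch.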
    
