How can I determine if a file is a PDF file?


I am using PdfBox in Java to extract text from PDF files. Some of the input files provided are not valid and PDFTextStripper halts on these files. Is there a clean way to ch

13 answers

后悔当初

2020-12-24 12:34

In general, a PDF file of any version starts with %PDF- and ends with an %%EOF marker, so we can check for both, like below.

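import java.nio.charset.StandardCharsets;

public static boolean is_pdf(byte[] data) {
    // Reject null or too-short input before touching the bytes
    // ("%PDF-" plus "%%EOF" already needs 10 bytes).
    if (data == null || data.length < 10) {
        return false;
    }
    // Header: the data must start with "%PDF-".
    if (data[0] != 0x25 || // %
            data[1] != 0x50 || // P
            data[2] != 0x44 || // D
            data[3] != 0x46 || // F
            data[4] != 0x2D) { // -
        return false;
    }
    // Trailer: look for "%%EOF" in the last 7 bytes, which leaves room for a
    // trailing line break. ISO-8859-1 maps every byte to exactly one char,
    // so the string indices line up with the byte array.
    String tail = new String(data, data.length - 7, 7, StandardCharsets.ISO_8859_1);
    return tail.contains("%%EOF");
}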
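If the input is a file on disk, the same header and trailer check can be done without loading the whole file into memory. The following is only a sketch of that idea: the class name PdfSniffer, the method looksLikePdf and the 1024-byte tail window are my own assumptions, not part of PdfBox. The wider window is there because some files carry extra bytes after the %%EOF marker.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class PdfSniffer {
    // Assumed window size: how many trailing bytes to search for "%%EOF".
    private static final int TAIL_WINDOW = 1024;

    public static boolean looksLikePdf(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < 10) { // too short to hold "%PDF-" plus "%%EOF"
                return false;
            }
            // Header: the first five bytes must be "%PDF-".
            byte[] head = new byte[5];
            raf.readFully(head);
            if (!"%PDF-".equals(new String(head, StandardCharsets.ISO_8859_1))) {
                return false;
            }
            // Trailer: search only the last TAIL_WINDOW bytes for "%%EOF".
            int tailLength = (int) Math.min(length, TAIL_WINDOW);
            byte[] tail = new byte[tailLength];
            raf.seek(length - tailLength);
            raf.readFully(tail);
            return new String(tail, StandardCharsets.ISO_8859_1).contains("%%EOF");
        }
    }
}
```

This only tells you the file looks like a PDF from the outside. A file can pass both checks and still be corrupt inside, so it is probably still worth catching the exception around the PDFTextStripper call and skipping the file when it is thrown.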
