How can I determine if a file is a PDF file?

暖寄归人 2020-12-24 11:57

I am using PdfBox in Java to extract text from PDF files. Some of the input files provided are not valid and PDFTextStripper halts on these files. Is there a clean way to ch

13 answers
  • 2020-12-24 12:34

    In general, you can do it like this: a PDF file starts with %PDF- and, whatever its version, normally ends with %%EOF, so you can check both as below.

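    import java.nio.charset.StandardCharsets;

    public static boolean is_pdf(byte[] data) {
        // Check for null and minimum size before touching the array
        // ("%PDF-" plus "%%EOF" is already 10 bytes).
        if (data == null || data.length < 10) {
            return false;
        }
        // Header: the data must start with "%PDF-".
        if (data[0] != 0x25 || // %
                data[1] != 0x50 || // P
                data[2] != 0x44 || // D
                data[3] != 0x46 || // F
                data[4] != 0x2D) { // -
            return false;
        }
        // Trailer: look for "%%EOF" near the end. The 1024-byte window is an
        // assumption that tolerates trailing newlines or padding after the marker.
        // ISO-8859-1 maps one byte to one char, so offsets stay aligned.
        int tailLength = Math.min(data.length, 1024);
        String tail = new String(data, data.length - tailLength, tailLength,
                StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }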
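
If the input is a file on disk rather than a byte array, the same header and trailer check can be done without reading the whole file into memory. The following is a minimal sketch using only the JDK. The class name `PdfSniffer`, the method `looksLikePdf`, and the 1024-byte tail window are assumptions made for illustration; none of them come from PDFBox.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper, not part of PDFBox.
final class PdfSniffer {

    // Assumed window: how many trailing bytes to search for "%%EOF".
    private static final int TAIL_WINDOW = 1024;

    static boolean looksLikePdf(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            // "%PDF-" plus "%%EOF" already needs 10 bytes.
            if (length < 10) {
                return false;
            }

            // Header: the first five bytes must be "%PDF-".
            byte[] header = new byte[5];
            raf.readFully(header);
            if (!"%PDF-".equals(new String(header, StandardCharsets.ISO_8859_1))) {
                return false;
            }

            // Trailer: read only the last TAIL_WINDOW bytes and search them.
            int tailLength = (int) Math.min(length, TAIL_WINDOW);
            byte[] tail = new byte[tailLength];
            raf.seek(length - tailLength);
            raf.readFully(tail);
            return new String(tail, StandardCharsets.ISO_8859_1).contains("%%EOF");
        }
    }
}
```

A `true` result only means the file looks like a PDF. A corrupt body can still make the parser fail, so it is worth keeping a try/catch around the text extraction as well.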
    