How can I determine if a file is a PDF file?

暖寄归人 2020-12-24 11:57

I am using PdfBox in Java to extract text from PDF files. Some of the input files provided are not valid and PDFTextStripper halts on these files. Is there a clean way to ch

13 Answers
  •  再見小時候
    2020-12-24 12:27

    Here is what I use in my NUnit tests, which must validate multiple versions of PDFs generated by Crystal Reports:

    public static void CheckIsPDF(byte[] data)
    {
        Assert.IsNotNull(data);
        // 8 bytes for "%PDF-1.x" plus up to 7 bytes for the terminator,
        // so none of the index accesses below can run out of range.
        Assert.GreaterOrEqual(data.Length, 15);

        // header (NUnit's AreEqual takes the expected value first)
        Assert.AreEqual(0x25, data[0]); // %
        Assert.AreEqual(0x50, data[1]); // P
        Assert.AreEqual(0x44, data[2]); // D
        Assert.AreEqual(0x46, data[3]); // F
        Assert.AreEqual(0x2D, data[4]); // -

        if (data[5] == 0x31 && data[6] == 0x2E && data[7] == 0x33) // version is 1.3 ?
        {
            // file terminator: "%%EOF", a space, then a line feed
            Assert.AreEqual(0x25, data[data.Length - 7]); // %
            Assert.AreEqual(0x25, data[data.Length - 6]); // %
            Assert.AreEqual(0x45, data[data.Length - 5]); // E
            Assert.AreEqual(0x4F, data[data.Length - 4]); // O
            Assert.AreEqual(0x46, data[data.Length - 3]); // F
            Assert.AreEqual(0x20, data[data.Length - 2]); // SPACE
            Assert.AreEqual(0x0A, data[data.Length - 1]); // EOL
            return;
        }

        if (data[5] == 0x31 && data[6] == 0x2E && data[7] == 0x34) // version is 1.4 ?
        {
            // file terminator: "%%EOF" followed directly by a line feed
            Assert.AreEqual(0x25, data[data.Length - 6]); // %
            Assert.AreEqual(0x25, data[data.Length - 5]); // %
            Assert.AreEqual(0x45, data[data.Length - 4]); // E
            Assert.AreEqual(0x4F, data[data.Length - 3]); // O
            Assert.AreEqual(0x46, data[data.Length - 2]); // F
            Assert.AreEqual(0x0A, data[data.Length - 1]); // EOL
            return;
        }

        Assert.Fail("Unsupported file format");
    }
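
Since the question is about Java, the same idea can be sketched there as a plain boolean check rather than a test assertion. This is only a rough sketch under some assumptions: the class name `PdfSniffer`, the method `looksLikePdf`, and the 1024-byte tail window are all made up for illustration. Unlike the code above, it accepts any version after `%PDF-` and searches the tail for `%%EOF` instead of expecting it at a fixed offset, because trailing whitespace varies between generators.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox or any other library.
final class PdfSniffer {
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);
    // Assumption: how many bytes from the end of the file to search for the marker.
    private static final int TAIL_WINDOW = 1024;

    private PdfSniffer() {}

    /** Returns true if the bytes start with "%PDF-" and contain "%%EOF" near the end. */
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < HEADER.length + EOF_MARKER.length) {
            return false;
        }
        if (!matchesAt(data, 0, HEADER)) {
            return false;
        }
        // Scan backwards through the tail window, never overlapping the header.
        int start = Math.max(HEADER.length, data.length - TAIL_WINDOW);
        for (int i = data.length - EOF_MARKER.length; i >= start; i--) {
            if (matchesAt(data, i, EOF_MARKER)) {
                return true;
            }
        }
        return false;
    }

    // Compares pattern against data starting at offset; the caller guarantees bounds.
    private static boolean matchesAt(byte[] data, int offset, byte[] pattern) {
        for (int j = 0; j < pattern.length; j++) {
            if (data[offset + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }
}
```

A file that passes this check can still be corrupt inside, so it works as a cheap pre-filter before handing the file to PDFTextStripper, not as proof that the file will parse.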
    
