Determine if a byte[] is a pdf file


Is there any way of checking if a byte[] is a pdf without opening?

I have some code to display a list of byte[] as pdf thumbnails. I previously knew all the byte[] w

5 answers
  • 2020-12-05 03:08

    As far as I know, all PDFs start with %PDF, so you could check the first bytes against this string.

  • 2020-12-05 03:15

    The first four bytes should be 0x25 0x50 0x44 0x46 in hex, which is %PDF in ASCII. "Magic numbers" for other formats can be found here

  • 2020-12-05 03:24

    Check the first 4 bytes of the array.

    If those are 0x25 0x50 0x44 0x46 then it's most probably a PDF file.
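A minimal Java sketch of the four-byte check described in the answers above. The class and method names (`PdfMagic`, `startsWithPdfMagic`) are hypothetical, chosen only for illustration; the byte values are the ones quoted in the thread.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the names are illustrative, not from the thread.
final class PdfMagic {
    // "%PDF" in ASCII is 0x25 0x50 0x44 0x46.
    private static final byte[] MAGIC = "%PDF".getBytes(StandardCharsets.US_ASCII);

    // Returns true only if the array begins with the four magic bytes.
    static boolean startsWithPdfMagic(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Calling `PdfMagic.startsWithPdfMagic(bytes)` on each array before rendering a thumbnail would filter out anything that does not begin with the signature. A match only suggests a PDF; it does not validate the rest of the file.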

  • 2020-12-05 03:25

    I've been having this problem. We use a "magic" library from GitHub that detects PDF content very well. However, we've been receiving some files that

    1. do open in PDF readers
    2. have different starting bytes (5 of them) before %PDF-
    3. end with these 8 bytes: 0A 0D 0A 30 0D 0A 0D 0A

    So, when a file with a PDF extension is not matched otherwise, I've added logic to check these starting bytes (5-9) and the 8 bytes at the end.
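The fallback described in this answer might look roughly like the sketch below. It assumes that "bytes 5-9" means zero-based offsets 5 through 9 holding `%PDF-`, which is one reading of the answer and not something it states outright. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical fallback check; names and the zero-based offset are assumptions.
final class PdfFallbackCheck {
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // Assumption: five extra bytes precede the header, so it starts at index 5.
    private static final int HEADER_OFFSET = 5;
    // The 8 trailing bytes quoted in the answer.
    private static final byte[] TAIL = {0x0A, 0x0D, 0x0A, 0x30, 0x0D, 0x0A, 0x0D, 0x0A};

    // True if pattern occurs in data exactly at the given offset.
    static boolean matchesAt(byte[] data, int offset, byte[] pattern) {
        if (data == null || offset < 0 || offset + pattern.length > data.length) {
            return false;
        }
        byte[] slice = Arrays.copyOfRange(data, offset, offset + pattern.length);
        return Arrays.equals(slice, pattern);
    }

    // True if the header sits at offset 5 and the array ends with the 8 tail bytes.
    static boolean looksLikeWrappedPdf(byte[] data) {
        return data != null
                && matchesAt(data, HEADER_OFFSET, HEADER)
                && matchesAt(data, data.length - TAIL.length, TAIL);
    }
}
```

As in the answer, this would only run after the normal signature check has failed, since it encodes a pattern seen in one particular batch of files and is not a general rule.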

  • 2020-12-05 03:30

    While the marked answer and the other answers are correct, they will not succeed 100% of the time. The problem is that, according to the implementation notes in Adobe's PDF Reference, Acrobat viewers only require the %PDF-1.x header to appear somewhere within the first 1024 bytes, not in the first 4. Some programs add information before %PDF and the file still opens as a valid PDF.
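A more lenient check along these lines could scan the start of the array for the header, not just offset 0. The sketch below assumes the whole `%PDF-` marker must fit inside the first 1024 bytes; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical scanner; names and the exact boundary rule are assumptions.
final class PdfHeaderScan {
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final int SEARCH_LIMIT = 1024;

    // Returns the zero-based offset of "%PDF-" if the marker lies entirely
    // within the first 1024 bytes, or -1 if it is not found there.
    static int findPdfHeader(byte[] data) {
        if (data == null) {
            return -1;
        }
        int end = Math.min(data.length, SEARCH_LIMIT);
        outer:
        for (int i = 0; i + HEADER.length <= end; i++) {
            for (int j = 0; j < HEADER.length; j++) {
                if (data[i + j] != HEADER[j]) {
                    continue outer; // mismatch, try the next start position
                }
            }
            return i;
        }
        return -1;
    }

    static boolean looksLikePdf(byte[] data) {
        return findPdfHeader(data) >= 0;
    }
}
```

This also covers files with a few extra leading bytes, at the cost of more false positives than the strict four-byte check: any data that happens to contain `%PDF-` near its start will pass.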

    I would recommend seeing the answer to the following Stack Overflow question: How to detect if a file is PDF or TIFF?
