Is there any way of checking if a byte[] is a pdf without opening?
I have some code to display a list of byte[] as pdf thumbnails. I previously knew all the byte[] w
As far as I know, all PDFs start with %PDF, so you could check the first bytes against this string.
The first four bytes should be 0x25 0x50 0x44 0x46 (in hex; in ASCII that is %PDF). You can find "magic numbers" for other formats here.
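A minimal Java sketch of this check, assuming the data is already in memory as a byte[]. The class and method names (PdfMagic, startsWithPdfMagic) are made up for illustration. It only looks at the first four bytes, so it is a quick filter and not full validation:

```java
import java.nio.charset.StandardCharsets;

class PdfMagic {
    // "%PDF" in ASCII: 0x25 0x50 0x44 0x46
    private static final byte[] PDF_MAGIC = {0x25, 0x50, 0x44, 0x46};

    /** Returns true if the array starts with the four bytes of "%PDF". */
    static boolean startsWithPdfMagic(byte[] data) {
        // Too short (or null) to contain the magic number at all
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        // Compare byte by byte against the expected signature
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        byte[] png = {(byte) 0x89, 0x50, 0x4E, 0x47}; // start of a PNG signature
        System.out.println(startsWithPdfMagic(pdf)); // true
        System.out.println(startsWithPdfMagic(png)); // false
    }
}
```

A false result does not prove the data is not a PDF; some files carry data before the header.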
Check the first 4 bytes of the array. If they are 0x25 0x50 0x44 0x46, then it is most probably a PDF file.
I've been having this problem. We use some Magic library from GitHub that determines content as PDF very well. However, we've been receiving some files that
%PDF-
0A 0D 0A 30 0D 0A 0D 0A
So, I've added logic to check for these starting bytes 5-9, and 8 bytes in the end, when a file with PDF extension is not matched otherwise.
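Because the description above is truncated, here is only a rough Java sketch of such a fallback under explicit assumptions: that the prefix being checked is the five ASCII bytes %PDF- and the trailer is the eight bytes listed (0A 0D 0A 30 0D 0A 0D 0A). The name matchesFallback is hypothetical, and it assumes the caller has already tried the primary detector and only consults this for files whose name ends in .pdf:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

class PdfVariantCheck {
    // Assumed prefix: the five ASCII bytes "%PDF-"
    private static final byte[] PREFIX = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // Assumed trailer: the eight bytes quoted in the answer
    static final byte[] TRAILER = {0x0A, 0x0D, 0x0A, 0x30, 0x0D, 0x0A, 0x0D, 0x0A};

    /** True if data[offset..] begins with pattern (caller guarantees bounds). */
    private static boolean regionMatches(byte[] data, int offset, byte[] pattern) {
        for (int i = 0; i < pattern.length; i++) {
            if (data[offset + i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    /** Hypothetical fallback for *.pdf files the primary detector rejected. */
    static boolean matchesFallback(byte[] data, String fileName) {
        if (data == null || fileName == null
                || !fileName.toLowerCase(Locale.ROOT).endsWith(".pdf")
                || data.length < PREFIX.length + TRAILER.length) {
            return false;
        }
        // Prefix at the start, trailer in the last eight bytes
        return regionMatches(data, 0, PREFIX)
                && regionMatches(data, data.length - TRAILER.length, TRAILER);
    }

    public static void main(String[] args) {
        byte[] body = "%PDF-1.4\n%%EOF".getBytes(StandardCharsets.US_ASCII);
        byte[] sample = Arrays.copyOf(body, body.length + TRAILER.length);
        System.arraycopy(TRAILER, 0, sample, body.length, TRAILER.length);
        System.out.println(matchesFallback(sample, "report.pdf")); // true
        System.out.println(matchesFallback(body, "report.pdf"));   // false: trailer missing
    }
}
```

Gating on the extension keeps this looser rule from accepting arbitrary files that happen to start with %PDF-.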
While the marked answer and the other answers are correct, they will not work 100% of the time. The problem is that the PDF reference says the %PDF-1.x header only needs to appear somewhere in the first 1024 bytes, not necessarily in the first 4. Some programs add data before %PDF and the file is still treated as valid.
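A more tolerant Java sketch: scan the first 1024 bytes for %PDF- instead of requiring it at offset 0. The window size comes from the answer above; requiring the whole five-byte marker to fit inside that window is my own interpretation of the boundary. PdfHeaderScan, findPdfHeader and withPrefix are hypothetical names:

```java
import java.nio.charset.StandardCharsets;

class PdfHeaderScan {
    // Per the answer: the header may appear anywhere in the first 1024 bytes
    static final int HEADER_WINDOW = 1024;
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Offset of "%PDF-" if it lies entirely within the first 1024 bytes, else -1. */
    static int findPdfHeader(byte[] data) {
        if (data == null) {
            return -1;
        }
        int limit = Math.min(data.length, HEADER_WINDOW);
        outer:
        for (int i = 0; i + HEADER.length <= limit; i++) {
            for (int j = 0; j < HEADER.length; j++) {
                if (data[i + j] != HEADER[j]) {
                    continue outer; // mismatch, try the next offset
                }
            }
            return i;
        }
        return -1;
    }

    /** Demo helper: `padding` zero bytes followed by the ASCII text. */
    static byte[] withPrefix(int padding, String text) {
        byte[] tail = text.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[padding + tail.length];
        System.arraycopy(tail, 0, out, padding, tail.length);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findPdfHeader(withPrefix(0, "%PDF-1.7")));    // 0
        System.out.println(findPdfHeader(withPrefix(100, "%PDF-1.7")));  // 100
        System.out.println(findPdfHeader(withPrefix(2000, "%PDF-1.7"))); // -1
    }
}
```

This is still a heuristic: it confirms a header is present where readers are said to look for it, not that the rest of the file parses.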
I recommend reading the answer to the following Stack Overflow question: How to detect if a file is PDF or TIFF?