Question
Is there any way of checking if a byte[] is a PDF without opening it?
I have some code to display a list of byte[] as PDF thumbnails. Previously I knew every byte[] was a PDF, because we filtered the servlet to return only those. Now the requirement has changed and I need to bring all file types back. Is there any way of checking what a byte[] contains or, more specifically, determining that it isn't a PDF?
Answer 1:
Check the first 4 bytes of the array. If they are 0x25 0x50 0x44 0x46, then it's most probably a PDF file.
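A minimal sketch of that check in Java, assuming the data is already in memory as a byte[]. The class and method names (PdfMagic, startsWithPdfMagic) are made up for illustration and do not come from any library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class PdfMagic {
    // ASCII "%PDF" is the byte sequence 0x25 0x50 0x44 0x46.
    private static final byte[] MAGIC = "%PDF".getBytes(StandardCharsets.US_ASCII);

    private PdfMagic() {}

    /** Returns true if the array begins with the four bytes "%PDF". */
    static boolean startsWithPdfMagic(byte[] data) {
        // Null or too-short input cannot carry the signature.
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A match only says the data looks like a PDF from its first bytes; it does not validate the rest of the file.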
Answer 2:
The first four bytes should be 0x25 0x50 0x44 0x46 (in hex; in ASCII that is %PDF). "Magic numbers" for other formats can be found here
Answer 3:
As far as I know, all PDFs start with %PDF, so you could check the first bytes against this string.
Answer 4:
While the marked answer and the other answers are correct for most files, they will not be successful 100% of the time. The problem is that the %PDF-1.x header does not have to be in the first 4 bytes: according to the implementation notes in Adobe's PDF Reference, Acrobat only requires it to appear somewhere within the first 1024 bytes. Some programs add information before %PDF and the file is still accepted.
I would recommend seeing the answer for the following Stack Overflow question: How to detect if a file is PDF or TIFF?
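A rough sketch of the more lenient check in Java: search the first 1024 bytes for the ASCII marker %PDF- instead of looking only at offset 0. The names (PdfHeaderScan, hasPdfHeader) are hypothetical, and requiring the whole marker to fit inside the 1024-byte window is my assumption about how to read that rule.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class PdfHeaderScan {
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // Assumption: the marker must lie entirely within the first 1024 bytes.
    private static final int SEARCH_WINDOW = 1024;

    private PdfHeaderScan() {}

    /** Returns true if "%PDF-" occurs within the first 1024 bytes of data. */
    static boolean hasPdfHeader(byte[] data) {
        if (data == null) {
            return false;
        }
        int limit = Math.min(data.length, SEARCH_WINDOW);
        // Try every start offset at which the marker still fits in the window.
        for (int start = 0; start + HEADER.length <= limit; start++) {
            if (matchesAt(data, start)) {
                return true;
            }
        }
        return false;
    }

    // Compares HEADER against data beginning at the given offset.
    private static boolean matchesAt(byte[] data, int offset) {
        for (int i = 0; i < HEADER.length; i++) {
            if (data[offset + i] != HEADER[i]) {
                return false;
            }
        }
        return true;
    }
}
```

This accepts more input than the 4-byte check, so it is also more likely to flag non-PDF data that happens to contain the marker early on, such as a text file that mentions %PDF-. Which trade-off is right depends on where the byte[] comes from.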
Source: https://stackoverflow.com/questions/6186980/determine-if-a-byte-is-a-pdf-file