I get a file via an HTTP upload and need to be sure it's a PDF file. The programming language is Python, but this should not matter.
I thought of the follow
Here is a solution using pdfminer.six, which can be installed with pip install pdfminer.six:
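    from pdfminer.high_level import extract_text

    def is_pdf(path_to_file):
        # Treat the file as a PDF only if pdfminer can parse it.
        try:
            extract_text(path_to_file)
            return True
        except Exception:
            # Any parsing failure means the file is not a readable PDF.
            return False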
You can also use filetype (pip install filetype):
    import filetype

    def is_pdf(path_to_file):
        kind = filetype.guess(path_to_file)
        # guess() returns None when it cannot determine the type.
        return kind is not None and kind.mime == 'application/pdf'
Neither of these solutions is ideal.
The problem with the filetype solution is that it doesn't tell you whether the PDF itself is readable. It will tell you if the file is a PDF, but it could be a corrupt PDF.

The pdfminer solution should only return True if the PDF is actually readable. But it is a big library and seems like overkill for such a simple function.

I've started another thread asking how to check if a file is a valid PDF without using a library (or using a smaller one).
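As a rough illustration of what a library-free check might look like, here is a minimal sketch using only the standard library. It is not taken from that other thread. It assumes that a PDF carries a `%PDF-` header near the start and a `%%EOF` marker near the end, and that looking within 1024 bytes of each end is lenient enough; the names `looks_like_pdf`, `is_pdf_file`, and `window` are made up for this example. Like the filetype approach, it is only a heuristic: it can catch obviously wrong or truncated uploads, but it cannot prove that the PDF is actually readable.

```python
def looks_like_pdf(data: bytes, window: int = 1024) -> bool:
    """Heuristic check on raw bytes (e.g. the body of an HTTP upload)."""
    head = data[:window]   # where the "%PDF-" header is expected
    tail = data[-window:]  # where the "%%EOF" marker is expected
    return b"%PDF-" in head and b"%%EOF" in tail


def is_pdf_file(path_to_file) -> bool:
    """Same check for a file on disk (reads the whole file into memory)."""
    with open(path_to_file, "rb") as f:
        return looks_like_pdf(f.read())
```

Since the upload usually arrives as bytes anyway, `looks_like_pdf(uploaded_bytes)` can be called before anything is written to disk, and a heavier parser can still be run afterwards if readability matters.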