Check whether a PDF file is valid with Python

借酒劲吻你 2020-12-08 10:50

I get a file via an HTTP upload and need to be sure it's a PDF file. The programming language is Python, but this should not matter.

I thought of the follow

7 answers

南方客 (original poster)

2020-12-08 11:30
Here is a solution using pdfminer.six, which can be installed with pip install pdfminer.six:
```
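from pdfminer.high_level import extract_text

def is_pdf(path_to_file):
    """Return True if pdfminer.six can parse the file and extract its text."""
    try:
        extract_text(path_to_file)
        return True
    except Exception:
        # Catch Exception rather than using a bare except, so that
        # KeyboardInterrupt and SystemExit still propagate.
        return False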
```
You can also use filetype (pip install filetype):
```
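import filetype

def is_pdf(path_to_file):
    # filetype.guess() returns None when it cannot identify the type,
    # so check for None before reading .mime.
    kind = filetype.guess(path_to_file)
    return kind is not None and kind.mime == 'application/pdf'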
```
Neither of these solutions is ideal.
1. The problem with the filetype solution is that it doesn't tell you whether the PDF is readable. It only inspects the file's signature bytes, so it will report a PDF even when the rest of the file is corrupt.
2. The pdfminer solution should return True only if the PDF is actually readable, but it is a big library and seems like overkill for such a simple function.
I've started another thread here asking how to check if a file is a valid PDF without using a library (or using a smaller one).
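For what it's worth, a rough library-free check might look like the sketch below. It is only a heuristic, not real validation: it assumes that a well-formed PDF has the `%PDF-` header near the start of the file and the `%%EOF` marker near the end, and it looks for both within a 1024-byte window. The function name `looks_like_pdf` and the `window` parameter are my own, made up for illustration. A file can pass this check and still fail to open in a reader.

```python
import os

def looks_like_pdf(path_to_file, window=1024):
    """Heuristic: '%PDF-' near the start and '%%EOF' near the end of the file."""
    with open(path_to_file, 'rb') as f:
        # Read the leading bytes, where the header is expected.
        head = f.read(window)
        # Find the file size, then read the trailing bytes.
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - window))
        tail = f.read()
    return b'%PDF-' in head and b'%%EOF' in tail
```

This catches truncated uploads and files that are not PDFs at all, which filetype alone would miss in the truncated case, but it does not parse the cross-reference table or any objects, so it says nothing about whether the content is actually readable.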