How do I calculate the MD5 checksum of a file contents in Python?

Submitted by 老子叫甜甜 on 2021-02-11 12:37:57

Question


The scenario: I generated an MD5 checksum for a PDF file stored on the server using the following code:

import hashlib

from PyPDF2 import PdfFileReader, PdfFileWriter  # legacy PyPDF2 names, as used below


def createMd5Hash(self, file_path, pdf_title, pdf_author):
    md5_returned = None
    try:
        md5 = hashlib.md5()
        with open(file_path, 'rb') as file_to_check:
            # feed the file's bytes into the hash
            for chunk in file_to_check:
                md5.update(chunk)
        # hex digest of the file contents alone
        md5_file = md5.hexdigest()
        # feed a key derived from that digest into the same hash object
        custom_key = 'xyzkey-{}'.format(md5_file)
        md5.update(custom_key.encode())
        md5_returned = md5.hexdigest()
    except Exception as e:
        print("Error while calculating md5: {}".format(e))

    # code to add Hash value in metadata
    try:
        with open(file_path, 'rb+') as file:
            reader = PdfFileReader(file)
            writer = PdfFileWriter()
            writer.appendPagesFromReader(reader)
            # copy the existing metadata, then add or override our own keys
            metadata = reader.getDocumentInfo()
            writer.addMetadata(metadata)
            writer.addMetadata({
                '/Author': pdf_author,
                '/Title': pdf_title,
                '/HashKey': md5_returned,
            })
            # write the PDF, now including the new metadata, back to the file
            writer.write(file)
    except Exception:
        print("Error while editing metadata")

example: HashKey = 02c85672c041c8c762474799690ad1a5
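
The hashing step on its own, separated from the metadata handling, might be sketched as below. This is a minimal illustration rather than the exact production code: the function name `keyed_md5` and the `key_prefix` and `chunk_size` parameters are made up for the example, and only the 'xyzkey-' prefix comes from the code above.

```python
import hashlib


def keyed_md5(file_path, key_prefix="xyzkey-", chunk_size=8192):
    """Return the two-stage MD5 hex digest described above.

    key_prefix and chunk_size are hypothetical parameters for this sketch.
    """
    md5 = hashlib.md5()
    with open(file_path, "rb") as fh:
        # Read fixed-size binary chunks until read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    # Stage 1: digest of the raw file bytes.
    file_digest = md5.hexdigest()
    # Stage 2: keep feeding the same hash object, so the final digest covers
    # the file bytes followed by "<key_prefix><file_digest>".
    md5.update((key_prefix + file_digest).encode())
    return md5.hexdigest()
```

Calling `keyed_md5(file_path)` twice on an unchanged file gives the same value both times, since the result depends only on the file's bytes and the key prefix.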

In the second part, I add the metadata, including the hash value, to the PDF. Clients can then download this file from my server, and they may or may not modify it. When I receive the file back from a client, I want to check its integrity, that is, whether the file data has been modified. For that, I wrote a utility that uploads the file and extracts the PDF metadata, from which I get the HashKey value: the original hash of the file at the time it was generated. When I hash the uploaded file with the following function, I expect to get the same hash value if the file was just downloaded from the server and not modified by the client.

import hashlib


def validateMd5Hash(file_path, current_md5):
    """Return True if the recomputed hash differs from current_md5."""
    md5_returned = None
    try:
        md5 = hashlib.md5()
        with open(file_path, 'rb') as file_to_check:
            # pipe the contents of the file through the hash
            for chunk in file_to_check:
                md5.update(chunk)
        # same two-stage scheme as at creation time
        md5_file = md5.hexdigest()
        private_key = 'xyzkey-{}'.format(md5_file)
        md5.update(private_key.encode())
        md5_returned = md5.hexdigest()

        if md5_returned != current_md5:
            return True

        return False
    except Exception as e:
        print("Error while calculating md5: {}".format(e))

But the result is different:

md5_returned = fbf79424a68892887379108a05968437
current_md5 = 02c85672c041c8c762474799690ad1a5

The function above generates a different hash value than the one produced at creation. My guess is that when the client downloads the file, its created and modified dates change, which is why I get a new MD5 checksum that differs from the one stored in HashKey. I am looking to generate a hash value for only the content of the file, excluding the metadata. Can anyone help me, please?
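
For context on the mismatch: an MD5 digest computed this way depends only on the bytes read from the file, so filesystem created and modified dates do not enter into it, whereas rewriting the file with extra metadata does change its bytes. The toy sketch below illustrates the difference between hashing a whole file and hashing only a content portion. It uses plain byte strings, not a real PDF layout; the `SEPARATOR` marker and the `build_file`, `content_part`, and `digest` helpers are hypothetical, and extracting the real page content from a PDF would need a PDF library, which is not shown.

```python
import hashlib

# Hypothetical marker separating "content" from "metadata" in this toy format.
SEPARATOR = b"\n--META--\n"


def digest(data: bytes) -> str:
    """MD5 hex digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def build_file(content: bytes, metadata: bytes) -> bytes:
    """Assemble a toy 'file' from a content part and a metadata part."""
    return content + SEPARATOR + metadata


def content_part(file_bytes: bytes) -> bytes:
    """Return only the content portion of a toy 'file'."""
    return file_bytes.split(SEPARATOR, 1)[0]


content = b"page 1 text\npage 2 text\n"
file_before = build_file(content, b"Author=A;Title=T")
file_after = build_file(content, b"Author=A;Title=T;HashKey=abc123")

# The whole-file digests differ because the metadata bytes differ.
print(digest(file_before) == digest(file_after))  # False
# The content-only digests match because the content bytes are identical.
print(digest(content_part(file_before)) == digest(content_part(file_after)))  # True
```

The same idea would apply to a real PDF only if the bytes fed to the hash are extracted in a way that stays identical before and after the metadata is written.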

Source: https://stackoverflow.com/questions/64566153/how-do-i-calculate-the-md5-checksum-of-a-file-contents-in-python
