How do I calculate the md5 checksum of a file in Python?

牧云@^-^@ 提交于 2019-11-30 06:13:26

问题


I have made a code in python that checks for an md5 in a file and makes sure the md5 matches that of the original. Here is what I have developed:

#Defines filename
filename = "file.exe"

#Gets MD5 from file 
def getmd5(filename):
    return m.hexdigest()

md5 = dict()

for fname in filename:
    md5[fname] = getmd5(fname)

#If statement for alerting the user whether the checksum passed or failed

if md5 == '>md5 will go here<': 
    print("MD5 Checksum passed. You may now close this window")
    input ("press enter")
else:
    print("MD5 Checksum failed. Incorrect MD5 in file 'filename'. Please download a    new copy")
    input("press enter") 
exit

But whenever I run the code, I get the following:

Traceback (most recent call last):
File "C:\Users\Username\md5check.py", line 13, in <module>
 md5[fname] = getmd5(fname)
File "C:\Users\Username\md5check.py, line 9, in getmd5
  return m.hexdigest()
NameError: global name 'm' is not defined

Is there anything I am missing in my code?


回答1:


In regards to your error and what's missing in your code. m is a name which is not defined for getmd5() function. No offence, I know you are a beginner, but your code is all over the place. Let's look at your issues one by one :) First off, you are not using hashlib.md5.hexdigest() method correctly. Please find explanation on hashlib functions Python Doc Library. The correct way to return MD5 for provided string is to do something like this:

>>> import hashlib
>>> hashlib.md5("filename.exe").hexdigest()
'2a53375ff139d9837e93a38a279d63e5'

However, you have a bigger problem here. You are calculating MD5 on a file name string, where in reality MD5 is calculated based on file contents. You will need to basically read file contents and pipe it though MD5. My next example is not very efficient, but something like this:

>>> import hashlib
>>> hashlib.md5(open('filename.exe','rb').read()).hexdigest()
'd41d8cd98f00b204e9800998ecf8427e'

As you can clearly see second MD5 hash is totally different from the first one. The reason for that is that we are pushing contents of the file through, not just file name. A simple solution could be something like that:

# Import hashlib library (md5 method is part of it)
import hashlib

# File to check
file_name = 'filename.exe'

# Correct original md5 goes here
original_md5 = '5d41402abc4b2a76b9719d911017c592'  

# Open,close, read file and calculate MD5 on its contents 
with open(file_name) as file_to_check:
    # read contents of the file
    data = file_to_check.read()    
    # pipe contents of the file through
    md5_returned = hashlib.md5(data).hexdigest()

# Finally compare original MD5 with freshly calculated
if original_md5 == md5_returned:
    print "MD5 verified."
else:
    print "MD5 verification failed!."

Please look at the post Python: Generating a MD5 checksum of a file it explains in detail a couple of ways how it can be achieved efficiently.

Best of luck.




回答2:


In Python 3.8+ you can do

import hashlib

with open("your_filename.png", "rb") as f:
    file_hash = hashlib.md5()
    while chunk := f.read(8192):
        file_hash.update(chunk)

print(file_hash.digest())
print(file_hash.hexdigest())  # to get a printable str instead of bytes

Consider using hashlib.blake2b instead of md5 (just replace md5 with blake2b in the above snippet). It's cryptographically secure and faster than MD5.



来源:https://stackoverflow.com/questions/16874598/how-do-i-calculate-the-md5-checksum-of-a-file-in-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!