Question
I can bring up a web page, no problem. I can save the web page as HTML, no problem. What I need is to save the page as MHT, so that I get all the HTML that stays hidden when I save as plain HTML. In my research I'm coming up with absolutely nothing on how to save as MHT using Python. I tried writing the page to an .mht file using the standard code for saving as HTML, but that simply doesn't work. I'm not surprised it doesn't work, but it was worth a shot.
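import urllib.request

url = 'https://www.thewebsite.com'
# read() returns bytes; str(html) would write the b'...' repr, so write the bytes directly
html = urllib.request.urlopen(url).read()
# Note: this stores only the raw HTML response, not a real MHT archive
with open('websitetest.mht', 'wb') as m:
    m.write(html)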
The site I'm trying to save has 'hidden code' that comes across when the page is saved as MHT, but not when it is saved as HTML. That is why I'm trying to save as MHT: I want all of the code so I can go through it and pull out what I need to compile a database.
Answer 1:
There is a very handy GitHub project written in Python 2.7 (it needs a few simple modifications to run on Python 3.4). The project has code for packing and unpacking MHT files. I think this is what you are looking for:
Un/packs an MHT (MHTML) archive into/from separate files, writing/reading them in directories to match their Content-Location.
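For context, an MHT file is a MIME multipart/related message in which each part carries a Content-Location header, so the standard library's email package can pack and read one. The following is only a minimal sketch of that idea, not the code from the project above: the function names build_mht and list_mht_parts, the resources argument, and the example URLs are all made up for illustration. It also assumes you have already fetched the HTML and any resources yourself; if the 'hidden code' is generated by scripts in the browser, a plain urllib download will not contain it.

```python
from email import encoders, message_from_bytes
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mht(page_url, html_text, resources=()):
    """Pack an HTML page plus optional resources into MHTML bytes.

    `resources` is an iterable of (url, mime_type, data_bytes) tuples
    (a hypothetical shape chosen for this sketch).
    """
    # The archive itself is a multipart/related MIME message.
    archive = MIMEMultipart('related', type='text/html')

    # The first part is the root HTML document.
    root = MIMEText(html_text, 'html', 'utf-8')
    root['Content-Location'] = page_url
    archive.attach(root)

    # Each resource (image, stylesheet, ...) becomes a base64-encoded part.
    for res_url, mime_type, data in resources:
        maintype, subtype = mime_type.split('/', 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)
        part['Content-Location'] = res_url
        archive.attach(part)

    return archive.as_bytes()


def list_mht_parts(mht_bytes):
    """Return (Content-Location, content type, decoded bytes) per part."""
    msg = message_from_bytes(mht_bytes)
    return [
        (part['Content-Location'], part.get_content_type(),
         part.get_payload(decode=True))
        for part in msg.walk()
        if not part.is_multipart()
    ]


if __name__ == '__main__':
    mht = build_mht('https://www.example.com/',
                    '<html><body>hello</body></html>')
    with open('websitetest.mht', 'wb') as f:
        f.write(mht)
    for location, content_type, body in list_mht_parts(mht):
        print(location, content_type, len(body))
```

Reading an existing .mht file saved by a browser works the same way: open it in binary mode and pass the bytes to list_mht_parts to see each embedded file next to its Content-Location.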
Source: https://stackoverflow.com/questions/41006742/python-save-as-mht