Question
I am working on an application that connects to a mail server using Python's POP3 library, parses the emails and puts them into a database.
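For context, the retrieval step described here might look roughly like the following minimal sketch. It assumes Python 3's standard poplib and email modules and a server that speaks POP3 over SSL. fetch_messages and parse_pop3_lines are hypothetical helper names, and the host and credentials are placeholders.

```python
import email
import poplib
from email.message import Message


def parse_pop3_lines(lines: list[bytes]) -> Message:
    """Join the raw lines returned by POP3.retr() and parse them into a Message."""
    # retr() strips line terminators, so rejoin with CRLF before parsing.
    return email.message_from_bytes(b"\r\n".join(lines))


def fetch_messages(host: str, user: str, password: str) -> list[Message]:
    """Download every message in the mailbox over POP3 with SSL (hypothetical helper)."""
    conn = poplib.POP3_SSL(host)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()
        messages = []
        for i in range(1, count + 1):  # POP3 message numbers start at 1
            _resp, lines, _octets = conn.retr(i)
            messages.append(parse_pop3_lines(lines))
        return messages
    finally:
        conn.quit()
```

Writing the parsed messages to a database is left out, since that depends on the application's schema.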
I have successfully parsed text emails, HTML emails and attachments. Now I am stuck on emails that contain embedded images. The HTML shows cid: followed by some code in the img src attribute, and the image itself arrives as encoded bytes elsewhere in the message. I am not sure how to extract the images and map them to their CIDs.
Please suggest.
Thanks in advance.
Below is the email content I am getting:
Content-Type: multipart/alternative;
boundary="PHP-alt-e0af773d09fadf5208f69aecffcb4de888824263"
--PHP-alt-e0af773d09fadf5208f69aecffcb4de888824263
Content-Type: text/plain
Hi, testing embedded images email!
--PHP-alt-e0af773d09fadf5208f69aecffcb4de888824263
Content-Type: multipart/related; boundary="PHP-related-e0af773d09fadf5208f69aecffcb4de888824263"
--PHP-alt-e0af773d09fadf5208f69aecffcb4de888824263
Content-Type: text/html
<html>
<head>
<title>Test HTML Mail</title>
</head>
<body>
<font color='red'>Hai, it is me!</font>
Here is my picture:
<img src="cid:PHP-CID-e0af773d09fadf5208f69aecffcb4de888824263" />
</body>
</html>
--PHP-related-e0af773d09fadf5208f69aecffcb4de888824263
Content-Type: image/gif
Content-Transfer-Encoding: base64
Content-ID: <PHP-CID-e0af773d09fadf5208f69aecffcb4de888824263>
iVBORw0KGgoAAAANSUhEUgAAAEYAAAAgCAMAAACYXf7xAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJ
bWFnZVJlYWR5ccllPAAAAwBQTFRF////oNKWY6ZZTnc08/304+P/6/PsRHgpZYpWGHcTWqFWe7pz
WZNFwNa+Q2UqgpZ5JGcZ4ezj7e3/6Oj/tbW62tr/aadiK1sSUHQ6oKeSI0UM5PHkAAAAaZhifHx6
yMjKWHdJY5lbi6yFW5RU0+LSnq2VmZ6Mm8iS8vL/dXVzRERFJVUJrNalcrNtkZGRLnYslsWJ3e3d
7fXwstirWYJB3ergyeTI9vb/iIiIgoKBd6V0np6ce51rU2pDqMqlVVVWTnpFhcN7NTU2RYUqpbWd
rKysOHcn5vbql6eOMWYbMkUi+fn/uOStk6yLZGRm7f7tlLGKOXg20dvNIiIiGUUER4Q0InMcaYtf
3+/e3d3czd7KjY2Nnb6WtdOzKWkmhoaGUJNNjL+FhLt7jLp9IF0Z/v7/0tLRqrijVX9UTmZA+v38
Qko5SW5EVYA9JkwPMzwocnJub7RnfZpy3vPcaGhkhYWDbm5rhISIRoZGN0gxm6aQ/Pz/OYAyXm1V
pKSpeHh2Q1M5oqKgiaZ+dZ1vbqRaTVU4k7GFe6xqpr6c1+rb3uTcfcdx0d3Qk7ePhaJ6cqVsTp5H
xNzA1ezTVotS7e7uv968+v76xtPBPlczm7OVydfDdK1t+fn7+vT91NTddpRpVmNBlLyUgKRymZmW
u9a5dati9vr35eXugrFzTVY2/v//R5M5ial+zdbJcJJn8/jz+f73SV89EREReL1vob2TUVw7orGX
YmtU///+YYZNkaKGmdKUR106iIiD9/b5VWxNmbWOudy0j4+N+//9/v/8Dw8Pd5xnf3+INF8Yjp2D
frZ2cHB30ufZb3Bt2+HY3e3WqKqiLjcrUW09q8+xLmowOXAhmbiI4+Xnjr6P5O/n5/DkeK9mQEBE
8vf5//r/9fT4U5Q9hcqGlNKNDh0FlJSXA0UAC1cJGl0KWaZQwc69yN3K/f76drVuQn0iLTkZeJds
lq+Pv9HBN1YtV21Fkb6Bkb6KmLSHtNC5t9y5DikEhLZ/W3BLMEoddqVi4vfk////U8M4kgAAAQB0
Answer 1:
I assume you are using Python's email package? It should handle images just fine. If you need to decode an image yourself, look at its Content-Transfer-Encoding, which is base64 in this case; the standard library also has a base64 module for encoding and decoding.
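A minimal sketch of both routes, assuming Python 3: get_payload(decode=True) lets the email package undo the Content-Transfer-Encoding, while base64.b64decode does it by hand. The helper names decode_part and decode_base64_text are hypothetical.

```python
import base64
from email.message import Message


def decode_part(part: Message) -> bytes:
    """Return the binary content of a single (non-multipart) MIME part."""
    # decode=True undoes the Content-Transfer-Encoding (e.g. base64) and returns bytes.
    data = part.get_payload(decode=True)
    if data is None:
        # Multipart containers hold sub-parts, not a single decodable payload.
        raise ValueError("cannot decode a multipart container")
    return data


def decode_base64_text(encoded: str) -> bytes:
    """Decode a base64 body by hand; line breaks in the encoded text are discarded."""
    return base64.b64decode(encoded)
```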
As for the mapping, read the Content-ID header of each image part and build a dict that maps content IDs (without their surrounding angle brackets) to MIME parts. To resolve a src URL, check whether it starts with 'cid:' (meaning it refers to a part inside the same MIME message), strip off the prefix and look up the rest in that dict.
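A minimal sketch of that mapping, assuming Python 3's email package with its default (compat32) parser and well-formed multipart/related input. It uses html.parser to find img src attributes rather than string matching; build_cid_map, resolve_src, ImgSrcCollector and inline_images are hypothetical names. Content-ID values carry angle brackets (<PHP-CID-...> in the question) that the cid: URL omits, so they are stripped, and the URL part is percent-decoded as RFC 2392 describes for cid: URLs.

```python
from __future__ import annotations

import email
from email.message import Message
from html.parser import HTMLParser
from urllib.parse import unquote


def build_cid_map(msg: Message) -> dict[str, Message]:
    """Map each Content-ID (angle brackets removed) to the MIME part carrying it."""
    cid_map: dict[str, Message] = {}
    for part in msg.walk():
        cid = part.get("Content-ID")
        if cid:
            cid_map[cid.strip().strip("<>")] = part
    return cid_map


def resolve_src(src: str, cid_map: dict[str, Message]) -> Message | None:
    """Return the part a 'cid:' URL points to, or None for any other URL."""
    if not src.lower().startswith("cid:"):
        return None  # http:, data:, etc. are not parts of this message
    # RFC 2392: drop the prefix and percent-decode to recover the Content-ID.
    return cid_map.get(unquote(src[4:]))


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> also reaches here via handle_startendtag.
        if tag == "img":
            self.srcs.extend(value for name, value in attrs if name == "src" and value)


def inline_images(raw: str) -> dict[str, bytes]:
    """Return {src: image bytes} for every cid: image referenced by an HTML part."""
    msg = email.message_from_string(raw)
    cid_map = build_cid_map(msg)
    images: dict[str, bytes] = {}
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        charset = part.get_content_charset() or "utf-8"
        html = part.get_payload(decode=True).decode(charset, errors="replace")
        collector = ImgSrcCollector()
        collector.feed(html)
        for src in collector.srcs:
            target = resolve_src(src, cid_map)
            if target is not None:
                images[src] = target.get_payload(decode=True)
    return images
```

The returned bytes can then be stored next to the HTML, keyed by the src value the HTML already contains.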
Answer 2:
I copied and pasted this email content, and even my mail client can't decode it correctly. So this email content may be malformed or incomplete.
Answer 3:
Fixed the issue by checking the Content-Disposition value and the CID in the contents.
If it's an attachment, the file contents should be shown as attachments with the email; if it's inline, the contents will be shown in the body.
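A minimal sketch of that approach, assuming Python 3.5+ (for Message.get_content_disposition()). Parts whose disposition is attachment go to the attachment list and everything else is treated as body content; treating parts with no Content-Disposition header as inline is an assumption of this sketch, and split_parts is a hypothetical name. The Content-ID of each body part can then be matched against cid: references in the HTML.

```python
from __future__ import annotations

import email
from email.message import Message


def split_parts(raw: bytes) -> tuple[list[Message], list[Message]]:
    """Sort leaf MIME parts into (shown in body, shown as attachments)."""
    msg = email.message_from_bytes(raw)
    body_parts: list[Message] = []
    attachments: list[Message] = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers such as multipart/related carry no content of their own
        # Returns 'inline', 'attachment' (lowercased) or None if the header is absent.
        if part.get_content_disposition() == "attachment":
            attachments.append(part)
        else:
            # Assumption: a missing Content-Disposition is treated the same as 'inline'.
            body_parts.append(part)
    return body_parts, attachments
```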
Source: https://stackoverflow.com/questions/4332400/python-parsing-emails-with-embedded-images