Python: parsing emails with embedded images

Submitted by 魔方 西西 on 2020-01-04 06:22:11

Question


I am working on an application that connects to a mail server using Python's POP3 library, parses the emails, and puts them into a database.

I have successfully parsed the text emails, HTML emails, and attachments. Now I am stuck on emails that contain embedded images. The server shows "cid:" followed by some code for each image in the src attribute, and the image itself arrives as bytes. I am not sure how to get the images and map them to the CIDs.

Please suggest.

Thanks in advance.

Below is the email content I am getting:

Content-Type: multipart/alternative; 
               boundary="PHP-alt-e0af773d09fadf5208f69aecffcb4de888824263"

 --PHP-alt-e0af773d09fadf5208f69aecffcb4de888824263
 Content-Type: text/plain

 Hi, testing embedded images email!


 --PHP-alt-e0af773d09fadf5208f69aecffcb4de888824263
 Content-Type: multipart/related; boundary="PHP-related-e0af773d09fadf5208f69aecffcb4de888824263"

 --PHP-alt-e0af773d09fadf5208f69aecffcb4de888824263
 Content-Type: text/html

 <html>
 <head>
 <title>Test HTML Mail</title>
 </head>
 <body>
 <font color='red'>Hai, it is me!</font>
 Here is my picture: 
  <img src="cid:PHP-CID-e0af773d09fadf5208f69aecffcb4de888824263" />
 </body>
 </html>

 --PHP-related-e0af773d09fadf5208f69aecffcb4de888824263
 Content-Type: image/gif
 Content-Transfer-Encoding: base64
 Content-ID: <PHP-CID-e0af773d09fadf5208f69aecffcb4de888824263> 

 iVBORw0KGgoAAAANSUhEUgAAAEYAAAAgCAMAAACYXf7xAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJ
bWFnZVJlYWR5ccllPAAAAwBQTFRF////oNKWY6ZZTnc08/304+P/6/PsRHgpZYpWGHcTWqFWe7pz
WZNFwNa+Q2UqgpZ5JGcZ4ezj7e3/6Oj/tbW62tr/aadiK1sSUHQ6oKeSI0UM5PHkAAAAaZhifHx6
yMjKWHdJY5lbi6yFW5RU0+LSnq2VmZ6Mm8iS8vL/dXVzRERFJVUJrNalcrNtkZGRLnYslsWJ3e3d
7fXwstirWYJB3ergyeTI9vb/iIiIgoKBd6V0np6ce51rU2pDqMqlVVVWTnpFhcN7NTU2RYUqpbWd
rKysOHcn5vbql6eOMWYbMkUi+fn/uOStk6yLZGRm7f7tlLGKOXg20dvNIiIiGUUER4Q0InMcaYtf
3+/e3d3czd7KjY2Nnb6WtdOzKWkmhoaGUJNNjL+FhLt7jLp9IF0Z/v7/0tLRqrijVX9UTmZA+v38
Qko5SW5EVYA9JkwPMzwocnJub7RnfZpy3vPcaGhkhYWDbm5rhISIRoZGN0gxm6aQ/Pz/OYAyXm1V
pKSpeHh2Q1M5oqKgiaZ+dZ1vbqRaTVU4k7GFe6xqpr6c1+rb3uTcfcdx0d3Qk7ePhaJ6cqVsTp5H
xNzA1ezTVotS7e7uv968+v76xtPBPlczm7OVydfDdK1t+fn7+vT91NTddpRpVmNBlLyUgKRymZmW
u9a5dati9vr35eXugrFzTVY2/v//R5M5ial+zdbJcJJn8/jz+f73SV89EREReL1vob2TUVw7orGX
YmtU///+YYZNkaKGmdKUR106iIiD9/b5VWxNmbWOudy0j4+N+//9/v/8Dw8Pd5xnf3+INF8Yjp2D
frZ2cHB30ufZb3Bt2+HY3e3WqKqiLjcrUW09q8+xLmowOXAhmbiI4+Xnjr6P5O/n5/DkeK9mQEBE
8vf5//r/9fT4U5Q9hcqGlNKNDh0FlJSXA0UAC1cJGl0KWaZQwc69yN3K/f76drVuQn0iLTkZeJds
lq+Pv9HBN1YtV21Fkb6Bkb6KmLSHtNC5t9y5DikEhLZ/W3BLMEoddqVi4vfk////U8M4kgAAAQB0

Answer 1:


I assume you are using Python's email package? It should handle images just fine. If you need to decode the image yourself, look at the part's Content-Transfer-Encoding, which is base64 in this case. The standard library also has a module for encoding and decoding base64.
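
To make the decoding step concrete, here is a minimal sketch. The message in `RAW` is a hypothetical sample, not the email from the question, and its image payload is only the 8-byte PNG signature. It assumes Python 3 and the standard `email` and `base64` modules.

```python
import base64
from email import message_from_bytes, policy

# Hypothetical sample message (not the one from the question): an HTML body
# plus one base64-encoded image part holding only the 8-byte PNG signature.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b'<img src="cid:img1" />\r\n'
    b"--b1\r\n"
    b"Content-Type: image/png\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-ID: <img1>\r\n"
    b"\r\n"
    b"iVBORw0KGgo=\r\n"
    b"--b1--\r\n"
)

msg = message_from_bytes(RAW, policy=policy.default)

for part in msg.walk():
    if part.get_content_maintype() != "image":
        continue
    # Option 1: let the email package undo the Content-Transfer-Encoding.
    image_bytes = part.get_payload(decode=True)
    # Option 2: decode by hand; this only applies when the encoding is base64.
    manual_bytes = base64.b64decode(part.get_payload())
    print(part.get_content_type(), len(image_bytes), image_bytes == manual_bytes)
```

Option 1 is usually the safer choice, since it follows whatever transfer encoding the part declares instead of assuming base64.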

As for the mapping, get the Content-ID header from each image part and create a dict that maps content IDs to MIME parts. To resolve the URLs in src, check whether they start with 'cid:' (i.e. they refer to a part inside the same message), strip off the prefix, and look them up in the dictionary you created.
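
The mapping described above could be sketched roughly as follows. The helper names `build_cid_map` and `find_cid_references` and the sample message are made up for illustration. The regular expression is a simplification that only looks at quoted `src` attributes, so a real HTML parser may be a better fit for messy mail.

```python
import re
from email import message_from_bytes, policy

# Hypothetical sample message, not the one from the question.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b'<html><body><img src="cid:img1" /></body></html>\r\n'
    b"--b1\r\n"
    b"Content-Type: image/png\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-ID: <img1>\r\n"
    b"\r\n"
    b"iVBORw0KGgo=\r\n"
    b"--b1--\r\n"
)


def build_cid_map(msg):
    """Map each Content-ID (angle brackets removed) to the part that carries it."""
    cid_map = {}
    for part in msg.walk():
        cid = part.get("Content-ID")
        if cid:
            cid_map[str(cid).strip().strip("<>")] = part
    return cid_map


def find_cid_references(html):
    """Return the IDs used in src="cid:..." attributes (simplified regex match)."""
    return re.findall(r'src=["\']cid:([^"\']+)["\']', html, flags=re.IGNORECASE)


msg = message_from_bytes(RAW, policy=policy.default)
cid_map = build_cid_map(msg)

# Take the first text/html part as the body (an assumption for this sample).
html = next(p.get_content() for p in msg.walk()
            if p.get_content_type() == "text/html")

# Resolve every cid: reference in the HTML to the decoded bytes of its part.
resolved = {}
for cid in find_cid_references(html):
    part = cid_map.get(cid)
    if part is not None:
        resolved[cid] = part.get_payload(decode=True)
```

From here the application can store each entry of `resolved` in the database, or rewrite the `src` attribute to point at wherever the image ends up.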




Answer 2:


I copied and pasted this email content, and even my formail client can't decode it correctly. So maybe the mail content is incorrect or incomplete.




Answer 3:


I fixed the issue by checking the Content-Disposition value and the cid in the contents.

If the disposition is attachment, the file contents should be shown as an attachment to the email; if it is inline, the contents should be shown in the body.
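
One way this check might look in code is sketched below. The function `classify_parts` and the sample message are hypothetical. Treating a part that has a Content-ID but no Content-Disposition as inline is an assumption, not something the answer states.

```python
from email import message_from_bytes, policy

# Hypothetical sample message: an HTML body, one inline image, one attachment.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b'<img src="cid:img1" />\r\n'
    b"--b1\r\n"
    b"Content-Type: image/png\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Disposition: inline\r\n"
    b"Content-ID: <img1>\r\n"
    b"\r\n"
    b"iVBORw0KGgo=\r\n"
    b"--b1\r\n"
    b"Content-Type: application/pdf\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b'Content-Disposition: attachment; filename="report.pdf"\r\n'
    b"\r\n"
    b"JVBERi0=\r\n"
    b"--b1--\r\n"
)


def classify_parts(msg):
    """Split leaf parts into (inline, attachments) by Content-Disposition."""
    inline, attachments = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        disposition = part.get_content_disposition()  # 'inline', 'attachment' or None
        has_cid = part.get("Content-ID") is not None
        if disposition == "attachment":
            attachments.append(part)
        elif disposition == "inline" or has_cid:
            # Assumption: a Content-ID without a disposition means "embedded".
            inline.append(part)
    return inline, attachments


msg = message_from_bytes(RAW, policy=policy.default)
inline, attachments = classify_parts(msg)

for part in inline:
    print("inline:", part.get_content_type(), part["Content-ID"])
for part in attachments:
    print("attachment:", part.get_filename())
```

Parts that fall into neither list, such as the text/html part here, are the message body itself.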



Source: https://stackoverflow.com/questions/4332400/python-parsing-emails-with-embedded-images
