Question
I know that this is the command to get the entire message body:
[imap_code] UID FETCH [uid] BODY.PEEK[TEXT]
This returns the entire message body, but I need to exclude the attachments. I only want the message the sender wrote, as text and/or HTML.
Is there a way?
This is a full raw HTML mail with an attachment:
http://pastebin.com/FMEQdLM3
I would like to get only
<div dir="ltr">This is the message body<div><ul><li>one</li><li>two</li></ul></div></div>
or the plain text if there is no HTML version.
Answer 1:
Messages are laid out in an arbitrary tree of parts, with parent items being of the multipart/* or message/rfc822 type, and children being of other types. The FETCH BODY[...] item lets you extract any of these parts.
Unfortunately, there is no standard layout for messages. You can fetch the BODYSTRUCTURE item to get the MIME layout of a message, but it is very difficult to parse by eye.
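To make the part numbering concrete, here is a minimal Python sketch that builds a message locally with the standard `email` package and lists its leaf parts with the section numbers IMAP would use for them. It does not talk to a server; the helper name `imap_sections` and the sample message are made up for illustration, and embedded message/rfc822 parts are deliberately treated as opaque leaves rather than descended into.

```python
from email.message import EmailMessage


def imap_sections(msg, prefix=""):
    """Yield (section, content_type) for each leaf part, numbered IMAP-style.

    Assumption: message/rfc822 parts are reported as leaves, not expanded.
    """
    if msg.get_content_maintype() != "multipart":
        # A non-multipart message has exactly one part, addressed as section 1.
        yield (prefix or "1", msg.get_content_type())
        return
    for index, part in enumerate(msg.get_payload(), start=1):
        section = f"{prefix}.{index}" if prefix else str(index)
        if part.get_content_maintype() == "multipart":
            # Nested multipart: its children become section.1, section.2, ...
            yield from imap_sections(part, section)
        else:
            yield (section, part.get_content_type())


# Hypothetical sample: plain + HTML alternatives plus one image attachment.
sample = EmailMessage()
sample.set_content("This is the message body")
sample.add_alternative("<div>This is the message body</div>", subtype="html")
sample.add_attachment(b"\x89PNG", maintype="image", subtype="png", filename="a.png")

for section, content_type in imap_sections(sample):
    print(section, content_type)
```

On a real server you would get the same information from the BODYSTRUCTURE response instead of from a local copy of the message.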
That being said, there are a few common message layouts that will get you most of the way.
The easiest is a message with just one body, either text/html or text/plain. Just fetch BODY[TEXT].
The next is multi-format, with both text/html and text/plain. Its MIME structure generally looks like this:
+ multipart/alternative [TEXT]
|- text/plain [1]
\- text/html [2]
In this case you want to fetch BODY[2].
If the message is single-body, with attachments, it will look something like this:
+ multipart/mixed or multipart/related [TEXT]
|- text/html or text/plain [1]
|- image/jpg [2]
| ...
\- image/gif
In this case you want BODY[1].
Last is both of these: multi-format body with attachments. It will tend to look something like:
+ multipart/mixed or multipart/related [TEXT]
|-+ multipart/alternative [1]
| |- text/plain [1.1]
| \- text/html [1.2]
|- image/jpeg [2]
|- image/gif [3]
|...
\- image/png
In this case, you probably want BODY[1.2]. Your sample message is of this type.
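The "prefer HTML, fall back to plain text" choice across these layouts can be sketched in a few lines of Python. This assumes you have already worked out the (section, content type) pairs for the message, for example by reading its BODYSTRUCTURE; the function names `pick_body_section` and `fetch_command`, the tag `a1`, and the UID `42` are hypothetical. It takes the first matching part in document order, so a text attachment that appears later does not win over the body.

```python
def pick_body_section(parts):
    """Return the section of the first text/html part, else the first
    text/plain part, else None.

    parts: list of (section, content_type) tuples for one message's leaf parts.
    """
    for wanted in ("text/html", "text/plain"):
        for section, content_type in parts:
            if content_type.lower() == wanted:
                return section
    return None


def fetch_command(tag, uid, section):
    """Build the command line to type into the telnet session."""
    return f"{tag} UID FETCH {uid} BODY.PEEK[{section}]"


# Layout of the last case above: multi-format body with attachments.
layout = [
    ("1.1", "text/plain"),
    ("1.2", "text/html"),
    ("2", "image/jpeg"),
    ("3", "image/gif"),
]

print(fetch_command("a1", 42, pick_body_section(layout)))
```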
In addition, the bodies may be encoded in Quoted-Printable or Base64 encoding. Unfortunately, baseline IMAP does not provide any way for the server to decode this for you. Quoted-Printable can be mostly read if the message is ASCII, but will have lots of = escapes throughout the body. If it's Base64, you're not going to be able to decipher it by eye. The BINARY IMAP extension can help with this, but it is not widely deployed.
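If you can paste the fetched bytes into a script, decoding on the client side is straightforward. The following is a minimal sketch using Python's standard `quopri` and `base64` modules; the function name `decode_body` is hypothetical, and it assumes you already know the part's Content-Transfer-Encoding and charset (both are reported in BODYSTRUCTURE, or in the part's MIME headers, e.g. BODY[1.2.MIME]).

```python
import base64
import quopri


def decode_body(raw, transfer_encoding, charset="utf-8"):
    """Decode the raw bytes of one fetched body part into text.

    raw: bytes exactly as returned for BODY[<section>].
    transfer_encoding: the part's Content-Transfer-Encoding value.
    charset: the part's charset; utf-8 is only an assumed default.
    """
    encoding = (transfer_encoding or "7bit").lower()
    if encoding == "quoted-printable":
        data = quopri.decodestring(raw)
    elif encoding == "base64":
        data = base64.b64decode(raw)
    else:
        # 7bit, 8bit and binary bodies are not transformed.
        data = raw
    return data.decode(charset, errors="replace")


print(decode_body(b"caf=C3=A9", "quoted-printable"))
print(decode_body(b"SGVsbG8=", "base64"))
```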
Source: https://stackoverflow.com/questions/37787767/fetch-imap-body-message-by-telnet