I've been implementing some PHP/IMAP-based email handling functionality lately, and have most everything working great, except for message body decoding (in some circumstan
I know this is an old question, but I am running into this issue now and it seems that PHP has a solution.
The imap_fetchstructure() function will give you the type of encoding as an integer:
0 7BIT
1 8BIT
2 BINARY
3 BASE64
4 QUOTED-PRINTABLE
5 OTHER
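As a small illustration, the list above can be kept as a lookup array so that debug output shows a name instead of a bare number. This is only a sketch: encodingName() is a hypothetical helper, not part of PHP. The imap extension also defines constants for these values (ENC7BIT, ENC8BIT, ENCBINARY, ENCBASE64, ENCQUOTEDPRINTABLE, ENCOTHER), which the sketch avoids so that it runs without the extension loaded.

```php
<?php
// Hypothetical helper: maps the integer found in the "encoding" property
// returned by imap_fetchstructure() to its name.
function encodingName(int $encoding): string
{
    $names = [
        0 => '7BIT',
        1 => '8BIT',
        2 => 'BINARY',
        3 => 'BASE64',
        4 => 'QUOTED-PRINTABLE',
        5 => 'OTHER',
    ];

    // Any value outside the documented range is reported as unknown.
    return $names[$encoding] ?? 'UNKNOWN';
}

echo encodingName(4), "\n";
```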
Given that encoding value, you should be able to create a function like this to decode the message:
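// Despite its name, this function decodes the body according to the
// transfer encoding reported by imap_fetchstructure().
function _encodeMessage($msg, $type){
    if($type == 0){
        // 7BIT: no transfer decoding needed; only normalize the charset.
        return mb_convert_encoding($msg, "UTF-8", "auto");
    } elseif($type == 1 || $type == 2){
        // 8BIT / BINARY: the body is not transfer-encoded. imap_8bit() and
        // imap_binary() encode rather than decode, so return it unchanged.
        return $msg;
    } elseif($type == 3){
        // BASE64
        return imap_base64($msg);
    } elseif($type == 4){
        // QUOTED-PRINTABLE
        return imap_qprint($msg);
        //return quoted_printable_decode($msg);
    } else {
        // OTHER: unknown encoding, leave the body untouched.
        return $msg;
    }
}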
You can call this function like so:
$struct = imap_fetchstructure($conn, $messageNumber, 0);
$message = imap_fetchbody($conn, $messageNumber, 1);
$message = _encodeMessage($message, $struct->encoding);
echo $message;
I hope this helps someone :)
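For a multipart message, read the encoding from the individual part rather than from the top-level structure returned by imap_fetchstructure():
NOT $encoding = $structure->encoding
BUT $encoding = $structure->parts[ $p ]->encoding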
I think I had the same problem, and now it's solved. (The 7bit body didn't convert to UTF-8; I kept getting ASCII.) I thought I had 7bit, but after changing the code to the "BUT" form I got $encoding=4, not $encoding=0, which means that I have to call imap_qprint($body) and then mb_convert_encoding($body, 'UTF-8', $charset) to get what I wanted.
Anyway, check the encoding number! (In my case it should have been 4, not 0.)
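The two steps described above (undo the transfer encoding, then convert the charset) can be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: decodePart() is a hypothetical name, it uses PHP's built-in base64_decode() and quoted_printable_decode() instead of the imap_* functions so that it runs without the extension, and it assumes $charset has already been determined for the part.

```php
<?php
// Hypothetical helper: decode one body part, then convert it to UTF-8.
// $encoding is the integer from $structure->parts[$p]->encoding.
function decodePart(string $body, int $encoding, string $charset): string
{
    if ($encoding === 3) {
        // BASE64
        $body = base64_decode($body);
    } elseif ($encoding === 4) {
        // QUOTED-PRINTABLE
        $body = quoted_printable_decode($body);
    }
    // 7BIT, 8BIT, BINARY and OTHER are passed through unchanged here.

    // Assumption: $charset names the part's real charset (for example
    // 'ISO-8859-1'); a wrong value produces garbled output.
    return mb_convert_encoding($body, 'UTF-8', $charset);
}

// "caf=E9" is "café" in quoted-printable, ISO-8859-1.
echo decodePart('caf=E9', 4, 'ISO-8859-1'), "\n";
```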
After spending a bit more time, I decided to just write up some heuristic detection, as Max suggested in the comments on my original question.
I've built a more robust decode7Bit() method in Imap.php, which goes through a bunch of common encoded characters (like =A0) and replaces them with their UTF-8 equivalents, and then also decodes messages if they look like they are base64-encoded:
/**
 * Decodes 7-Bit text.
 *
 * PHP seems to think that most emails are 7BIT-encoded, therefore this
 * decoding method assumes that text passed through may actually be base64-
 * encoded, quoted-printable encoded, or just plain text. Instead of passing
 * the email directly through a particular decoding function, this method
 * runs through a bunch of common encoding schemes to try to decode everything
 * and simply end up with something *resembling* plain text.
 *
 * Results are not guaranteed, but it's pretty good at what it does.
 *
 * @param $text (string)
 *   7-Bit text to convert.
 *
 * @return (string)
 *   Decoded text.
 */
public function decode7Bit($text) {
  // If there are no spaces on the first line, assume that the body is
  // actually base64-encoded, and decode it.
  $lines = explode("\r\n", $text);
  $first_line_words = explode(' ', $lines[0]);
  if ($first_line_words[0] == $lines[0]) {
    $text = base64_decode($text);
  }

  // Manually convert common encoded characters into their UTF-8 equivalents.
  // '=C2=A0' is listed before '=A0' so the shorter key cannot consume the
  // tail of the longer sequence and leave a stray '=C2' behind.
  $characters = array(
    '=20' => ' ', // space.
    '=E2=80=99' => "'", // single quote.
    '=0A' => "\r\n", // line break.
    '=C2=A0' => ' ', // non-breaking space.
    '=A0' => ' ', // non-breaking space.
    "=\r\n" => '', // joined line.
    '=E2=80=A6' => '…', // ellipsis.
    '=E2=80=A2' => '•', // bullet.
  );

  // Loop through the encoded characters and replace any that are found.
  foreach ($characters as $key => $value) {
    $text = str_replace($key, $value, $text);
  }

  return $text;
}
This was taken from version 1.0-beta2 of the Imap class for PHP that I have on GitHub.
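One way to make the base64 guess depend less on the first line alone is to check whether the whole body survives a strict decode and re-encode round trip. This is only a sketch of that idea, not part of the Imap class: looksLikeBase64() is a hypothetical helper, and short plain words such as "test" still pass the check, so it remains a heuristic.

```php
<?php
// Hypothetical helper: guess whether a body is base64-encoded.
function looksLikeBase64(string $text): bool
{
    // Remove the line breaks that mail software inserts into base64 bodies.
    $compact = preg_replace('/\s+/', '', $text);

    // Padded base64 always has a length that is a multiple of four.
    if ($compact === '' || strlen($compact) % 4 !== 0) {
        return false;
    }

    // Strict mode rejects characters outside the base64 alphabet.
    $decoded = base64_decode($compact, true);

    // Re-encoding must reproduce the input exactly.
    return $decoded !== false && base64_encode($decoded) === $compact;
}

var_dump(looksLikeBase64("aGVsbG8gd29y\r\nbGQ=")); // "hello world", wrapped
var_dump(looksLikeBase64("Hello world"));
```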
If you have any ideas for making this more efficient, let me know. I originally tried running everything through quoted_printable_decode(), but sometimes PHP would throw exceptions that were vague and unhelpful, so I gave up on that approach.
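On the efficiency question, one option is to hand the replacement table to strtr() instead of looping with str_replace(). With an array argument, strtr() makes a single pass, tries the longest keys first, and does not re-scan text it has already replaced, so the order of the entries no longer matters. A minimal sketch using a subset of the table; replaceEncodedCharacters() is a hypothetical name, and mapping a bare '=A0' to a space assumes the source text was Latin-1.

```php
<?php
// Hypothetical helper: single-pass replacement of common encoded sequences.
function replaceEncodedCharacters(string $text): string
{
    $characters = [
        '=20' => ' ',        // space.
        '=A0' => ' ',        // non-breaking space (Latin-1 assumption).
        '=C2=A0' => ' ',     // non-breaking space (UTF-8).
        "=\r\n" => '',       // joined line.
        '=E2=80=A6' => '…',  // ellipsis.
    ];

    // The longest matching key wins, so '=C2=A0' is matched as a whole
    // even though '=A0' appears earlier in the array.
    return strtr($text, $characters);
}

echo replaceEncodedCharacters("a=C2=A0b=\r\nc=E2=80=A6"), "\n";
```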