Parsing Email Body with 7BIT Content-Transfer-Encoding - PHP

Submitted by 蹲街弑〆低调 on 2019-11-29 07:18:00

After spending a bit more time, I decided to just write up some heuristic detection, as Max suggested in the comments on my original question.

I've built a more robust decode7Bit() method in Imap.php. It first decodes the message if it looks like it is base64-encoded, and then goes through a bunch of common encoded characters (like =A0) and replaces them with their UTF-8 equivalents:

/**
 * Decodes 7-Bit text.
 *
 * PHP seems to think that most emails are 7BIT-encoded, therefore this
 * decoding method assumes that text passed through may actually be base64-
 * encoded, quoted-printable encoded, or just plain text. Instead of passing
 * the email directly through a particular decoding function, this method
 * runs through a bunch of common encoding schemes to try to decode everything
 * and simply end up with something *resembling* plain text.
 *
 * Results are not guaranteed, but it's pretty good at what it does.
 *
 * @param $text (string)
 *   7-Bit text to convert.
 *
 * @return (string)
 *   Decoded text.
 */
public function decode7Bit($text) {
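  // If there are no spaces on the first line, assume that the body is
  // actually base64-encoded, and try to decode it. Strict mode returns false
  // for input that isn't valid base64, in which case the text is kept as-is.
  $lines = explode("\r\n", $text);
  $first_line_words = explode(' ', $lines[0]);
  if ($first_line_words[0] == $lines[0]) {
    $decoded = base64_decode($text, true);
    if ($decoded !== false) {
      $text = $decoded;
    }
  }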

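  // Manually convert common encoded characters into their UTF-8 equivalents.
  // Order matters: soft line breaks are removed first so that sequences split
  // across lines are rejoined, and '=C2=A0' must come before '=A0', otherwise
  // the '=A0' replacement would leave a stray '=C2' behind.
  $characters = array(
    "=\r\n" => '', // joined line (soft line break).
    '=20' => ' ', // space.
    '=E2=80=99' => "'", // single quote.
    '=0A' => "\r\n", // line break.
    '=C2=A0' => ' ', // non-breaking space.
    '=A0' => ' ', // non-breaking space.
    '=E2=80=A6' => '…', // ellipsis.
    '=E2=80=A2' => '•', // bullet.
  );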

  // Loop through the encoded characters and replace any that are found.
  foreach ($characters as $key => $value) {
    $text = str_replace($key, $value, $text);
  }

  return $text;
}

This was taken from version 1.0-beta2 of the Imap class for PHP that I have on GitHub.

If you have any ideas for making this more efficient, let me know. I originally tried running everything through quoted_printable_decode(), but sometimes PHP would throw exceptions that were vague and unhelpful, so I gave up on that approach.
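One way to avoid maintaining the lookup table by hand is to decode every =XX escape generically. The sketch below is a hypothetical helper (decodeQuotedPrintableLoose() is not part of the Imap class), and it assumes the body is quoted-printable text whose charset is UTF-8, so the bytes it produces are UTF-8. Unlike the table, it keeps characters such as the curly apostrophe and the non-breaking space as they are instead of mapping them to ASCII.

```php
<?php
// Hypothetical alternative to the hand-written table: decode every =XX
// escape instead of a fixed list of known sequences.
// Assumption: the body is quoted-printable and its charset is UTF-8.
function decodeQuotedPrintableLoose(string $text): string
{
    // Remove soft line breaks ("=" at the end of a line) first, so that
    // escape sequences split across lines are rejoined before decoding.
    $text = preg_replace('/=\r?\n/', '', $text);

    // Turn each =XX hex escape back into the raw byte it stands for.
    return preg_replace_callback(
        '/=([0-9A-Fa-f]{2})/',
        static function (array $m): string {
            return chr(hexdec($m[1]));
        },
        $text
    );
}
```

A lone =A0 (a Latin-1 non-breaking space) becomes the single byte 0xA0 here, which is not valid UTF-8, so this only fits bodies that really are UTF-8.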

I know this is an old question, but I'm running into this issue now, and it seems that PHP has a solution.

The imap_fetchstructure() function will give you the type of encoding; the encoding property of the object it returns is one of:

0   7BIT
1   8BIT
2   BINARY
3   BASE64
4   QUOTED-PRINTABLE
5   OTHER

From there, you can write a function like this to decode the message:

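function _encodeMessage($msg, $type){
    // Despite the name, this *decodes* a body according to the numeric
    // encoding reported by imap_fetchstructure().
    if ($type == 0) {
        // 7BIT: plain ASCII, so no transfer decoding is needed.
        return mb_convert_encoding($msg, "UTF-8", "auto");
    } elseif ($type == 1 || $type == 2) {
        // 8BIT / BINARY: the body is already raw bytes. Note that
        // imap_8bit() and imap_binary() *encode* (to quoted-printable and
        // base64 respectively), so they must not be used here.
        return $msg;
    } elseif ($type == 3) {
        return imap_base64($msg);
    } elseif ($type == 4) {
        return imap_qprint($msg);
        //return quoted_printable_decode($msg);
    } else {
        return $msg;
    }
}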

You can call this function like so:

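$struct = imap_fetchstructure($conn, $messageNumber, 0);
$message = imap_fetchbody($conn, $messageNumber, 1);
// For a multipart message, section 1 is described by $struct->parts[0];
// only a single-part message uses the top-level encoding.
$encoding = isset($struct->parts[0]) ? $struct->parts[0]->encoding : $struct->encoding;
$message = _encodeMessage($message, $encoding);
echo $message;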

I hope this helps someone :)
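The same dispatch as _encodeMessage() can also be sketched with core PHP string functions only, which makes it easy to try without an IMAP connection. decodeTransferEncoding() is a hypothetical name; the numeric codes are the ones listed in the table above, which the IMAP extension also exposes as the constants ENC7BIT through ENCOTHER.

```php
<?php
// Hypothetical helper: undo the Content-Transfer-Encoding reported by
// imap_fetchstructure() using only core PHP functions.
// Codes: 0 = 7BIT, 1 = 8BIT, 2 = BINARY, 3 = BASE64,
//        4 = QUOTED-PRINTABLE, 5 = OTHER.
function decodeTransferEncoding(string $body, int $encoding): string
{
    switch ($encoding) {
        case 3: // BASE64
            $decoded = base64_decode($body);
            // base64_decode() returns false on failure; keep the raw body then.
            return $decoded === false ? $body : $decoded;
        case 4: // QUOTED-PRINTABLE (this also removes "=" soft line breaks)
            return quoted_printable_decode($body);
        default:
            // 7BIT, 8BIT and BINARY bodies are already raw bytes; OTHER is
            // left untouched because there is no standard way to decode it.
            return $body;
    }
}
```

This only undoes the transfer encoding. Converting the resulting bytes to UTF-8 (for example with mb_convert_encoding($body, 'UTF-8', $charset)) is a separate step that uses the part's charset.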

taka02

With $structure = imap_fetchstructure(...), do NOT use $encoding = $structure->encoding; use $encoding = $structure->parts[$p]->encoding instead, where $p is the index of the part whose body you are reading.

I think I had the same problem, and it's now solved: the 7BIT body wasn't converting to UTF-8, and I kept getting ASCII. I thought the part was 7BIT, but after switching to the parts[$p] version I got $encoding = 4 (QUOTED-PRINTABLE), not $encoding = 0, which meant I had to run imap_qprint($body) and then mb_convert_encoding($body, 'UTF-8', $charset) to get what I wanted.

Either way, check the encoding number! (In my case it was 4, not 0.)
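taka02's point can be made concrete with two small helpers that work on the object returned by imap_fetchstructure(): one finds the part for a section number such as "1" or "1.2", and one reads that part's charset parameter for mb_convert_encoding(). Both names are hypothetical, and the section-numbering rule used here (nested parts, with a non-multipart message treated as section 1) is a simplification that ignores attached message/rfc822 parts.

```php
<?php
// Hypothetical helper: map an IMAP section number ("1", "1.2", ...) to the
// matching node of the imap_fetchstructure() result. Section "1" is
// parts[0], "1.2" is parts[0]->parts[1], and so on. A non-multipart
// message has no parts, so section "1" is the top-level structure itself.
function findPart(object $structure, string $section): ?object
{
    $part = $structure;
    foreach (explode('.', $section) as $index) {
        if (empty($part->parts)) {
            // Only a single-part message has a section "1" without parts.
            return ($part === $structure && $section === '1') ? $structure : null;
        }
        $i = (int) $index - 1;
        if (!isset($part->parts[$i])) {
            return null;
        }
        $part = $part->parts[$i];
    }
    return $part;
}

// Hypothetical helper: read the "charset" parameter of a part. Parameters
// are objects with attribute/value properties; the attribute name is
// compared case-insensitively because servers often report it in upper case.
function partCharset(object $part, string $default = 'UTF-8'): string
{
    $params = (isset($part->parameters) && is_array($part->parameters)) ? $part->parameters : [];
    foreach ($params as $param) {
        if (isset($param->attribute, $param->value)
            && strcasecmp($param->attribute, 'charset') === 0) {
            return $param->value;
        }
    }
    return $default;
}
```

With a live connection, findPart($structure, '1')->encoding (after a null check) is the number to inspect, and partCharset() supplies the $charset for mb_convert_encoding().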
