Parsing Email Body with 7BIT Content-Transfer-Encoding - PHP

遥遥无期 2020-12-18 05:24

I've been implementing some PHP/IMAP-based email handling functionality lately, and have most everything working great, except for message body decoding (in some circumstan

3 Answers
  • 2020-12-18 06:04

    I know this is an old question, but I am running into this issue now, and it seems that PHP has a solution.

    imap_fetchstructure() returns an object whose encoding property is an integer identifying the content transfer encoding:

    0   7BIT
    1   8BIT
    2   BINARY
    3   BASE64
    4   QUOTED-PRINTABLE
    5   OTHER
    
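    If you want readable names in logs while debugging, the table above can be turned into a small lookup. This is only a sketch: encodingName() is a hypothetical helper, not part of the IMAP extension. The extension itself defines constants for the same values (ENC7BIT, ENC8BIT, ENCBINARY, ENCBASE64, ENCQUOTEDPRINTABLE, ENCOTHER), but the sketch uses plain integers so it runs without the extension loaded.

    ```php
    <?php
    // Hypothetical helper: maps the integer found in the "encoding" property
    // of an imap_fetchstructure() result (or one of its parts) to its name.
    function encodingName(int $encoding): string
    {
        $names = [
            0 => '7BIT',
            1 => '8BIT',
            2 => 'BINARY',
            3 => 'BASE64',
            4 => 'QUOTED-PRINTABLE',
            5 => 'OTHER',
        ];

        // Anything outside the documented range is reported as UNKNOWN.
        return $names[$encoding] ?? 'UNKNOWN';
    }

    echo encodingName(4), "\n"; // prints "QUOTED-PRINTABLE"
    ```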

    From there, you should be able to write a function like this to decode the message:

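    // Note: despite its name, this function decodes the message body.
    function _encodeMessage($msg, $type){

        if($type == 0){
            // 7BIT: no transfer decoding needed; only normalize the charset.
            return mb_convert_encoding($msg, "UTF-8", "auto");
        } elseif($type == 1 || $type == 2){
            // 8BIT and BINARY bodies are not transfer-encoded. imap_8bit() and
            // imap_binary() are encoders, so calling them here would garble the text.
            return $msg;
        } elseif($type == 3){
            return imap_base64($msg);
        } elseif($type == 4){
            return imap_qprint($msg);
            // Alternative without the IMAP extension: quoted_printable_decode($msg)
        } else {
            return $msg;
        }
    }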
    

    You can call this function like so:

    $struct = imap_fetchstructure($conn, $messageNumber, 0);
    $message = imap_fetchbody($conn, $messageNumber, 1);
    $message = _encodeMessage($message, $struct->encoding);
    echo $message;
    

    I hope this helps someone :)

  • 2020-12-18 06:11

    With $structure = imap_fetchstructure(...), read the encoding from the individual part, not from the top-level object: use $encoding = $structure->parts[$p]->encoding, NOT $encoding = $structure->encoding.

    I think I had the same problem, and now it's solved. (My 7bit body didn't convert to UTF-8; I kept getting ASCII.) I thought I had 7bit, but after switching to the per-part lookup I got $encoding = 4, not $encoding = 0, which means I have to call imap_qprint($body) and then mb_convert_encoding($body, 'UTF-8', $charset) to get what I wanted.

    Anyway, check the encoding number! (In my case it was 4, not 0.)
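    To make the per-part lookup concrete, here is a rough sketch of how it could fit together. partCharset() and decodePart() are hypothetical names, and the sample $part object only imitates the shape imap_fetchstructure() returns for a part (an encoding integer plus a parameters list of attribute/value pairs). It uses quoted_printable_decode() and base64_decode() instead of the imap_* functions so that it runs without the IMAP extension, and it falls back to iconv() if mbstring is not available.

    ```php
    <?php
    // Hypothetical helper: finds the charset parameter of one structure part.
    function partCharset(object $part, string $default = 'US-ASCII'): string
    {
        foreach ($part->parameters ?? [] as $param) {
            if (strtolower($param->attribute) === 'charset') {
                return $param->value;
            }
        }
        return $default;
    }

    // Hypothetical helper: undoes the transfer encoding declared on the part,
    // then converts the result from the part's charset to UTF-8.
    function decodePart(string $body, object $part): string
    {
        switch ($part->encoding ?? 0) {
            case 3: // BASE64
                $body = base64_decode($body);
                break;
            case 4: // QUOTED-PRINTABLE
                $body = quoted_printable_decode($body);
                break;
            // 0, 1, 2 (7BIT, 8BIT, BINARY) need no transfer decoding.
        }

        $charset = partCharset($part);
        if (function_exists('mb_convert_encoding')) {
            return mb_convert_encoding($body, 'UTF-8', $charset);
        }
        return iconv($charset, 'UTF-8', $body);
    }

    // Stand-in for $structure->parts[$p]; a real one comes from imap_fetchstructure().
    $part = (object) [
        'encoding'   => 4,
        'parameters' => [(object) ['attribute' => 'CHARSET', 'value' => 'ISO-8859-1']],
    ];

    echo decodePart('caf=E9', $part), "\n"; // prints "café" (UTF-8)
    ```

    With a real message you would pass the result of imap_fetchbody() for the same part number as $body.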

  • 2020-12-18 06:23

    After spending a bit more time, I decided to just write up some heuristic detection, as Max suggested in the comments on my original question.

    I've built a more robust decode7Bit() method in Imap.php, which goes through a bunch of common encoded characters (like =A0) and replaces them with their UTF-8 equivalents, and then also decodes messages if they look like they are base64-encoded:

    /**
     * Decodes 7-Bit text.
     *
     * PHP seems to think that most emails are 7BIT-encoded, therefore this
     * decoding method assumes that text passed through may actually be base64-
     * encoded, quoted-printable encoded, or just plain text. Instead of passing
     * the email directly through a particular decoding function, this method
     * runs through a bunch of common encoding schemes to try to decode everything
     * and simply end up with something *resembling* plain text.
     *
     * Results are not guaranteed, but it's pretty good at what it does.
     *
     * @param $text (string)
     *   7-Bit text to convert.
     *
     * @return (string)
     *   Decoded text.
     */
    public function decode7Bit($text) {
      // If there are no spaces on the first line, assume that the body is
      // actually base64-encoded, and decode it.
      $lines = explode("\r\n", $text);
      $first_line_words = explode(' ', $lines[0]);
      if ($first_line_words[0] == $lines[0]) {
        $text = base64_decode($text);
      }
    
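      // Manually convert common encoded characters into their UTF-8 equivalents.
      // Order matters: soft line breaks are removed first so that split escape
      // sequences are rejoined, and longer sequences come before their suffixes
      // (otherwise '=A0' would match inside '=C2=A0').
      $characters = array(
        "=\r\n" => '', // joined line.
        '=20' => ' ', // space.
        '=E2=80=99' => "'", // single quote.
        '=0A' => "\r\n", // line break.
        '=C2=A0' => ' ', // non-breaking space.
        '=A0' => ' ', // non-breaking space.
        '=E2=80=A6' => '…', // ellipsis.
        '=E2=80=A2' => '•', // bullet.
      );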
    
      // Loop through the encoded characters and replace any that are found.
      foreach ($characters as $key => $value) {
        $text = str_replace($key, $value, $text);
      }
    
      return $text;
    }
    

    This was taken from version 1.0-beta2 of the Imap class for PHP that I have on GitHub.

    If you have any ideas for making this more efficient, let me know. I originally tried running everything through quoted_printable_decode(), but sometimes PHP would throw exceptions that were vague and unhelpful, so I gave up on that approach.
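    One possible variation on the heuristic, as a sketch only: check whether the body survives a strict base64 round trip, and otherwise look for quoted-printable markers before handing the text to quoted_printable_decode(). decodeMislabeledBody() is a hypothetical name and is not part of the Imap class. It assumes quoted_printable_decode() behaves on your inputs, and like any heuristic it can misfire: a single word whose length is a multiple of four can pass for base64, and plain text such as "x=10" looks like a quoted-printable escape.

    ```php
    <?php
    // Hypothetical helper: guesses the real transfer encoding of a body that
    // was labeled 7BIT, and decodes it accordingly.
    function decodeMislabeledBody(string $text): string
    {
        // Base64 check: drop line breaks, then require that decoding and
        // re-encoding reproduces the input exactly. Text containing spaces or
        // punctuation outside the base64 alphabet fails this test.
        $compact = preg_replace('/[\r\n]+/', '', $text);
        if ($compact !== '' && strlen($compact) % 4 === 0) {
            $decoded = base64_decode($compact, true);
            if ($decoded !== false && base64_encode($decoded) === $compact) {
                return $decoded;
            }
        }

        // Quoted-printable check: "=XX" hex escapes or a soft line break.
        if (preg_match('/=[0-9A-F]{2}|=\r?\n/', $text) === 1) {
            return quoted_printable_decode($text);
        }

        // Otherwise treat the body as plain text.
        return $text;
    }

    echo decodeMislabeledBody('caf=C3=A9 au lait'), "\n"; // prints "café au lait"
    ```

    Letting quoted_printable_decode() handle every escape avoids maintaining a replacement table by hand, at the cost of the false positives mentioned above.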
