Parsing raw email in PHP

猫巷女王i 2020-12-23 20:50

I'm looking for good, working, simple-to-use PHP code for parsing raw email into parts.

I've written a couple of brute force solutions, but every time, one small cha

14 answers
  • 2020-12-23 21:23

    I ran into the same problem, so I wrote the following class: Email_Parser. It takes in a raw email and turns it into a nice object.

    It requires PEAR Mail_mimeDecode, but that should be easy to install via WHM or straight from the command line.

    Get it here: https://github.com/optimumweb/php-email-reader-parser

  • 2020-12-23 21:25

    This library, https://github.com/zbateson/MailMimeParser, works for me and doesn't need the mailparse extension.

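    <?php
    require 'vendor/autoload.php';                  // assumes a Composer install
    use ZBateson\MailMimeParser\MailMimeParser;

    // Assumption: the original snippet never showed where $message comes from.
    // '/path/to/email' is a placeholder for a file holding the raw message.
    $message = (new MailMimeParser())->parse(file_get_contents('/path/to/email'), false);

    echo $message->getHeaderValue('from');          // user@example.com
    echo $message
        ->getHeader('from')
        ->getPersonName();                          // Person Name
    echo $message->getHeaderValue('subject');       // The email's subject

    echo $message->getTextContent();                // or getHtmlContent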
    
  • 2020-12-23 21:27

    There is a library for parsing a raw email message into a PHP array: http://flourishlib.com/api/fMailbox#parseMessage.

    The static method parseMessage() can be used to parse a full MIME email message into the same format that fetchMessage() returns, minus the uid key.

    $parsed_message = fMailbox::parseMessage(file_get_contents('/path/to/email'));

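    Here is an example of a parsed message (shown in the fetchMessage() format, so it still includes the uid key that parseMessage() leaves out):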

    array(
        'received' => '28 Apr 2010 22:00:38 -0400',
        'headers'  => array(
            'received' => array(
                0 => '(qmail 25838 invoked from network); 28 Apr 2010 22:00:38 -0400',
                1 => 'from example.com (HELO ?192.168.10.2?) (example) by example.com with (DHE-RSA-AES256-SHA encrypted) SMTP; 28 Apr 2010 22:00:38 -0400'
            ),
            'message-id' => '<4BD8E815.1050209@flourishlib.com>',
            'date' => 'Wed, 28 Apr 2010 21:59:49 -0400',
            'from' => array(
                'personal' => 'Will Bond',
                'mailbox'  => 'tests',
                'host'     => 'flourishlib.com'
            ),
            'user-agent'   => 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4',
            'mime-version' => '1.0',
            'to' => array(
                0 => array(
                    'mailbox' => 'tests',
                    'host'    => 'flourishlib.com'
                )
            ),
            'subject' => 'This message is encrypted'
        ),
        'text'      => 'This message is encrypted',
        'decrypted' => TRUE,
        'uid'       => 15
    );
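
    Given the array shape shown above, pulling out common fields is plain array access. The following is a minimal sketch, assuming the keys are present exactly as in that example. format_address() is a hypothetical helper and not part of Flourish, and the array literal is a trimmed stand-in for a real parseMessage() result:

    ```php
    <?php
    // Hypothetical helper (not part of Flourish): rebuilds "Name <mailbox@host>"
    // from an address array shaped like the 'from' entry in the example.
    function format_address(array $address): string
    {
        $email = $address['mailbox'] . '@' . $address['host'];
        // 'personal' is optional: the 'to' entry in the example has no such key.
        return isset($address['personal'])
            ? sprintf('%s <%s>', $address['personal'], $email)
            : $email;
    }

    // Trimmed copy of the example array, standing in for a parseMessage() result.
    $parsed_message = array(
        'headers' => array(
            'from' => array(
                'personal' => 'Will Bond',
                'mailbox'  => 'tests',
                'host'     => 'flourishlib.com'
            ),
            'to' => array(
                0 => array('mailbox' => 'tests', 'host' => 'flourishlib.com')
            ),
            'subject' => 'This message is encrypted'
        ),
        'text' => 'This message is encrypted'
    );

    $from = format_address($parsed_message['headers']['from']);
    $to   = array_map('format_address', $parsed_message['headers']['to']);
    ```

    Note that 'from' is a single address array while 'to' is a list of them, which is why only 'to' goes through array_map().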
    
  • 2020-12-23 21:27

    Parsing email in PHP isn't an impossible task. What I mean is, you don't need a team of engineers to do it; it is attainable as an individual. Really the hardest part I found was creating the FSM for parsing an IMAP BODYSTRUCTURE result. Nowhere on the Internet had I seen this so I wrote my own. My routine basically creates an array of nested arrays from the command output, and the depth one is at in the array roughly corresponds to the part number(s) needed to perform the lookups. So it handles the nested MIME structures quite gracefully.

    The problem is that PHP's default imap_* functions don't provide much granularity...so I had to open a socket to the IMAP port and write the functions to send and retrieve the necessary information (IMAP FETCH 1 BODY.PEEK[1.2] for example), and that involves looking at the RFC documentation.

    The encoding of the data (quoted-printable, base64, 7bit, 8bit, etc.), length of the message, content-type, etc. is all provided to you; for attachments, text, html, etc. You may have to figure out the nuances of your mail server as well since not all fields are always implemented 100%.
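
    Once the server has reported a part's encoding, undoing it only needs PHP built-ins. This is a minimal sketch, assuming the encoding name arrives as a plain string. decode_part_body() is a hypothetical helper name, and charset conversion is deliberately left out:

    ```php
    <?php
    // Hypothetical helper: decodes a MIME part body according to the
    // Content-Transfer-Encoding value reported for that part.
    function decode_part_body(string $body, string $encoding): string
    {
        switch (strtolower(trim($encoding))) {
            case 'base64':
                // Remove the line breaks MIME inserts, then decode strictly.
                $decoded = base64_decode(preg_replace('/\s+/', '', $body), true);
                // Fall back to the raw body if the data is not valid base64.
                return $decoded === false ? $body : $decoded;
            case 'quoted-printable':
                return quoted_printable_decode($body);
            default:
                // 7bit, 8bit and binary need no decoding.
                return $body;
        }
    }

    $plain  = decode_part_body("SGVsbG8s\r\nIHdvcmxkIQ==", 'base64');
    $quoted = decode_part_body("caf=C3=A9 =\r\nau lait", 'Quoted-Printable');
    ```

    The result is still a byte string in the part's declared charset, so converting it (for example to UTF-8) remains a separate step.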

    The gem is the FSM...if you have a background in Comp Sci it can be really, really fun to make this (the key is that nested brackets are not a regular language ;)); otherwise it will be a struggle and/or result in ugly code, using traditional methods. Also, you need some time!
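
    The nested-array idea can be sketched as a character scanner plus an explicit stack, where the stack is what copes with the nesting. This is not the routine described above, only a small illustration. parse_paren_list() is a hypothetical name, and IMAP literals such as {123} are not handled:

    ```php
    <?php
    // Hypothetical helper: turns an IMAP parenthesized list (such as the body of
    // a BODYSTRUCTURE response) into nested PHP arrays.
    function parse_paren_list(string $input): array
    {
        $stack   = array();   // enclosing lists that are still open
        $current = array();   // list being filled right now
        $length  = strlen($input);
        $i       = 0;

        while ($i < $length) {
            $char = $input[$i];

            if ($char === '(') {
                // Open a nested list: remember the parent, start a fresh child.
                $stack[] = $current;
                $current = array();
                $i++;
            } elseif ($char === ')') {
                if (!$stack) {
                    throw new InvalidArgumentException('Unbalanced ")" at offset ' . $i);
                }
                // Close the child and append it to its parent.
                $child     = $current;
                $current   = array_pop($stack);
                $current[] = $child;
                $i++;
            } elseif ($char === '"') {
                // Quoted string: a backslash escapes the next character.
                $value = '';
                $i++;
                while ($i < $length && $input[$i] !== '"') {
                    if ($input[$i] === '\\' && $i + 1 < $length) {
                        $i++;
                    }
                    $value .= $input[$i];
                    $i++;
                }
                if ($i >= $length) {
                    throw new InvalidArgumentException('Unterminated quoted string');
                }
                $i++; // skip the closing quote
                $current[] = $value;
            } elseif (ctype_space($char)) {
                $i++;
            } else {
                // Atom: NIL, a number, or a bare word.
                $span = strcspn($input, " \t\r\n()\"", $i);
                $atom = substr($input, $i, $span);
                $i   += $span;
                if (strcasecmp($atom, 'NIL') === 0) {
                    $current[] = null;
                } else {
                    $current[] = ctype_digit($atom) ? (int) $atom : $atom;
                }
            }
        }

        if ($stack) {
            throw new InvalidArgumentException('Unbalanced "(" in input');
        }
        return $current;
    }

    $single = parse_paren_list('("TEXT" "PLAIN" ("CHARSET" "UTF-8") NIL NIL "7BIT" 42 3)');
    $multi  = parse_paren_list('(("TEXT" "PLAIN") ("TEXT" "HTML") "ALTERNATIVE")');
    ```

    In the multipart case, $multi[0][0] and $multi[0][1] line up with part numbers 1 and 2, which is the depth-to-part-number correspondence mentioned earlier.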

    Hope this helps!

  • 2020-12-23 21:30

    You're probably not going to have much fun writing your own MIME parser. The reason you are finding "overdeveloped mail handling packages" is because MIME is a really complex set of rules/formats/encodings. MIME parts can be recursive, which is part of the fun. I think your best bet is to write the best MIME handler you can, parse a message, throw away everything that's not text/plain or text/html, and then force the command in the incoming string to be prefixed with COMMAND: or something similar so that you can find it in the muck. If you start with rules like that you have a decent chance of handling new providers, but you should be ready to tweak if a new provider comes along (or heck, if your current provider chooses to change their messaging architecture).
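
    The "prefix the command" rule can be sketched in a few lines, assuming the text/plain body has already been extracted and decoded. extract_command() is a hypothetical helper, and the COMMAND: marker is the convention suggested above, not a standard:

    ```php
    <?php
    // Hypothetical helper: returns the text after the first line that starts
    // with "COMMAND:", or null when the body contains no such line.
    function extract_command(string $text): ?string
    {
        // m: ^ and $ match at line boundaries; i: the prefix is case-insensitive.
        // \h* tolerates leading spaces or tabs; quoted lines ("> ...") are not matched.
        if (preg_match('/^\h*COMMAND:(.*)$/mi', $text, $matches)) {
            // trim() also removes the "\r" left over from CRLF line endings.
            return trim($matches[1]);
        }
        return null;
    }

    $body    = "Hi there,\r\n\r\nCOMMAND: restart web01\r\n\r\n-- \r\nSent from my phone\r\n";
    $command = extract_command($body);
    ```

    Anchoring on a line start keeps signatures, disclaimers and quoted replies from being mistaken for the command.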

  • 2020-12-23 21:34

    I'm not sure if this will be of help to you (I hope so), but it will surely help others interested in finding out more about email. Marcus Bointon gave one of the best presentations, entitled "Mail() and life after Mail()", at the PHP London conference in March this year, and the slides and MP3 are online. He speaks with some authority, having worked extensively with email and PHP at a deep level.

    My perception is that you are in for a world of pain trying to write a truly generic parser.

    EDIT - The files seem to have been removed from the PHP London site; I found the slides on Marcus' own site: Part 1, Part 2. Couldn't see the MP3 anywhere, though.
