I\'m looking for good/working/simple to use php code for parsing raw email into parts.
I\'ve written a couple of brute force solutions, but every time, one small cha
yeah, ive been able to write a basic parser, based off that rfc and some other basic tutorials. but its the multipart mime nested boundaries that keep messing me up.
i found out that MMS (not SMS) messages sent from my phone are just standard emails, so i have a system that reads the incoming email, checks the from (to only allow from my phone), and uses the body part to run different commands on my server. its sort of like a remote control by email.
because the system is designed to send pictures, its got a bunch of differently encoded parts. a mms.smil.txt part, a text/plain (which is useless, just says 'this is a html message'), a application/smil part (which the part that phones would pic up on), a text/html part with a advertisement for my carrier, then my message, but all wrapped in html, then finally a textfile attachment with my plain message (which is the part i use) (if i shove an image as an attachment in the message, its put at attachment 1, base64 encoded, then my text portion is attached as attachment 2)
i had it working with the exact mail format from my carrier, but when i ran a message from someone elses phone through it, it failed in a whole bunch of miserable ways.
i have other projects i'd like to extend this phone->mail->parse->command system to, but i need to have a stable/solid/generic parser to get the different parts out of the mail to use it.
my end goal would be to have a function that i could feed the raw piped mail into, and get back a big array with associative sub-arrays of headers var:val pairs, and one for the body text as a whole string
the more and more i search on this, the more i find the same thing: giant overdeveloped mail handling packages that do everything under the sun thats related to mails, or useless (to me, in this project) tutorials.
i think i'm going to have to bite the bullet and just carefully write something my self.
What are you hoping to end up with at the end? The body, the subject, the sender, an attachment? You should spend some time with RFC2822 to understand the format of the mail, but here's the simplest rules for well formed email:
HEADERS\n
\n
BODY
That is, the first blank line (double newline) is the separator between the HEADERS and the BODY. A HEADER looks like this:
HSTRING:HTEXT
HSTRING always starts at the beginning of a line and doesn't contain any white space or colons. HTEXT can contain a wide variety of text, including newlines as long as the newline char is followed by whitespace.
The "BODY" is really just any data that follows the first double newline. (There are different rules if you are transmitting mail via SMTP, but processing it over a pipe you don't have to worry about that).
So, in really simple, circa-1982 RFC822 terms, an email looks like this:
HEADER: HEADER TEXT
HEADER: MORE HEADER TEXT
INCLUDING A LINE CONTINUATION
HEADER: LAST HEADER
THIS IS ANY
ARBITRARY DATA
(FOR THE MOST PART)
Most modern email is more complex than that though. Headers can be encoded for charsets or RFC2047 mime words, or a ton of other stuff I'm not thinking of right now. The bodies are really hard to roll your own code for these days to if you want them to be meaningful. Almost all email that's generated by an MUA will be MIME encoded. That might be uuencoded text, it might be html, it might be a uuencoded excel spreadsheet.
I hope this helps provide a framework for understanding some of the very elemental buckets of email. If you provide more background on what you are trying to do with the data I (or someone else) might be able to provide better direction.