Where to find the body of an email depending on mimeType

I am making a request to the Users.messages endpoint. All objects returned (the emails) have a mimeType property which I'm struggling to understand.

More specificall

5 answers

情深已故

2020-12-01 10:17

I know this question is not new, but I've written a PHP script which correctly parses messages pulled from the Gmail API, including any type of attachment.

The script includes a recursive "iterateParts" function which iterates all message parts so we can be sure we extracted all available data from each message.

Script steps are:

Pull all message ids from API
Get some important headers (subject & from address)
If the body is directly on the payload, take it from there; otherwise pass the payload to iterateParts
iterateParts parses each message part into $msgArr with its data, still base64 encoded
Push $msgArr to the master array $allmsgArr
Traverse the master array and save each part as a file according to its MIME type and filename


    $maxToPull = 1;
    $gmailQuery = "ALL";

    // Initializing Google API
    $service = new Google_Service_Gmail($client);

    // Pulling all gmail messages into $messages array
    $user = 'me';
    $msglist = $service->users_messages->listUsersMessages($user, ["maxResults"=>$maxToPull, "q"=>$gmailQuery]);
    $messages = $msglist->getMessages();

    // Master array that will hold all parsed messages data, including attachments
    $allmsgArr = array();

    // Traverse each message
    foreach($messages as $message)
    {
        $msgArr = array();
        $single_message = $service->users_messages->get('me', $message->getId());
        $payload = $single_message->getPayload();

        // Nice to have the gmail msg id, can be used to direct access the message in Gmail's web gui
        $msgArr['gmailmsgid'] = $message->getId();

        // Retrieving the subject and "from" email address
        foreach($payload->getheaders() as $oneheader)
        {
            if($oneheader['name'] == 'Subject')
                $msgArr['subject'] = $oneheader['value'];
            if($oneheader['name'] == 'From')
                $msgArr['fromaddress'] = substr($oneheader['value'], strpos($oneheader['value'], '<')+1, -1);
        }

        // If body is directly in the message payload (only for plain text messages where there's no HTML part and no attachments, normally this is not the case)
        if($payload['body']['size'] > 0)
            $msgArr['textplain'] = $payload['body']['data'];     
        // Else, iterate over each message part and continue to dig if necessary
        else
            iterateParts($payload, $message->getId());

        // Push the parsed $msgArr (parsed by iterateParts) to master array
        array_push($allmsgArr, $msgArr);
    }


    // Traverse each parsed message and save its content and attachments to files
    foreach($allmsgArr as $onemsgArr)
    {

        $folder = "messages/".$onemsgArr['gmailmsgid'];
        // Create the folder (and the parent "messages" folder) if it does not exist yet
        if(!is_dir($folder))
            mkdir($folder, 0777, true);

        // !empty() avoids warnings when a message has no such part
        if(!empty($onemsgArr['textplain']))
            file_put_contents($folder."/textplain.txt", decodeData($onemsgArr['textplain']));
        if(!empty($onemsgArr['texthtml']))
            file_put_contents($folder."/texthtml.html", decodeData($onemsgArr['texthtml']));
        if(!empty($onemsgArr['attachments']))
        {
            foreach($onemsgArr['attachments'] as $oneattachment)
            {
                if(!empty($oneattachment['filename']))
                    // basename() strips any directory components from the sender-supplied name
                    $filename = basename($oneattachment['filename']);
                else if($oneattachment['mimetype'] == "message/rfc822") // email attachments
                    $filename = "noname.eml";
                else
                    $filename = "unknown";
                file_put_contents($folder."/".$filename, decodeData($oneattachment['data']));
            }
        }
    }


    function iterateParts($obj, $msgid) {

        global $msgArr;
        global $service;
        foreach($obj as $parts)
        {
            // if found body data
            if($parts['body']['size'] > 0)
            {
                // plain text representation of message body
                if($parts['mimeType'] == 'text/plain')
                {
                    $msgArr['textplain'] = $parts['body']['data'];
                }
                // html representation of message body
                else if($parts['mimeType'] == 'text/html')
                {
                    $msgArr['texthtml'] = $parts['body']['data'];
                }
                // if it's an attachment
                else if(!empty($parts['body']['attachmentId']))
                {
                    $attachArr['mimetype'] = $parts['mimeType'];
                    $attachArr['filename'] = $parts['filename'];
                    $attachArr['attachmentId'] = $parts['body']['attachmentId'];

                    // the message holds the attachment id, retrieve its data from users_messages_attachments
                    $attachmentId_base64 = $parts['body']['attachmentId'];
                    $single_attachment = $service->users_messages_attachments->get('me', $msgid, $attachmentId_base64);

                    $attachArr['data'] = $single_attachment->getData();

                    $msgArr['attachments'][] = $attachArr;
                }       
            }

            // if there are other parts inside, go get them
            if(!empty($parts['parts']) && !empty($parts['mimeType']) && empty($parts['body']['attachmentId']))
            {
                iterateParts($parts->getParts(), $msgid);
            }

        }
    }

    // All body data returned from the API is base64url encoded ('-' and '_' instead of '+' and '/')
    function decodeData($data)
    {
        $sanitizedData = strtr($data,'-_', '+/');
        return base64_decode($sanitizedData);
    }

This is what $allmsgArr will look like (where only one message was pulled):


Array
(
    [0] => Array
        (
            [gmailmsgid] => 25k1asfa556x2da
            [fromaddress] => john@gmail.com
            [subject] => Fwd: Sea gulls picture
            [textplain] => UE5SIDQxQzAwMg0KDQpBUkJFTFRFU1QxDQoNCg0K
            [texthtml] => PGRpdiBkaXI9Imx0ciI-PHNwYW4gc3R5bGU9ImZi
            [attachments] => Array
                (
                    [0] => Array
                        (
                            [mimetype] => image/png
                            [filename] => sea_gulls.png
                            [attachmentId] => ANGjdJ9tmy4d8vPXhU_BjNEFEaDODOpu29W2u5OTM7a0
                            [data] => iVBORw0KGgoAAAANSUhEUgAABSYAAAKWCAYAAABUP
                        )

                    [1] => Array
                        (
                            [mimetype] => image/jpeg
                            [filename] => Outlook_Signature.jpg
                            [attachmentId] => ANGjdJ-CgZTK0oK44Q8j7TlN_JlaexxGKZ_wHFfoEB
                            [data] => 6jRXhpZgAATU0AKgAAAAgABwESAAMAAAABAAEAAAEa
                        )

                )
        )
)

夕颜

2020-12-01 10:18

I think it will make sense if you think of the payload as a part in and of itself. Let's say I send a message with just a subject and a plain message text:

From: emtholin@gmail.com
To: emtholin@gmail.com
Subject: Example Subject

This is the plain text message

This will result in the following parsed message:

{
 "id": "154ecb53c10b74d8",
 "threadId": "154ecb53c10b74d8",
 "labelIds": [
  "INBOX",
  "SENT"
 ],
 "snippet": "This is the plain text message",
 "historyId": "38877",
 "internalDate": "1464260181000",
 "payload": {
  "partId": "",
  "mimeType": "text/plain",
  "filename": "",
  "headers": [
   ...
  ],
  "body": {
   "size": 31,
   "data": "VGhpcyBpcyB0aGUgcGxhaW4gdGV4dCBtZXNzYWdlCg=="
  }
 },
 "sizeEstimate": 355
}
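
The `data` field above is base64url encoded (it uses `-` and `_` in place of `+` and `/`). A minimal sketch of decoding it in Node.js, assuming the text is UTF-8; the helper name `decodeBody` is made up for illustration:

```javascript
// Hypothetical helper: decode a Gmail API body.data string (base64url) to text.
// Assumes the part's charset is UTF-8; check the part's Content-Type header if unsure.
function decodeBody(data) {
  // Map the URL-safe alphabet back to standard base64 before decoding.
  const standard = data.replace(/-/g, '+').replace(/_/g, '/');
  return Buffer.from(standard, 'base64').toString('utf8');
}

// The body.data value from the plain-text example above.
const text = decodeBody('VGhpcyBpcyB0aGUgcGxhaW4gdGV4dCBtZXNzYWdlCg==');
console.log(JSON.stringify(text)); // "This is the plain text message\n"
```

The decoded string is 31 characters long including the trailing newline, which matches the `size` reported in the body.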

If I send a message with a plain text part, an HTML part and an image, it will look like this when parsed:

{
 "id": "154ed5ccaa12f3df",
 "threadId": "154ed5ccaa12f3df",
 "labelIds": [
  "SENT",
  "INBOX",
  "IMPORTANT"
 ],
 "snippet": "This is a plain/html message with an image.",
 "historyId": "841379",
 "internalDate": "1464271162000",
 "payload": {
  "mimeType": "multipart/mixed",
  "filename": "",
  "headers": [
     ...
  ],
  "body": {
   "size": 0
  },
  "parts": [
   {
    "mimeType": "multipart/alternative",
    "filename": "",
    "headers": [
     {
      "name": "Content-Type",
      "value": "multipart/alternative; boundary=089e0122896c7c80d80533bf3205"
     }
    ],
    "body": {
     "size": 0
    },
    "parts": [
     {
      "partId": "0.0",
      "mimeType": "text/plain",
      "filename": "",
      "headers": [
       {
        "name": "Content-Type",
        "value": "text/plain; charset=UTF-8"
       }
      ],
      "body": {
       "size": 47,
       "data": "VGhpcyBpcyBhIHBsYWluL2h0bWwgKm1lc3NhZ2UqIHdpdGggYW4gaW1hZ2UuDQo="
      }
     },
     {
      "partId": "0.1",
      "mimeType": "text/html",
      "filename": "",
      "headers": [
       {
        "name": "Content-Type",
        "value": "text/html; charset=UTF-8"
       }
      ],
      "body": {
       "size": 73,
       "data": "PGRpdiBkaXI9Imx0ciI-VGhpcyBpcyBhIHBsYWluL2h0bWwgPGI-bWVzc2FnZTwvYj4gd2l0aCBhbiBpbWFnZS48L2Rpdj4NCg=="
      }
     }
    ]
   },
   {
    "partId": "1",
    "mimeType": "image/png",
    "filename": "smile.png",
    "headers": [
       ...
    ],
    "body": {
     "attachmentId": "ANGjdJ-OrSy7VAYL-UbRyNtmySbZLlV-fV43zJF0_neNGZ8yKugsZAxb32eSb-CrbYIhF9NvjGwBVEjSkRrUWoCS7aDpgoQnt9WR7f2sa17qVEyOg_JVSbrGrunirvQw2dY-SxxB3Y0JP3aYDHSBXpNO6fFCByVFWQDw1et5Mh9di7bGO4AWOLKFVe_Yb2RmdDwuazGXGb8zA88TTMaiEPIacPTNiVtBrIWG0EKGxHBhep9j8ujyWeCS5P9X80dBHvBNj4T9XjUwcrN6FvwegRewRMM9cBupY7jQESR7915OcbhCNyi5l64x6vVh1ZU",
     "size": 2002
    }
   }
  ]
 },
 "sizeEstimate": 3077
}

You will see it's just the RFC822-message parsed to JSON. If you just traverse the parts, and treat the payload as a part itself, you will find what you are looking for.

var parts = [response.payload];

while (parts.length) {
  var part = parts.shift();
  if (part.parts) {
    parts = parts.concat(part.parts);
  }

  if(part.mimeType === 'text/html') {
    var decodedPart = decodeURIComponent(escape(atob(part.body.data.replace(/\-/g, '+').replace(/\_/g, '/'))));
    console.log(decodedPart);
  }
}
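
To see how a given message is structured before deciding where to look, it can help to print the part tree. A small sketch, assuming the `payload` shape shown in the JSON above; `outlinePart` and the trimmed sample object are made up for illustration:

```javascript
// Hypothetical helper: return one line per MIME part, indented by nesting depth.
function outlinePart(part, depth = 0) {
  const label = part.filename ? `${part.mimeType} (${part.filename})` : part.mimeType;
  const lines = ['  '.repeat(depth) + label];
  for (const child of part.parts || []) {
    lines.push(...outlinePart(child, depth + 1));
  }
  return lines;
}

// Trimmed-down version of the multipart payload above (headers and data omitted).
const samplePayload = {
  mimeType: 'multipart/mixed',
  filename: '',
  body: { size: 0 },
  parts: [
    {
      mimeType: 'multipart/alternative',
      filename: '',
      body: { size: 0 },
      parts: [
        { partId: '0.0', mimeType: 'text/plain', filename: '', body: { size: 47 } },
        { partId: '0.1', mimeType: 'text/html', filename: '', body: { size: 73 } },
      ],
    },
    { partId: '1', mimeType: 'image/png', filename: 'smile.png', body: { size: 2002 } },
  ],
};

const outline = outlinePart(samplePayload);
console.log(outline.join('\n'));
// multipart/mixed
//   multipart/alternative
//     text/plain
//     text/html
//   image/png (smile.png)
```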

爱一瞬间的悲伤

2020-12-01 10:28
There are many MIME types that can be returned; here are a few:
- text/plain: the message body only in plain text
- text/html: the message body only in HTML
- multipart/alternative: will contain two parts that are alternatives for each other, for example:
  - a text/plain part for the message body in plain text
  - a text/html part for the message body in HTML
- multipart/mixed: will contain many unrelated parts, which can be:
  - multipart/alternative as above, or text/plain or text/html as above
  - application/octet-stream, or other application/* for application-specific MIME types for attachments
  - image/png or other image/* for images, which could be embedded in the message
The definitive reference for all this is RFC 2046 https://www.ietf.org/rfc/rfc2046.txt (you might also want to see RFC 2045, which defines the overall MIME message format)

To answer your question, build a tree of the message, and look either for:
- the first text/plain or text/html part (either in the message body or in a multipart/mixed)
- the first text/plain or text/html inside of a multipart/alternative, which may be part of a multipart/mixed.
An example of a complex message:
- multipart/mixed
  - multipart/alternative
    - text/plain <- message body in plain text
    - text/html <- message body in HTML
  - application/zip <- a zip file attachment
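
The lookup described above can be sketched as a depth-first search that returns the first part with a given MIME type and no filename, so that text files sent as attachments are skipped. This is only one reasonable reading of the rule; `findFirstPart` and the sample object are hypothetical, and the body data values are placeholders:

```javascript
// Hypothetical helper: depth-first search for the first non-attachment part
// whose mimeType matches. The payload itself is treated as a part.
function findFirstPart(part, mimeType) {
  if (part.mimeType === mimeType && !part.filename) {
    return part;
  }
  for (const child of part.parts || []) {
    const found = findFirstPart(child, mimeType);
    if (found) {
      return found;
    }
  }
  return null;
}

// Same shape as the complex message above.
const message = {
  mimeType: 'multipart/mixed',
  parts: [
    {
      mimeType: 'multipart/alternative',
      parts: [
        { mimeType: 'text/plain', filename: '', body: { size: 5, data: 'aGVsbG8=' } },
        { mimeType: 'text/html', filename: '', body: { size: 12, data: 'PGI-aGVsbG88L2I-' } },
      ],
    },
    {
      mimeType: 'application/zip',
      filename: 'files.zip',
      body: { attachmentId: 'placeholder-id', size: 1024 },
    },
  ],
};

// Prefer the HTML body and fall back to plain text.
const bodyPart = findFirstPart(message, 'text/html') || findFirstPart(message, 'text/plain');
console.log(bodyPart.mimeType); // text/html
```

Swapping the order of the two calls gives the plain-text body preference instead.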

半阙折子戏

2020-12-01 10:36

Based on Tholle's idea, I've extended his script to extract both the Gmail body and the attachments.

First of all, you need to fetch a Gmail message object and then parse it. You can fetch a message with this code:

const {google} = require('googleapis')
// do your authentication here
const oAuth2Client = new google.auth.OAuth2(client_id, client_secret, redirectTo)
const gmail = google.gmail({ version: 'v1', auth: oAuth2Client })

// note: await has to run inside an async function (or an ES module)
const response = await gmail.users.messages.get({
  userId: 'me',
  id: messageId,
  format: 'full'
})

const message_obj = response.data

Main Script:

function parser(response) {
  // `response` is the Gmail message object (message_obj above), not the raw HTTP response

  function decode(input) {
    // Buffer's 'base64' decoder also accepts the URL-safe alphabet used by the Gmail API;
    // decoding straight to UTF-8 keeps non-ASCII characters intact
    return Buffer.from(input, 'base64').toString('utf8')
  }

  function decode_alternative(input) {
    // alternative that relies on an external `base64` library, which is not imported in this snippet
    return base64.decode(input.replace(/-/g, '+').replace(/_/g, '/'))
  }

  const result = {
   text: '',
   html: '',
   attachments: []
  }

  let parts = [response.payload]

  while (parts.length) {
    let part = parts.shift()

    if (part.parts)
      parts = parts.concat(part.parts)

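    // body.data is missing when the content is stored as an attachment, so check it first
    if (part.mimeType === 'text/plain' && part.body.data)
      result.text = decode(part.body.data)

    if (part.mimeType === 'text/html' && part.body.data)
      result.html = decode(part.body.data)
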
    if (part.body.attachmentId) {
      result.attachments.push({
        'partId': part.partId,
        'mimeType': part.mimeType,
        'filename': part.filename,
        'body': part.body
      })
    }
  }

  return result
}
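
The `attachments` entries returned by `parser` carry only an `attachmentId`, not the file content. A hedged sketch of downloading one with the `users.messages.attachments.get` method, assuming `gmail` is the authenticated client created earlier; `fetchAttachmentBytes` is a made-up helper name:

```javascript
// Hypothetical helper: download one attachment and return its bytes as a Buffer.
// `gmail` is assumed to be the client from google.gmail({ version: 'v1', auth }).
async function fetchAttachmentBytes(gmail, messageId, attachmentId) {
  const res = await gmail.users.messages.attachments.get({
    userId: 'me',
    messageId: messageId,
    id: attachmentId,
  });
  // The API returns the content as base64url text in res.data.data.
  const standard = res.data.data.replace(/-/g, '+').replace(/_/g, '/');
  return Buffer.from(standard, 'base64');
}
```

For each entry `att` in `result.attachments`, something like `await fetchAttachmentBytes(gmail, message_obj.id, att.body.attachmentId)` should then give the bytes to write to disk under `att.filename`.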

Sample Data and response:

const with_multi_type_attachments = {
  "id": "16c624e85dfd9883",
  "threadId": "16c62397458f34b1",
  "labelIds": [],
  "snippet": "This is body. Inline-attachments my-custom-link my-custom-email-address Emoji:

猫巷女王i

2020-12-01 10:41

I solved this with a recursive function, which collects all the text of the message regardless of how deeply the parts are nested in the JSON response. If you need more explanation, let me know.

private List<string> ObtenerTextoMensaje(IList<MessagePart> partes)
{
    var listaTextos = new List<string>();
    foreach (var elementoParte in partes)
    {
        if ((elementoParte.MimeType == "text/plain") || (elementoParte.MimeType == "text/html"))
        {
            if (elementoParte.Body.Size != 0)
            {
                // decodificarBase64 is the author's own helper (not shown here)
                listaTextos.Add(decodificarBase64(elementoParte.Body.Data));
            }
        }
        else
        {
            // Append the nested results instead of overwriting what was collected so far
            if (elementoParte.Parts != null)
                listaTextos.AddRange(ObtenerTextoMensaje(elementoParte.Parts));
        }
    }
    return listaTextos;
}

                                                                        
                                                        
            
            
              
                