I am making a request to the Users.messages endpoint. All objects returned (the emails) have a mimeType property which I'm struggling to understand.
More specificall
I know this question is not new, but I've written a PHP script that correctly parses messages pulled from the Gmail API, including any type of attachment.
The script includes a recursive "iterateParts" function which iterates all message parts so we can be sure we extracted all available data from each message.
Script steps are:
$maxToPull = 1;
$gmailQuery = "ALL";

// Initializing Google API
$service = new Google_Service_Gmail($client);

// Pulling all gmail messages into $messages array
$user = 'me';
$msglist = $service->users_messages->listUsersMessages($user, ["maxResults"=>$maxToPull, "q"=>$gmailQuery]);
$messages = $msglist->getMessages();

// Master array that will hold all parsed messages data, including attachments
$allmsgArr = array();

// Traverse each message
foreach($messages as $message) {
    $msgArr = array();
    $single_message = $service->users_messages->get('me', $message->getId());
    $payload = $single_message->getPayload();

    // Nice to have the gmail msg id, can be used to directly access the message in Gmail's web gui
    $msgArr['gmailmsgid'] = $message->getId();

    // Retrieving the subject and "from" email address
    foreach($payload->getHeaders() as $oneheader) {
        if($oneheader['name'] == 'Subject')
            $msgArr['subject'] = $oneheader['value'];
        if($oneheader['name'] == 'From')
            $msgArr['fromaddress'] = substr($oneheader['value'], strpos($oneheader['value'], '<')+1, -1);
    }

    // If body is directly in the message payload (only for plain text messages where
    // there's no HTML part and no attachments, normally this is not the case)
    if($payload['body']['size'] > 0)
        $msgArr['textplain'] = $payload['body']['data'];
    // Else, iterate over each message part and continue to dig if necessary
    else
        iterateParts($payload, $message->getId());

    // Push the parsed $msgArr (parsed by iterateParts) to master array
    array_push($allmsgArr, $msgArr);
}

// Traverse each parsed message and save its content and attachments to files
foreach($allmsgArr as $onemsgArr) {
    $folder = "messages/".$onemsgArr['gmailmsgid'];
    mkdir($folder);

    if(!empty($onemsgArr['textplain']))
        file_put_contents($folder."/textplain.txt", decodeData($onemsgArr['textplain']));
    if(!empty($onemsgArr['texthtml']))
        file_put_contents($folder."/texthtml.html", decodeData($onemsgArr['texthtml']));

    if(!empty($onemsgArr['attachments'])) {
        foreach($onemsgArr['attachments'] as $oneattachment) {
            if(!empty($oneattachment['filename']))
                $filename = $oneattachment['filename'];
            else if($oneattachment['mimetype'] == "message/rfc822" && empty($oneattachment['filename'])) // email attachments
                $filename = "noname.eml";
            else
                $filename = "unknown";
            file_put_contents($folder."/".$filename, decodeData($oneattachment['data']));
        }
    }
}

function iterateParts($obj, $msgid) {
    global $msgArr;
    global $service;

    foreach($obj as $parts) {
        // if found body data
        if($parts['body']['size'] > 0) {
            // plain text representation of message body
            if($parts['mimeType'] == 'text/plain') {
                $msgArr['textplain'] = $parts['body']['data'];
            }
            // html representation of message body
            else if($parts['mimeType'] == 'text/html') {
                $msgArr['texthtml'] = $parts['body']['data'];
            }
            // if it's an attachment
            else if(!empty($parts['body']['attachmentId'])) {
                $attachArr['mimetype'] = $parts['mimeType'];
                $attachArr['filename'] = $parts['filename'];
                $attachArr['attachmentId'] = $parts['body']['attachmentId'];

                // the message holds the attachment id, retrieve its data from users_messages_attachments
                $attachmentId_base64 = $parts['body']['attachmentId'];
                $single_attachment = $service->users_messages_attachments->get('me', $msgid, $attachmentId_base64);
                $attachArr['data'] = $single_attachment->getData();

                $msgArr['attachments'][] = $attachArr;
            }
        }

        // if there are other parts inside, go get them
        if(!empty($parts['parts']) && !empty($parts['mimeType']) && empty($parts['body']['attachmentId'])) {
            iterateParts($parts->getParts(), $msgid);
        }
    }
}

// All data returned from API is base64 encoded (URL-safe alphabet)
function decodeData($data) {
    $sanitizedData = strtr($data, '-_', '+/');
    return base64_decode($sanitizedData);
}
This is what $allmsgArr will look like (when only one message was pulled):
Array
(
    [0] => Array
        (
            [gmailmsgid] => 25k1asfa556x2da
            [fromaddress] => john@gmail.com
            [subject] => Fwd: Sea gulls picture
            [textplain] => UE5SIDQxQzAwMg0KDQpBUkJFTFRFU1QxDQoNCg0K
            [texthtml] => PGRpdiBkaXI9Imx0ciI-PHNwYW4gc3R5bGU9ImZi
            [attachments] => Array
                (
                    [0] => Array
                        (
                            [mimetype] => image/png
                            [filename] => sea_gulls.png
                            [attachmentId] => ANGjdJ9tmy4d8vPXhU_BjNEFEaDODOpu29W2u5OTM7a0
                            [data] => iVBORw0KGgoAAAANSUhEUgAABSYAAAKWCAYAAABUP
                        )
                    [1] => Array
                        (
                            [mimetype] => image/jpeg
                            [filename] => Outlook_Signature.jpg
                            [attachmentId] => ANGjdJ-CgZTK0oK44Q8j7TlN_JlaexxGKZ_wHFfoEB
                            [data] => 6jRXhpZgAATU0AKgAAAAgABwESAAMAAAABAAEAAAEa
                        )
                )
        )
)
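One caveat on the "from" address: the substr/strpos extraction in the script assumes the header has the form `Name <address>`. A header that holds only a bare address would lose its first and last characters. Below is a rough sketch of a more tolerant extraction, written in JavaScript. `extractAddress` is a name invented for this example, and it does not try to cover every RFC 5322 form (groups, comments, quoted angle brackets).

```javascript
// Hypothetical helper: pull the address out of a From header value.
// Handles "Name <addr>" and a bare "addr"; other forms are out of scope.
function extractAddress(fromHeader) {
  // Take whatever sits between the angle brackets, if there are any.
  const match = fromHeader.match(/<([^<>]+)>/);
  return (match ? match[1] : fromHeader).trim();
}

console.log(extractAddress('John Doe <john@gmail.com>')); // john@gmail.com
console.log(extractAddress('john@gmail.com'));            // john@gmail.com
```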
I think it will make sense if you think of the payload as a part in and of itself. Let's say I send a message with just a subject and a plain message text:
From: emtholin@gmail.com
To: emtholin@gmail.com
Subject: Example Subject
This is the plain text message
This will result in the following parsed message:
{
"id": "154ecb53c10b74d8",
"threadId": "154ecb53c10b74d8",
"labelIds": [
"INBOX",
"SENT"
],
"snippet": "This is the plain text message",
"historyId": "38877",
"internalDate": "1464260181000",
"payload": {
"partId": "",
"mimeType": "text/plain",
"filename": "",
"headers": [
...
],
"body": {
"size": 31,
"data": "VGhpcyBpcyB0aGUgcGxhaW4gdGV4dCBtZXNzYWdlCg=="
}
},
"sizeEstimate": 355
}
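The body.data value in this response is base64url-encoded (the URL-safe alphabet, with `-` and `_` in place of `+` and `/`). Here is a minimal sketch of decoding it in Node.js, assuming the part's text is UTF-8. `decodeBody` is a name made up for this example.

```javascript
// Hypothetical helper: decode a Gmail body.data string to text.
// Assumes the part is UTF-8; check the part's Content-Type header otherwise.
function decodeBody(data) {
  // Map the URL-safe alphabet back to standard base64 before decoding.
  const standard = data.replace(/-/g, '+').replace(/_/g, '/');
  return Buffer.from(standard, 'base64').toString('utf8');
}

// The data string from the response above.
const sample = 'VGhpcyBpcyB0aGUgcGxhaW4gdGV4dCBtZXNzYWdlCg==';
console.log(JSON.stringify(decodeBody(sample)));
// "This is the plain text message\n"
```

The decoded string is 31 characters long including the trailing newline, which matches the body.size reported in the response.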
If I send a message with a plain text part, an HTML part and an image, it will look like this when parsed:
{
"id": "154ed5ccaa12f3df",
"threadId": "154ed5ccaa12f3df",
"labelIds": [
"SENT",
"INBOX",
"IMPORTANT"
],
"snippet": "This is a plain/html message with an image.",
"historyId": "841379",
"internalDate": "1464271162000",
"payload": {
"mimeType": "multipart/mixed",
"filename": "",
"headers": [
...
],
"body": {
"size": 0
},
"parts": [
{
"mimeType": "multipart/alternative",
"filename": "",
"headers": [
{
"name": "Content-Type",
"value": "multipart/alternative; boundary=089e0122896c7c80d80533bf3205"
}
],
"body": {
"size": 0
},
"parts": [
{
"partId": "0.0",
"mimeType": "text/plain",
"filename": "",
"headers": [
{
"name": "Content-Type",
"value": "text/plain; charset=UTF-8"
}
],
"body": {
"size": 47,
"data": "VGhpcyBpcyBhIHBsYWluL2h0bWwgKm1lc3NhZ2UqIHdpdGggYW4gaW1hZ2UuDQo="
}
},
{
"partId": "0.1",
"mimeType": "text/html",
"filename": "",
"headers": [
{
"name": "Content-Type",
"value": "text/html; charset=UTF-8"
}
],
"body": {
"size": 73,
"data": "PGRpdiBkaXI9Imx0ciI-VGhpcyBpcyBhIHBsYWluL2h0bWwgPGI-bWVzc2FnZTwvYj4gd2l0aCBhbiBpbWFnZS48L2Rpdj4NCg=="
}
}
]
},
{
"partId": "1",
"mimeType": "image/png",
"filename": "smile.png",
"headers": [
...
],
"body": {
"attachmentId": "ANGjdJ-OrSy7VAYL-UbRyNtmySbZLlV-fV43zJF0_neNGZ8yKugsZAxb32eSb-CrbYIhF9NvjGwBVEjSkRrUWoCS7aDpgoQnt9WR7f2sa17qVEyOg_JVSbrGrunirvQw2dY-SxxB3Y0JP3aYDHSBXpNO6fFCByVFWQDw1et5Mh9di7bGO4AWOLKFVe_Yb2RmdDwuazGXGb8zA88TTMaiEPIacPTNiVtBrIWG0EKGxHBhep9j8ujyWeCS5P9X80dBHvBNj4T9XjUwcrN6FvwegRewRMM9cBupY7jQESR7915OcbhCNyi5l64x6vVh1ZU",
"size": 2002
}
}
]
},
"sizeEstimate": 3077
}
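To see the nesting at a glance, the structure can be printed as an indented outline. This is only a sketch. `outline` is a hypothetical helper, and the `payload` literal is a trimmed copy of the response above with headers and body data left out.

```javascript
// Hypothetical helper: list the MIME type of every part, indented by depth.
function outline(part, depth = 0, lines = []) {
  const label = part.filename
    ? `${part.mimeType} (${part.filename})`
    : part.mimeType;
  lines.push('  '.repeat(depth) + label);
  // Container parts (multipart/*) carry their children in `parts`.
  for (const child of part.parts || []) {
    outline(child, depth + 1, lines);
  }
  return lines;
}

// Trimmed copy of the payload shown above.
const payload = {
  mimeType: 'multipart/mixed',
  filename: '',
  parts: [
    {
      mimeType: 'multipart/alternative',
      filename: '',
      parts: [
        { mimeType: 'text/plain', filename: '' },
        { mimeType: 'text/html', filename: '' },
      ],
    },
    { mimeType: 'image/png', filename: 'smile.png' },
  ],
};

console.log(outline(payload).join('\n'));
// multipart/mixed
//   multipart/alternative
//     text/plain
//     text/html
//   image/png (smile.png)
```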
You will see it's just the RFC822 message parsed to JSON. If you just traverse the parts, and treat the payload as a part itself, you will find what you are looking for.
var parts = [response.payload];

while (parts.length) {
  var part = parts.shift();
  if (part.parts) {
    parts = parts.concat(part.parts);
  }

  if (part.mimeType === 'text/html') {
    // body.data is base64url: map it to standard base64, then decode as UTF-8
    var decodedPart = decodeURIComponent(escape(atob(part.body.data.replace(/\-/g, '+').replace(/\_/g, '/'))));
    console.log(decodedPart);
  }
}
There are many MIME types that can be returned, here are a few:
The definitive reference for all this is RFC 2046 https://www.ietf.org/rfc/rfc2046.txt (you might want to also see 2044 and 2045)
To answer your question, build a tree of the message, and look either for:
An example of a complex message:
multipart/mixed
Based on Tholle's idea, I've extended his script to extract both the Gmail body and the attachments.
First, fetch a Gmail message object; then parse it. You can fetch a message with this code:
const {google} = require('googleapis')

// do your authentication here
const oAuth2Client = new google.auth.OAuth2(client_id, client_secret, redirectTo)
const gmail = google.gmail({ version: 'v1', auth: oAuth2Client })

// run this inside an async function (or an ES module) so that await is allowed
const response = await gmail.users.messages.get({
  auth: oAuth2Client,
  userId: 'me',
  id: messageId,
  format: 'full'
})

const message_obj = response.data
Main Script:
function parser(response) {
  // body.data is base64url; Node's 'base64' decoder accepts that alphabet,
  // and decoding as 'utf8' keeps non-ASCII characters intact
  function decode(input) {
    return Buffer.from(input, 'base64').toString('utf8')
  }

  function decode_alternative(input) {
    // alternative that does not use Buffer; it relies on a `base64`
    // library being in scope, which this snippet does not import
    return base64.decode(input.replace(/-/g, '+').replace(/_/g, '/'))
  }

  const result = {
    text: '',
    html: '',
    attachments: []
  }

  // treat the payload itself as the first part and walk the tree breadth-first
  let parts = [response.payload]

  while (parts.length) {
    let part = parts.shift()

    if (part.parts)
      parts = parts.concat(part.parts)

    // only decode inline bodies; attached text files carry an attachmentId instead of data
    if (part.mimeType === 'text/plain' && part.body.data)
      result.text = decode(part.body.data)

    if (part.mimeType === 'text/html' && part.body.data)
      result.html = decode(part.body.data)

    if (part.body.attachmentId) {
      result.attachments.push({
        'partId': part.partId,
        'mimeType': part.mimeType,
        'filename': part.filename,
        'body': part.body
      })
    }
  }

  return result
}
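The parser records each attachment's body.attachmentId but not its bytes. Here is a sketch of downloading them, assuming the googleapis Node.js client and its users.messages.attachments.get method. `fetchAttachments` is a hypothetical helper, and the `fakeGmail` stub exists only so the example runs without credentials.

```javascript
// Hypothetical helper: download the bytes for each attachment found by parser().
// `gmail` is expected to be the client returned by google.gmail({ version: 'v1', ... }).
async function fetchAttachments(gmail, messageId, attachments) {
  const files = [];
  for (const att of attachments) {
    const res = await gmail.users.messages.attachments.get({
      userId: 'me',
      messageId,
      id: att.body.attachmentId,
    });
    files.push({
      filename: att.filename || 'unknown',
      mimeType: att.mimeType,
      // res.data.data is base64url; Node's 'base64' decoder accepts that alphabet.
      content: Buffer.from(res.data.data, 'base64'),
    });
  }
  return files;
}

// Stub standing in for the real client (an assumption for this demo only).
const fakeGmail = {
  users: { messages: { attachments: {
    get: async () => ({ data: { size: 5, data: Buffer.from('hello').toString('base64') } }),
  } } },
};

const demoAttachments = [
  { filename: 'note.txt', mimeType: 'text/plain', body: { attachmentId: 'demo-id' } },
];

fetchAttachments(fakeGmail, 'demo-message-id', demoAttachments)
  .then(files => console.log(files[0].filename, files[0].content.toString('utf8')));
```

With the real client, each `content` buffer can be written straight to disk, for example with fs.writeFile.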
Sample Data and response:
const with_multi_type_attachments = {
"id": "16c624e85dfd9883",
"threadId": "16c62397458f34b1",
"labelIds": [],
"snippet": "This is body. Inline-attachments my-custom-link my-custom-email-address Emoji:
I solved this with a recursive function, which collects all the text of the message regardless of how deeply the JSON response is nested. If you need more explanation, let me know.
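// Recursively collects the decoded text/plain and text/html bodies from a list of parts.
// decodificarBase64 is a helper defined elsewhere (not shown here).
private List<string> ObtenerTextoMensaje(IList<MessagePart> partes)
{
    var listaTextos = new List<string>();

    foreach (var elementoParte in partes)
    {
        if ((elementoParte.MimeType == "text/plain") || (elementoParte.MimeType == "text/html"))
        {
            if (elementoParte.Body.Size != 0)
            {
                listaTextos.Add(decodificarBase64(elementoParte.Body.Data));
            }
        }
        else
        {
            if (elementoParte.Parts != null)
            {
                // AddRange keeps the texts already collected from earlier parts
                listaTextos.AddRange(ObtenerTextoMensaje(elementoParte.Parts));
            }
        }
    }
    return listaTextos;
}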