How to restrict matches to the first 5 lines of an email body using regex in GAS [closed]

隐身守侯 submitted on 2020-03-24 09:43:07

Question


I'm using the following script, which correctly pulls two fields out of an email body.

However, because the body contains a lot of content, running the regular expressions against all of it significantly increases the script's execution time. Is there a way to search only the first 5 lines of the email body?

First lines of e-mail:

Name: Full Report
Store: River North (Wells St)
Date Tripped: 19 Feb 2020 1:07 PM
Business Date: 19 Feb 2020 (Open)
Message:
Information:
This alert was tripped based on a user defined trigger: Every 15 minutes.

Script:

// Get the first (latest) thread with the given label
var threads = GmailApp.getUserLabelByName('South Loop').getThreads(0,1);
if (threads && threads.length > 0) {
  // Get the first email message of the thread
  var message = threads[0].getMessages()[0];
  // Get the subject and the plain text body of the email message
  // You may also use getBody() for the HTML body or getRawContent() for the raw message source
  var tmp,
    subject = message.getSubject(),
    content = message.getPlainBody();

  // Implement Parsing rules using regular expressions
  if (content) {

    tmp = content.match(/Date Tripped:\s*([:\w\s]+)\r?\n/);
    var tripped = (tmp && tmp[1]) ? tmp[1].trim() : 'N/A';

    tmp = content.match(/Business Date:\s([\w\s]+\(\w+\))/);
    var businessdate = (tmp && tmp[1]) ? tmp[1].trim() : 'N/A';
  }
}

Answer 1:


You can use the pattern /^(?:.*\r?\n){0,5}/ to grab up to the first 5 lines of the email, then run your search against this smaller string. Here's a browser example with hardcoded content; I also tested it in Google Apps Script.

const Logger = console; // Remove this for GAS!

const content = `Name: Full Report
Store: River North (Wells St)
Date Tripped: 19 Feb 2020 1:07 PM
Business Date: 19 Feb 2020 (Open)
Message:
Information:
This alert was tripped based on a user defined trigger: Every 15 minutes.`;

const searchPattern = /(Date Tripped|Business Date): *(.+?)\r?\n/g;
const matches = [...content.match(/^(?:.*\r?\n){0,5}/)[0]
                           .matchAll(searchPattern)];

const result = Object.fromEntries(matches.map(e => e.slice(1)));
Logger.log(result);

If you wish to dynamically inject the search terms, use:

const Logger = console; // Remove this for GAS!

const content = `Name: Full Report
Store: River North (Wells St)
Date Tripped: 19 Feb 2020 1:07 PM
Business Date: 19 Feb 2020 (Open)
Foo: this will match because it's on line 5
Bar: this won't match because it's on line 6
Information:
`;

const searchTerms = ["Date Tripped", "Business Date", "Foo", "Bar"];
// Backslashes are doubled so the regex receives \r?\n rather than raw CR/LF characters
const searchPattern = new RegExp(`(${searchTerms.join("|")}): *(.+?)\\r?\\n`, "g");
const matches = [...content.match(/^(?:.*\r?\n){0,5}/)[0]
                           .matchAll(searchPattern)];

const result = Object.fromEntries(matches.map(e => e.slice(1)));
Logger.log(result);
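
One caveat when injecting terms: a term that contains regex metacharacters (parentheses, a dot, and so on) would change the meaning of the pattern. Below is a minimal sketch of escaping the terms first. The names escapeRegExp and buildSearchPattern are hypothetical helpers introduced for illustration, and the sample text is made up.

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the same "(term1|term2): value" pattern,
// but from escaped terms so they are matched literally.
function buildSearchPattern(terms) {
  const alternation = terms.map(escapeRegExp).join("|");
  return new RegExp("(" + alternation + "): *(.+?)\\r?\\n", "g");
}

// Made-up sample: the first label contains parentheses.
const sample = "Store (Main): River North\nDate Tripped: 19 Feb 2020 1:07 PM\n";
const pattern = buildSearchPattern(["Store (Main)", "Date Tripped"]);

const found = {};
for (const m of sample.matchAll(pattern)) {
  found[m[1]] = m[2];
}
console.log(found);
```

Without the escaping step, "Store (Main)" would be read as a capture group around "Main" and would never match the literal parentheses in the text.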

ES5 version if you're using the older (Rhino) runtime:

var Logger = console; // Remove this for GAS!

var content = "Name: Full Report\nStore: River North (Wells St)\nDate Tripped: 19 Feb 2020 1:07 PM\nBusiness Date: 19 Feb 2020 (Open)\nMessage:\nInformation:\nThis alert was tripped based on a user defined trigger: Every 15 minutes.\n";

var searchPattern = /(Date Tripped|Business Date): *(.+?)\r?\n/g;
var truncatedContent = content.match(/^(?:.*\r?\n){0,5}/)[0];
var result = {};

// Search only the truncated string, not the full content
for (var m; (m = searchPattern.exec(truncatedContent)); ) {
  result[m[1]] = m[2];
}

Logger.log(result);
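
To tie this back to the script in the question, the truncate-then-search idea could be wrapped in two small functions. This is only a sketch under assumptions: firstLines and extractFields are hypothetical names, the labels are assumed to contain no regex metacharacters, and the body is hardcoded here where Apps Script would use message.getPlainBody().

```javascript
// Hypothetical helper: return at most the first `maxLines` lines of `text`.
// The pattern allows zero repetitions, so match() never returns null here.
function firstLines(text, maxLines) {
  var pattern = new RegExp("^(?:.*\\r?\\n){0," + maxLines + "}");
  return text.match(pattern)[0];
}

// Hypothetical helper: collect "Label: value" pairs from the first lines only.
// Assumes the labels contain no regex metacharacters.
function extractFields(body, labels, maxLines) {
  var head = firstLines(body, maxLines);
  var pattern = new RegExp("(" + labels.join("|") + "): *(.+?)\\r?\\n", "g");
  var fields = {};
  var m;
  while ((m = pattern.exec(head)) !== null) {
    fields[m[1]] = m[2];
  }
  return fields;
}

// In Apps Script, `body` would come from message.getPlainBody().
var body = "Name: Full Report\nStore: River North (Wells St)\n" +
  "Date Tripped: 19 Feb 2020 1:07 PM\nBusiness Date: 19 Feb 2020 (Open)\n" +
  "Message:\nInformation:\nThis alert was tripped based on a user defined trigger: Every 15 minutes.\n";

var fields = extractFields(body, ["Date Tripped", "Business Date"], 5);
var tripped = fields["Date Tripped"] || "N/A";
var businessdate = fields["Business Date"] || "N/A";
console.log(tripped, businessdate);
```

The 'N/A' fallback mirrors the behaviour of the original script when a field is missing.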



Answer 2:


@ggorlen's answer is not precise enough for my taste. Let's have a look at regex101.

My problem with (?:.*\r?\n){0,5} is this: in English, this regex says:

Take any number of characters (0 or more) ending with a newline. 
Do this between 0 and 5 times.

Which means that even an empty string matches. If you do a global match, you get a lot of those.
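
The empty matches can be reproduced with a short sketch. It assumes the scenario described is the pattern without the ^ anchor and with the g flag added; the two input strings are made up for illustration.

```javascript
// The quantified group from the first answer, unanchored and with the global flag.
const lenientGlobal = /(?:.*\r?\n){0,5}/g;

// No newline at all: zero repetitions succeed at every position,
// so every match returned is an empty string.
const noNewline = "abc".match(lenientGlobal);
console.log(noNewline);

// Two complete lines: one real match, followed by an empty match at the end.
const twoLines = "a\nb\n".match(lenientGlobal);
console.log(twoLines);
```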

So, how could you grab the first 5 lines? Be exact! So something like

^([^\r\n]+\r?\n){5}

See regex101

P.S. @ggorlen mentioned that I left the default multiline matching on in regex101, and he's right about that. Your preference may vary: choosing between ignoring messages with fewer than 5 lines and accepting strings with empty lines depends on your particular case.

P.S.2 I've adapted my wording and disabled the multiline and global settings in regex101 to demonstrate my concerns with it.
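
The trade-off between the two patterns can be seen side by side in a small sketch. The helper name head and the three sample strings are hypothetical; the strict pattern is written with a non-capturing group, which does not change what it matches.

```javascript
const lenient = /^(?:.*\r?\n){0,5}/;      // pattern from the first answer
const strict = /^(?:[^\r\n]+\r?\n){5}/;   // pattern from this answer

// Hypothetical helper: return the matched text, or null when there is no match.
function head(pattern, text) {
  const m = text.match(pattern);
  return m ? m[0] : null;
}

const fiveOrMore = "1\n2\n3\n4\n5\n6\n";    // six non-empty lines
const tooShort = "1\n2\n";                  // only two lines
const withBlankLine = "1\n\n3\n4\n5\n6\n";  // second line is empty

// Both patterns agree when there are at least five non-empty lines.
console.log(head(lenient, fiveOrMore) === head(strict, fiveOrMore));

// The lenient pattern returns whatever lines exist; the strict one rejects the input.
console.log(JSON.stringify(head(lenient, tooShort)), head(strict, tooShort));

// The lenient pattern accepts the empty line; the strict one rejects the input.
console.log(JSON.stringify(head(lenient, withBlankLine)), head(strict, withBlankLine));
```

Since the strict pattern can return no match at all, code that uses it needs a null check before reading the result.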



Source: https://stackoverflow.com/questions/60307163/how-to-restrict-matches-to-the-first-5-lines-of-an-email-body-using-regex-in-gas
