How does one parse best each item of an ingredient list and does create a new object based on each parsing result?

耗尽温柔 submitted on 2021-02-17 06:05:21

Question


I have a list of ingredients, and I am trying to write a regex that looks for measurements such as 1 cup, 1 tsp, 1 tablespoon and so on.

I have built a regex, but it doesn't work well. I am trying to separate the ingredients from the measurements.

So given the string 1 Chopped Tomato, it should extract the 1 as the amount and output this:

const output = [
  {
    val: "Chopped Tomato",
    amount: "1",
  },
];

And given the string ½ tsp fine sea salt, it should extract ½ tsp and output this:

const output = [
  {
    val: "fine sea salt",
    amount: "½ tsp",
  },
];

These are the values I am using for the measurements:

    const measures = [
      "tbsp","tablespoon","tsp","teaspoon","oz","ounce","fl. oz","fluid ounce","cup","qt",
      "quart","pt","pint","gal","gallon","mL","ml","milliliter","g","grams","kg","kilogram","l","liter",
];

This is the input and the regex-based filter I built:

const Ingris = [
  "1 teaspoon heavy cream",
  "1 Chopped Tomato",
  "1/2 Cup yogurt",
  "1 packet pasta ",
  "2 ounces paprika",
]


const FilterFunction = (term) => {
  let data = []
  if (term) {
    const newData = Ingris.filter(({
      ingridients
    }) => {
      if (RegExp(term, "gim").exec(ingridients))
        return ingridients.filter(({
            val
          }) =>
          RegExp(term, "gim").exec(val)
        ).length;
    })
    data.push(newData)
  } else {
    data = []
  }
};
console.log(FilterFunction("cup"))

Desired Output:

const output = [
  {
    val: "Tomato",
    amount: "1 Chopped ",
  },
  {
    val: "yogurt",
    amount: "1/2 Cup",
  },
  {
    val: "1",
    amount: "packet pasta ",
  },
  {
    val: "fine sea salt",
    amount: "½ tsp",
  },
  {
    val: "heavy cream",
    amount: "1/2 teaspoon",
  },
  {
    val: "paprika",
    amount: "2 ounces",
  },
];

Answer 1:


Here is something that worked once I added packet and ounces (plural) to the measures.

It handles

  • Just amounts like 1, 2, ¼, ½, ¾ and 1/2
  • Just words without amounts like "Ground meat"
  • Compound measures like "fluid ounces" in singular and plural
  • Action words like chopped or ground

All of this is handled by one and a half regexes and one destructuring assignment:

const measures = [
  "tbsp", "tablespoon", "tsp", "teaspoon", "oz", "ounce", "ounces", "cup", "qt", "packet", "quart", "pt", "pint", "gal", "gallon", "mL", "ml", "milliliter", "g", "grams", "kg", "kilogram", "l", "liter", 
  "fl. oz", "fluid ounce", "fluid ounces" ]; // plural after singular!
const action = ["chopped","ground"]  

const compound = measures.filter(measure => measure.split(" ").length > 1); // extract compound words

const amountRe =      /^(\d+\/\d+|¼|½|¾|\d+)/; // amounts like 1, 1/2 etc
const amountValueRe = /(\d+\/\d+|¼|½|¾|\d+) ([\w.]+) (.*)/; // first part must be the same as amountRe

const makeList = list => list.map(line => {
  if (!amountRe.test(line)) return { value: line }; // no amounts found

  // test for compound measures
  compound.forEach(cmp => line = line.replace(cmp, cmp.split(" ").join("_"))); // add underscores if found
  
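  // destructure the match into amount, measure and the remaining text;
  // fall back to the raw line if it has fewer than three parts (e.g. "2 eggs")
  const match = line.match(amountValueRe);
  if (!match) return { value: line };
  let [, num, measure, what] = match;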
  
  if (action.includes(measure.toLowerCase())) { // test for chopped
    what = `${measure} ${what}`; // or add an action item to the object
    measure = "";
  }
  
  const obj = {}
  if (num) obj.amount = num;
  if (measure) obj.measure = measure.split("_").join(" ").trim(); // remove added underscores
  if (what) obj.value = what;
  return obj;
});

const Ingris = [
  "Chicken breast",
  "Ground ginger",
  "1 teaspoon heavy cream",
  "2 fluid ounces lemon juice",
  "1 Chopped Tomato",
  "1/2 Cup yogurt",
  "2 fl. oz paprika",
  "1 fluid ounce water",
  "½ packet pasta ",
  "2 ounces paprika"
];

console.log(makeList(Ingris))
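
The underscore trick used above for compound measures can also be looked at in isolation. The following is only a minimal sketch: the helper names protectCompoundUnits and restoreCompoundUnits are hypothetical, and unlike the code above it matches case-insensitively, which is an assumption about what is wanted.

```javascript
// Hypothetical helpers, not part of the answer above: the words of a compound
// unit are joined with underscores so that a single [\w.]+ group can capture it.
const compoundUnits = ["fluid ounces", "fluid ounce", "fl. oz"]; // longest first

function protectCompoundUnits(line) {
  return compoundUnits.reduce((text, unit) => {
    // escape regex metacharacters such as the "." in "fl. oz"
    const escaped = unit.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // "i" flag: also finds "Fluid Ounces"; the casing found in the line is kept
    return text.replace(new RegExp(escaped, "i"), found => found.split(" ").join("_"));
  }, line);
}

function restoreCompoundUnits(unit) {
  return unit.split("_").join(" ");
}

const protectedLine = protectCompoundUnits("2 Fluid Ounces lemon juice");
console.log(protectedLine); // "2 Fluid_Ounces lemon juice"

const [, amount, measure, value] = protectedLine.match(/^(\S+) ([\w.]+) (.*)$/);
console.log({ amount, measure: restoreCompoundUnits(measure), value });
```

Because the underscore counts as a word character, the protected unit passes through the ([\w.]+) group as one token and is turned back into two words afterwards.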



Answer 2:


Here is a sample pattern; complete it with the units you want:

^([0-9¼½¾]*)\s+(tsp|cups|cup|etc)?\s?(.*)$

const regex = /^([0-9¼½¾]*)\s+(tsp|cups|cup|etc)?\s?(.*)$/gm;
const str = `½ tsp fine salt
1 Chopped Tomato
3 cups of flour`;

const dom = document.getElementById('result');

let m; // declare the match variable instead of leaking an implicit global
while ((m = regex.exec(str)) !== null) {
     console.log('m: ', m);

    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        if (typeof match != 'undefined') {
            console.log('match : "'+match+'"');
            console.log('groupIndex : "'+groupIndex+'"');
            dom.innerHTML += match + '<br>';
        }
    });
    
    dom.innerHTML += '<br>';
}
<div id="result"></div>

Edit: added comments on the regex:

  • ^ : start of the line
  • ([0-9¼½¾]*) : quantities, any digit or one of ¼, ½, ¾ (a slash form such as 1/2 is not matched); the * allows any number of these characters
  • \s+ : one or more whitespace characters
  • (tsp|cups|cup|etc)? : units, optional. Can only be tsp, cups, cup or etc (replace etc with all the units you need)
  • \s? : maybe a space
  • (.*) : anything
  • $ : end of line
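
As a sketch of how the pattern could be applied line by line: the version below widens the quantity group so that "1/2" is also captured, which is an assumption and not part of the pattern above, and parseLine is a hypothetical helper name.

```javascript
// Sketch only: the quantity group is widened (an assumption) so that "1/2"
// is captured as well as ¼, ½, ¾ or plain digits. The unit must be followed
// by whitespace, so "1 cupcake" is not read as the unit "cup".
const lineRe = /^(\d+\/\d+|[0-9¼½¾]+)\s+(?:(tsp|cups|cup)\s+)?(.*)$/i;

// parseLine is a hypothetical helper name.
function parseLine(line) {
  const m = lineRe.exec(line.trim());
  if (m === null) return { val: line.trim() }; // no leading quantity found
  const [, quantity, unit, rest] = m;
  return { amount: unit ? `${quantity} ${unit}` : quantity, val: rest };
}

console.log(parseLine("½ tsp fine salt"));  // { amount: "½ tsp", val: "fine salt" }
console.log(parseLine("1/2 Cup yogurt"));   // { amount: "1/2 Cup", val: "yogurt" }
console.log(parseLine("1 Chopped Tomato")); // { amount: "1", val: "Chopped Tomato" }
console.log(parseLine("Chicken breast"));   // { val: "Chicken breast" }
```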



Answer 3:


The following approach is based on two assumptions.

  1. The OP always works with the same reliable syntax for how an ingredient item is literally described. This syntax comes in two flavours.
  2. The first one reads roughly like "<Amount value> <Amount unit> <Ingredient>". The second one is simpler: "<Amount value> <Ingredient>".

To arrive at an implementation that is easy to maintain and refactor, one should separate the strictly specified dependencies from the most generic parts of the computation.

Thus one might implement an entirely generic reduce task that effectively maps the given list of ingredients but uses the reduce method's accumulator as a config/collector object that is comfortable to read and write.

The main purpose of this collector is to carry two regular expressions: the primary one captures the more advanced ingredient syntax described above, the secondary one captures the simpler syntax.
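
The collector idea can be sketched in a few lines. The two patterns below are toy stand-ins chosen only for illustration, and the names collectorConfig and collect are hypothetical.

```javascript
// Toy patterns, assumed for illustration only.
const collectorConfig = {
  primary: /^(?<amount>\d+)\s+(?<unit>cup|tsp)\s+(?<content>.*)$/i,
  secondary: /^(?<amount>\d+)\s+(?<content>.*)$/,
  list: []
};

// Reducer: the accumulator carries both the configuration and the results.
function collect(collector, item) {
  const { primary, secondary, list } = collector;
  const match = primary.exec(item) || secondary.exec(item);
  // copy the named groups into a plain object, or keep the raw item
  list.push(match ? { ...match.groups } : item);
  return collector;
}

const { list } = ["1 cup yogurt", "2 eggs", "salt"].reduce(collect, collectorConfig);
console.log(list);
```

The reducer itself knows nothing about ingredients; swapping the two regexes in the config changes what gets captured without touching collect.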

Which leaves the creation of the regular expressions ...

For "<Amount value> <Amount unit> <Ingredient>" there is a strong dependency on <Amount unit>, which by itself splits an ingredient description into three groups. One does not need to know how <Amount value> or <Ingredient> is specified as long as one can rely on a strict specification/list of what a valid measuring unit is allowed to be.

Thus one has to generate a correctly capturing regex from such a list. The trickier part of this task is not to forget that units might contain characters that are also regex control characters and therefore need to be escaped/sanitized. (Example: "fl. oz." might get sanitized to "fl\.\s*oz\." before it becomes part of the dynamically created regex.)
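
A minimal sketch of that sanitizing step, under stated assumptions: escapeRegExp uses the commonly cited metacharacter class, buildUnitPattern is a hypothetical helper, and sorting longer units first is a design choice of this sketch rather than something the approach requires.

```javascript
// Escapes regex metacharacters using the commonly cited character class.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// buildUnitPattern is a hypothetical helper: longer units are tried first and
// any run of whitespace inside a unit becomes optional (\s*).
function buildUnitPattern(units) {
  return [...units]
    .sort((a, b) => b.length - a.length)
    .map(unit => escapeRegExp(unit).replace(/\s+/g, "\\s*"))
    .join("|");
}

const unitPattern = buildUnitPattern(["fl. oz.", "cup", "cups"]);
console.log(unitPattern); // fl\.\s*oz\.|cups|cup

// (?!\w) is used instead of a trailing \b because a unit ending in "." is
// followed by a space, and \b never matches between two non-word characters.
const unitRe = new RegExp(
  "^(?<amount>.*?)\\s*\\b(?<unit>" + unitPattern + ")(?!\\w)\\s*(?<content>.*)$",
  "i"
);
const unitGroups = { ...unitRe.exec("2 fl.oz. lemon juice").groups };
console.log(unitGroups);
```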

The second regex has to handle "<Amount value> <Ingredient>", where it is quite clear what an amount is made of. The regex reflects this by allowing exactly the following options ...

  • ¼
  • ½
  • ¾
  • any number followed by / followed by any number
  • just any number

Both regular expressions have in common that they capture named groups in order to support the generic approach of the reducer functionality mentioned above.
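
The amount options listed above translate fairly directly into an alternation with named groups. This is a small sketch; parseUnitless is a hypothetical helper name. Note that the fraction alternative has to come before the plain number, otherwise "1/2 onion" would yield the amount "1".

```javascript
// Amount alternatives in the order listed above, with the fraction form
// placed before the plain number so that "1/2" is not cut off after "1".
const amountOnlyRe = /^(?<amount>¼|½|¾|\d+\/\d+|\d+)\s*(?<content>.*)$/;

// parseUnitless is a hypothetical helper name.
function parseUnitless(line) {
  const match = amountOnlyRe.exec(line);
  return match ? { ...match.groups } : null; // null: no leading amount
}

console.log(parseUnitless("¾ lemon"));   // { amount: "¾", content: "lemon" }
console.log(parseUnitless("1/2 onion")); // { amount: "1/2", content: "onion" }
console.log(parseUnitless("3 eggs"));    // { amount: "3", content: "eggs" }
console.log(parseUnitless("eggs"));      // null
```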

Example code:

const measuringUnitList = [
  'tbsp', 'tablespoons', 'tablespoon', 'tsp', 'teaspoons', 'teaspoon', 'packets', 'packet',
  'oz', 'ounces', 'ounce', 'fl. oz', 'fl. ounces', 'fl. ounce', 'fluid ounces', 'fluid ounce',
  'cups', 'cup', 'qt', 'quarts', 'quart', 'pt', 'pints', 'pint', 'gal', 'gallons', 'gallon',
  'ml', 'milliliter', 'l', 'liter',
  'g', 'gram', 'kg', 'kilogram'
];

function createUnitCentricCapturingRegX(unitList) {
  // see: [https://regex101.com/r/6ov8Pg/1]
  // e.g. (/^(?<amount>.*?)\s*\b(?<unit>tsp|...|fl\.\s*ounces|fl\.\s*ounce|cup)\b\s*(?<content>.*)$/)

  const options = unitList
    .map(unit => escapeRegExpSearchString(unit))
    .join('|')
    .replace((/\\\.\\s\+/g), '\\\.\\s*');

  return RegExp('^(?<amount>.*?)\\s*\\b(?<unit>' + options + ')\\b\\s*(?<content>.*)$', 'i');
}

// see: [https://regex101.com/r/Iwgagu/1/]
const unitlessCapturingRegX = (/^(?<amount>¼|½|¾|\d+\/\d+|\d+)\s*(?<content>.*)$/);


function collectNamedCaptureGroupData(collector, item) {
  item = item.trim();

  const { regXPrimary, regXSecondary, list } = collector;
  const result = regXPrimary.exec(item) || regXSecondary.exec(item);

  list.push(
    (result && result.groups && Object.assign({}, result.groups))
    || item
  );
  return collector;
}


const ingredientList = [
  'unclear amount of whatever',
  '2 fl. ounces paprika',
  '1 Chopped Tomato',
  '1/2 Cup yogurt',
  '1 packet pasta',
  '½ tsp fine sea salt',
  '1/2 teaspoon heavy cream',
  '2 ounces paprika',
  'another, not precise, ingredient description',
  // ... honoring @mplungjan's comment  ...
  // https://stackoverflow.com/questions/63880334/how-does-one-parse-best-each-item-of-an-ingredient-list-and-does-create-a-new-ob/63881012?noredirect=1#comment113000116_63881012
  '3 ounces of Ginger/Garlic made from 1 clove of garlic and 10 cm ginger'
];

console.log(
  ingredientList.reduce(collectNamedCaptureGroupData, {

    regXPrimary: createUnitCentricCapturingRegX(measuringUnitList),
    regXSecondary: unitlessCapturingRegX,
    list: []

  }).list
);
<script>
  //  see at StackOverflow ...
  //
  //  ... "How to escape regular expression special characters using javascript?"
  //
  //  [https://stackoverflow.com/questions/3115150/how-to-escape-regular-expression-special-characters-using-javascript/9310752#9310752]
  //
  function escapeRegExpSearchString(text) {
    // return text.replace(/[-[\]{}()*+?.,\\^$|#\\s]/g, '\\$&');
    // ... slightly changed ...
    return text
      .replace(/[-[\]{}()*+?.,\\^$|#]/g, '\\$&')
      .replace((/\s+/), '\\s+');
  }
</script>

Finally, in order to compute exactly the result the OP asked for, and also to demonstrate maintainability (easy refactoring), one just needs to do the following in the next code iteration ...

  1. In measuringUnitList, replace , 'packets', 'packet' with , 'chopped'.
  2. In createUnitCentricCapturingRegX, change the regex creation ...
     • from '^(?<amount>.*?)\\s*\\b(?<unit>' + options + ')\\b\\s*(?<content>.*)$'
     • to ... '^(?<amount>.*?\\s*\\b(?:' + options + '))\\b\\s*(?<val>.*)$'
  3. Change the secondary regex (unitlessCapturingRegX) ...
     • from (/^(?<amount>¼|½|¾|\d+\/\d+|\d+)\s*(?<content>.*)$/)
     • to ... (/^(?<amount>¼|½|¾|\d+\/\d+|\d+)\s*(?<val>.*)$/)
  4. Introduce a defaultKey property into the generic implementation of collectNamedCaptureGroupData; it serves as the key for any item that could be handled neither by the primary nor by the secondary regex.

const measuringUnitList = [
  'tbsp', 'tablespoons', 'tablespoon', 'tsp', 'teaspoons', 'teaspoon', 'chopped',
  'oz', 'ounces', 'ounce', 'fl. oz', 'fl. ounces', 'fl. ounce', 'fluid ounces', 'fluid ounce',
  'cups', 'cup', 'qt', 'quarts', 'quart', 'pt', 'pints', 'pint', 'gal', 'gallons', 'gallon',
  'ml', 'milliliter', 'l', 'liter',
  'g', 'gram', 'kg', 'kilogram'
];

function createUnitCentricCapturingRegX(unitList) {
  // see: [https://regex101.com/r/7bmGXN/1/]
  // e.g. (/^(?<amount>.*?\s*\b(?:tsp|...|fl\.\s*ounces|fl\.\s*ounce|cup))\b\s*(?<val>.*)$/)

  const options = unitList
    .map(unit => escapeRegExpSearchString(unit))
    .join('|')
    .replace((/\\\.\\s\+/g), '\\\.\\s*');

  return RegExp('^(?<amount>.*?\\s*\\b(?:' + options + '))\\b\\s*(?<val>.*)$', 'i');
}
const unitlessCapturingRegX = (/^(?<amount>¼|½|¾|\d+\/\d+|\d+)\s*(?<val>.*)$/);


function collectNamedCaptureGroupData(collector, item) {
  item = item.trim();

  const { regXPrimary, regXSecondary, defaultKey, list } = collector;
  const result = regXPrimary.exec(item) || regXSecondary.exec(item);

  list.push(
    (result && result.groups && Object.assign({}, result.groups))
    || { [defaultKey]: item }
  );
  return collector;
}


const ingredientList = [
  'Chicken breast',
  '1 Chopped Tomato',
  '1/2 Cup yogurt',
  '1 packet pasta',
  '½ tsp fine sea salt',
  '1/2 teaspoon heavy cream',
  '2 ounces paprika',
  '2 fl. ounces paprika',
  'Ground ginger'
];

console.log(
  ingredientList.reduce(collectNamedCaptureGroupData, {

    regXPrimary: createUnitCentricCapturingRegX(measuringUnitList),
    regXSecondary: unitlessCapturingRegX,
    defaultKey: 'val',
    list: []

  }).list
);
<script>
  //  see at StackOverflow ...
  //
  //  ... "How to escape regular expression special characters using javascript?"
  //
  //  [https://stackoverflow.com/questions/3115150/how-to-escape-regular-expression-special-characters-using-javascript/9310752#9310752]
  //
  function escapeRegExpSearchString(text) {
    // return text.replace(/[-[\]{}()*+?.,\\^$|#\\s]/g, '\\$&');
    // ... slightly changed ...
    return text
      .replace(/[-[\]{}()*+?.,\\^$|#]/g, '\\$&')
      .replace((/\s+/), '\\s+');
  }
</script>


Source: https://stackoverflow.com/questions/63880334/how-does-one-parse-best-each-item-of-an-ingredient-list-and-does-create-a-new-ob
