Regex to add string to beginning of every word based on condition

Posted by 自古美人都是妖i on 2021-01-28 11:23:36

Question


I have a string which looks like this

someString = "#3Hello there! How many #4candies did you sell today? Do have any #4candies left?"

lookupDict = {"Hello there": "#3", "candies": "#4"}

Now I want to prefix every term in someString that is not in the dictionary lookupDict with #0. I can't split on a space " ", since that would break a term like Hello there into two separate words, Hello and there, and neither would ever match my condition.

I know how to apply a basic regex that adds #0 in front of every word. For example, something like

let regex = /(\b\w+\b)/g;

someString = someString.replace(regex, '#0$1');

But that would blindly add #0 to every term and wouldn't look anything up in the dictionary lookupDict.

Is there any way I can combine the regex with a lookup in the dictionary and assign the #0 accordingly? Basically, the end result would be something like

someString = "#3Hello there! #0How #0many #4candies #0did #0you #0sell #0today? #0Do #0have #0any #4candies #0left?"

Note: Spaces can be considered word boundaries here.


Answer 1:


You may use the following logic:

  • Build an array of the substrings you need to skip: the concatenated values and keys of the associative array
  • Sort the items by length in descending order, so that a longer item is tried before any shorter item that is a prefix of it (in an alternation, the first alternative that matches wins)
  • Compile a regex pattern that consists of two alternatives: the first matches the array items (escaped for use in a regex pattern) enclosed in a capturing group, and the other matches the rest of the "words"
  • When a match is found, check whether Group 1 matched. If it did, return the match value unchanged; otherwise, add #0 to the match value.

Here is the implementation:

let someString = "#3Hello there! How many #4candies did you sell today? Do have any #4candies left? #0how #0much";
const lookupDict = {"Hello there": "#3", "candies": "#4", "how": "#0", "much": "#0"};
const patternDict = [];                           // Substrings to skip
for (const key in lookupDict) {
  patternDict.push( `${lookupDict[key]}${key}` ); // Values + keys
}
patternDict.sort(function(a, b){                  // Sorting by length, descending
  return b.length - a.length;
});
const rx = new RegExp("(?:^|\\W)(" + patternDict.map(function(m) { // Building the final pattern
    return m.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');}
  ).join("|") + ")(?!\\w)|\\S+", "gi");
// rx = /(?:^|\W)(#3Hello there|#4candies|#0much|#0how)(?!\w)|\S+/gi
// Return the match unchanged when Group 1 matched (a known item) or when the
// token holds no word char (e.g. the "!" left over after "#3Hello there",
// which \S+ picks up on its own); otherwise prefix it with #0
someString = someString.replace(rx, (x, y) => (y || !/\w/.test(x)) ? x : `#0${x}` );
console.log(someString);
// => #3Hello there! #0How #0many #4candies #0did #0you #0sell #0today? #0Do #0have #0any #4candies #0left? #0how #0much

The regex will look like

/(?:^|\W)(#3Hello there|#4candies|#0much|#0how)(?!\w)|\S+/gi

See the regex demo (PHP option chosen to highlight groups green).

Details

  • (?:^|\W) - a non-capturing group matching either start of string (^) or (|) any non-word char (=a char other than an ASCII letter, digit or _)
  • (#3Hello there|#4candies|#0much|#0how) - Capturing group 1 matching any of the lookupDict concatenated value+keys
  • (?!\w) - a negative lookahead that fails the match if, immediately to the right of the current location, there is a word char
  • | - or
  • \S+ - 1+ non-whitespace chars.
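
To make the two alternatives easier to see in action, here is a minimal sketch that lists every match and whether Group 1 took part in it. The names sample and report are illustrative only and are not part of the answer; the pattern is the one shown above, applied to a shortened input.

```javascript
// Same pattern as above, applied to a shortened (hypothetical) sample string.
const rx = /(?:^|\W)(#3Hello there|#4candies|#0much|#0how)(?!\w)|\S+/gi;
const sample = "#3Hello there! How many #4candies";

// For each match, record the full match text and whether Group 1 participated.
const report = [...sample.matchAll(rx)].map((m) => ({
  match: m[0],
  known: m[1] !== undefined, // Group 1 is undefined when the \S+ branch matched
}));

console.log(report);
```

Two things are worth noticing in the output: the leading \W is consumed, so the match for a known item in the middle of the string includes the space in front of it, and a punctuation mark that directly follows a known item is matched separately by the \S+ branch.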



Answer 2:


With this approach, you don't need to worry about the length of the lookupDict keys or anything else:

let someString =
  "#3Hello there! How many #4candies did you sell today? #3Hello there! Do have any #4candies left?#3Hello there! #7John Doe! some other text with having #7John Doe person again";

const lookupDict = { "Hello there": "#3", candies: "#4", "John Doe": "#7" };

Object.keys(lookupDict).forEach((key) => {
  const regex = new RegExp(key, "g");
  someString = someString.replace(regex, lookupDict[key]); // replace each key with its value: Hello there => #3
});

someString = someString.replace(/ /g, " #0"); // add #0 after each space

Object.keys(lookupDict).forEach((key) => {
  const regex = new RegExp(lookupDict[key] + lookupDict[key], "g");
  someString = someString.replace(regex, `${lookupDict[key]}${key}`); // roll the doubled value back to value + key
});

someString = someString.replace(/#0#/g, "#"); // drop the #0 that landed in front of a lookupDict value

console.log(someString);
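
The code above builds its patterns from the raw keys, so a key containing regex metacharacters (for example "C++ code") would not be matched literally. The following is a sketch of the same four passes wrapped in a function, under the assumption that keys should be treated as literal text. The names escapeRegExp and tagUnknownWords, and the sample dictionary, are hypothetical and not part of the answer.

```javascript
// Hypothetical helper: escape regex metacharacters so a key is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical wrapper around the same four passes as the answer.
function tagUnknownWords(text, dict) {
  let out = text;
  // Pass 1: collapse every key to its tag, so "#4candies" becomes "#4#4".
  for (const [key, tag] of Object.entries(dict)) {
    out = out.replace(new RegExp(escapeRegExp(key), "g"), () => tag);
  }
  // Pass 2: put #0 after every space.
  out = out.replace(/ /g, " #0");
  // Pass 3: expand each doubled tag back to tag + key.
  for (const [key, tag] of Object.entries(dict)) {
    out = out.replace(new RegExp(escapeRegExp(tag + tag), "g"), () => tag + key);
  }
  // Pass 4: remove the #0 that ended up directly in front of a tag.
  return out.replace(/#0#/g, "#");
}

const result = tagUnknownWords("#5C++ code is fun, #4candies too", {
  "C++ code": "#5",
  candies: "#4",
});
console.log(result); // => #5C++ code #0is #0fun, #4candies #0too
```

Function replacers are used so that a "$" inside a tag or key is not interpreted as a replacement pattern. Like the code it is based on, this only inserts #0 after a space, so an untagged word at the very start of the string is left without a prefix.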



Answer 3:


You can pass a function to .replace as the second argument and check the matched token against the dictionary.

I've changed the regex so that it does not match tokens directly preceded by #.

"Hello there" is problematic: how long can a single term be? At most two words?

let someString = "#3Hello there! How many #4candies did you sell today? Do have any #4candies left?";

let regex = /(?<!#)(\b\w+\b)/g;

someString = someString.replace(regex, x => {
  // check x in dict (left unimplemented here; as written, every match gets #0)
  return `#0${x}`;
});
console.log(someString);
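
One possible way around the multi-word problem, sketched here as an assumption rather than as this answer's method, is to split the string on the tagged phrases first and run a plain word regex only on the pieces in between. The names escapeRegExp and tagWords are hypothetical. Tagged phrases are matched literally, with no check on what follows them.

```javascript
// Hypothetical helper: escape regex metacharacters so phrases match literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical function: protect tagged phrases, prefix every other word with #0.
function tagWords(text, dict) {
  const tagged = Object.entries(dict)
    .map(([key, tag]) => escapeRegExp(tag + key))
    .sort((a, b) => b.length - a.length); // longer phrases first

  // With nothing to protect, every word gets the prefix.
  if (tagged.length === 0) return text.replace(/\b\w+\b/g, "#0$&");

  // split() with a capturing group keeps the separators:
  // even indexes hold the text in between, odd indexes hold the tagged phrases.
  const splitter = new RegExp(`(${tagged.join("|")})`);
  return text
    .split(splitter)
    .map((part, i) => (i % 2 === 1 ? part : part.replace(/\b\w+\b/g, "#0$&")))
    .join("");
}

const lookupDict = { "Hello there": "#3", candies: "#4" };
const input = "#3Hello there! How many #4candies did you sell today? Do have any #4candies left?";
const output = tagWords(input, lookupDict);
console.log(output);
// => #3Hello there! #0How #0many #4candies #0did #0you #0sell #0today? #0Do #0have #0any #4candies #0left?
```

Because the tagged phrases never reach the word regex, a term can span any number of words without special handling.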


Source: https://stackoverflow.com/questions/60365460/regex-to-add-string-to-beginning-of-every-word-based-on-condition
