Question
I have a string and a lookup dictionary that look like this:
someString = "#3Hello there! How many #4candies did you sell today? Do have any #4candies left?"
lookupDict = {"Hello there": "#3", "candies": "#4"}
Now I want to prefix every term in someString that is not in the dictionary lookupDict with #0. I can't split on a space " " because that would turn a term like Hello there into two separate words, Hello and there, which would never match my condition.
I know how to apply a basic regex that adds #0 in front of every word, for example something like:
let regex = /(\b\w+\b)/g;
someString = someString.replace(regex, '#0$1');
But that blindly adds #0 to every term without looking it up in lookupDict.
Is there any way I can combine the regex with a lookup in the dictionary and assign #0 accordingly? Basically, the end result would be something like:
someString = "#3Hello there! #0How #0many #4candies #0did #0you #0sell #0today? #0Do #0have #0any #4candies #0left?"
Note: Spaces can be considered word boundaries here.
Answer 1:
You may use the following logic:
- Build an array of the substrings you need to skip: the concatenated values and keys of the associative array
- Sort the items by length in descending order, since the word boundaries might not work well with phrases containing whitespace
- Compile a regex pattern that will consist of two alternatives: the first will match the array items (escaped for use in a regex pattern) enclosed with a capturing group, and the other will match the rest of the "words"
- When a match is found, check if Group 1 matched. If Group 1 matched, just return the match value; otherwise, add #0 to the match value.
Here is the implementation:
let someString = "#3Hello there! How many #4candies did you sell today? Do have any #4candies left? #0how #0much";
const lookupDict = {"Hello there": "#3", "candies": "#4", "how": "#0", "much": "#0"};
const patternDict = []; // Substrings to skip
for (const key in lookupDict) {
  patternDict.push(`${lookupDict[key]}${key}`); // Values + keys
}
patternDict.sort((a, b) => b.length - a.length); // Sorting by length, descending
const rx = new RegExp("(?:^|\\W)(" + patternDict.map(function (m) { // Building the final pattern
  return m.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'); // Escape regex metacharacters
}).join("|") + ")(?!\\w)|\\S+", "gi");
// rx = /(?:^|\W)(#3Hello there|#4candies|#0much|#0how)(?!\w)|\S+/gi
// Keep skip-list matches (Group 1) and punctuation-only tokens such as "!" as is; prefix the rest with #0
someString = someString.replace(rx, (x, y) => (y || !/\w/.test(x)) ? x : `#0${x}`);
console.log(someString);
// => #3Hello there! #0How #0many #4candies #0did #0you #0sell #0today? #0Do #0have #0any #4candies #0left? #0how #0much
The regex will look like
/(?:^|\W)(#3Hello there|#4candies|#0much|#0how)(?!\w)|\S+/gi
See the regex demo (PHP option chosen to highlight groups green).
Details
- (?:^|\W) - a non-capturing group matching either start of string (^) or (|) any non-word char (= a char other than an ASCII letter, digit or _)
- (#3Hello there|#4candies|#0much|#0how) - Capturing group 1 matching any of the lookupDict concatenated value+keys
- (?!\w) - a negative lookahead that fails the match if, immediately to the right of the current location, there is a word char
- | - or
- \S+ - 1+ non-whitespace chars.
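To see how the two alternatives split the input, the pattern can be run over a short sample and each match listed together with whether Group 1 took part. This is only an illustrative sketch: describeMatches is a hypothetical helper name, and the sample string is a shortened version of the one in the question.

```javascript
// The pattern from the answer above, written as a literal.
const rx = /(?:^|\W)(#3Hello there|#4candies|#0much|#0how)(?!\w)|\S+/gi;

// Hypothetical helper: lists every match and whether Group 1 participated.
function describeMatches(text, regex) {
  return [...text.matchAll(regex)].map((m) => ({
    match: m[0],               // the whole match, including a leading non-word char if any
    skipped: m[1] !== undefined // true when the first alternative (skip list) matched
  }));
}

const sample = "#3Hello there! How many #4candies";
console.log(describeMatches(sample, rx));
// [
//   { match: '#3Hello there', skipped: true },
//   { match: '!', skipped: false },
//   { match: 'How', skipped: false },
//   { match: 'many', skipped: false },
//   { match: ' #4candies', skipped: true }
// ]
```

Note that a skip-list match in the middle of the string includes the preceding non-word character (here a space), which is why the replacement callback returns the whole match rather than Group 1. A punctuation-only token such as "!" is picked up by \S+ as well.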
Answer 2:
This way, you don't need to worry about lookupDict key length or anything else:
let someString =
  "#3Hello there! How many #4candies did you sell today? #3Hello there! Do have any #4candies left?#3Hello there! #7John Doe! some other text with having #7John Doe person again";
const lookupDict = { "Hello there": "#3", candies: "#4", "John Doe": "#7" };
// Step 1: replace each key with its value, e.g. "#3Hello there" becomes "#3#3"
Object.keys(lookupDict).forEach((key) => {
  const regex = new RegExp(key, "g");
  someString = someString.replace(regex, lookupDict[key]);
});
// Step 2: put #0 in front of whatever follows each space
someString = someString.replace(/ /g, " #0");
// Step 3: roll each doubled value back to value + key, e.g. "#3#3" becomes "#3Hello there"
Object.keys(lookupDict).forEach((key) => {
  const regex = new RegExp(lookupDict[key] + lookupDict[key], "g");
  someString = someString.replace(regex, `${lookupDict[key]}${key}`);
});
// Step 4: drop the #0 that step 2 put in front of an already tagged term
someString = someString.replace(/#0#/g, "#");
console.log(someString, '<TheResult/>');
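The snippet above passes each key straight to new RegExp, which is fine for the keys shown. If a key could contain regex metacharacters (an assumption; none of the keys in this thread do), it would be parsed as regex syntax rather than literal text. A minimal sketch of escaping first, where escapeRegExp and replaceLiteral are hypothetical helper names and "C++" is a made-up key:

```javascript
// Hypothetical helper: escapes characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: replaces every literal occurrence of key with value.
function replaceLiteral(text, key, value) {
  return text.replace(new RegExp(escapeRegExp(key), "g"), value);
}

console.log(replaceLiteral("I like C++ and C++ likes me", "C++", "#5"));
// I like #5 and #5 likes me
```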
Answer 3:
You can pass a function to .replace as the second parameter and check the matched token against the dictionary.
I've changed the regex so that it does not include # in the results.
Hello there is problematic: how long can a single term be? At most 2 words?
let someString = "#3Hello there! How many #4candies did you sell today? Do have any #4candies left?";
let regex = /(?<!#)(\b\w+\b)/g;
someString = someString.replace(regex, x => {
  // check x in dict
  return `#0${x}`;
});
console.log(someString);
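One possible way to fill in the dictionary check and sidestep the multi-word problem is to put the dictionary keys themselves into the pattern, longest first, so that a phrase such as Hello there is matched as a single unit. The following is a sketch under assumptions rather than the answerer's code: tagTerms and escapeRegExp are hypothetical names, existing tags are assumed to have the form # followed by digits, the dictionary is assumed to be non-empty, and matching is case-sensitive. It also tags known terms that appear without a tag.

```javascript
// Hypothetical helper: escapes regex metacharacters in a dictionary key.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: tags known terms from dict, and everything else with defaultTag.
function tagTerms(text, dict, defaultTag = "#0") {
  // Longest keys first, so multi-word phrases win over shorter keys.
  const keys = Object.keys(dict)
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp);
  // Alternative 1: an existing tag (#digits) or a word start, then a known key (Group 1).
  // Alternative 2: any other run of word characters.
  const rx = new RegExp(
    `(?:#\\d+|(?<!\\w))(${keys.join("|")})(?!\\w)|\\w+`,
    "g"
  );
  return text.replace(rx, (match, key) =>
    key !== undefined ? `${dict[key]}${key}` : `${defaultTag}${match}`
  );
}

const input = "#3Hello there! How many #4candies did you sell today? Do have any #4candies left?";
console.log(tagTerms(input, { "Hello there": "#3", candies: "#4" }));
// #3Hello there! #0How #0many #4candies #0did #0you #0sell #0today? #0Do #0have #0any #4candies #0left?
```

Because the callback receives Group 1 only when a dictionary key matched, the lookup happens inside the replacement function, as suggested above, while punctuation is never matched and so stays untouched.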
Source: https://stackoverflow.com/questions/60365460/regex-to-add-string-to-beginning-of-every-word-based-on-condition