Question
My function currently works only with single words. I have words in an associative array, and the function replaces each array key found in the text with its value. The keys are stored in lower case, but the replacement keeps the letter case of the word as it was written in the text. The function cannot yet handle word pairs, that is, replace a pair of words with another pair of words.
Example:
// Function:
function replaceKeyToValue($request, $dict){
    $response = preg_replace_callback("/\pL+/u", function ($m) use ($dict) {
        $word = mb_strtolower($m[0]);
        if (isset($dict[$word])) {
            $repl = $dict[$word];
            // Check for some common ways of upper/lower case
            // 1. all lower case
            if ($word === $m[0]) return $repl;
            // 2. all upper case
            if (mb_strtoupper($word) === $m[0]) return mb_strtoupper($repl);
            // 3. Only first letters are upper case
            if (mb_convert_case($word, MB_CASE_TITLE) === $m[0]) return mb_convert_case($repl, MB_CASE_TITLE);
            // Otherwise: check each character whether it should be upper or lower case
            for ($i = 0, $len = mb_strlen($word); $i < $len; ++$i) {
                $mixed[] = mb_substr($word, $i, 1) === mb_substr($m[0], $i, 1)
                    ? mb_substr($repl, $i, 1)
                    : mb_strtoupper(mb_substr($repl, $i, 1));
            }
            return implode("", $mixed);
        }
        return $m[0]; // Nothing changes
    }, $request);
    return $response;
}
// Example associative array
$dict = array
(
    "make"=>"take",
    "cool"=>"pool",
    "узбек"=>"ӯзбек",
);
$text = 'Make COOL узБЕК';
echo replaceKeyToValue($text, $dict);
Output:
Take POOL ӯзБЕК
How can the function be rewritten so that it also replaces word pairs with word pairs?
Example array with word pairs:
$array = array
(
    "take pool" => "pool take",
    "get book" => "set word",
    "узбек точик" => "ӯзбек тоҷик"
);
$example_text = "Take POOL Get BooK УзБеК ТоЧИК";
Answer 1:
First thing: push your case transformation out of the problem and write a dedicated function to handle it.
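A dedicated helper of that kind might look like the sketch below. The name transferCase is hypothetical (it does not appear in the question). The first three rules are copied from the question's function; the per-character fallback is an assumption that walks the replacement instead of the source word, so a replacement longer than the source is not truncated.

```php
<?php
// Hypothetical helper: copy the letter case of $source onto $replacement.
function transferCase(string $source, string $replacement): string
{
    $lower = mb_strtolower($source);

    // 1. all lower case: return the dict value unchanged
    if ($source === $lower) {
        return $replacement;
    }
    // 2. all upper case
    if ($source === mb_strtoupper($lower)) {
        return mb_strtoupper($replacement);
    }
    // 3. only first letters are upper case
    if ($source === mb_convert_case($lower, MB_CASE_TITLE)) {
        return mb_convert_case($replacement, MB_CASE_TITLE);
    }
    // 4. otherwise decide character by character; positions beyond the end
    //    of $source keep the case of the replacement (assumption)
    $result = '';
    for ($i = 0, $len = mb_strlen($replacement); $i < $len; ++$i) {
        $src = mb_substr($source, $i, 1);
        $chr = mb_substr($replacement, $i, 1);
        $isUpper = $src !== '' && $src !== mb_strtolower($src);
        $result .= $isUpper ? mb_strtoupper($chr) : $chr;
    }
    return $result;
}

echo transferCase('COOL', 'pool'), PHP_EOL; // prints POOL
```

With the case logic isolated like this, the regex callback only has to decide which dict entry applies.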
About the word pairs: You can solve the problem using:
- a lookahead with an optional subpattern to capture a second word
- a static boolean variable (defined in the callback function) to know whether the previous match was the first word of an existing two-word substring.
You only need this pattern:
~\b\pL+\b(?=( \pL+\b)?)~u
Since the lookahead doesn't consume any character, the pattern walks the string and stops at the start of each word. It also matches the last word of the string, because (?=( \pL+\b)?) is an always-true assertion: the group inside it is optional.
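To see what the pattern yields, here is a small sketch that lists each match together with the word captured by the lookahead. The sample text and the variable $seen are only for illustration. Note the `??` guard: for the last word the optional group does not take part in the match, so index 1 may be absent.

```php
<?php
$pattern = '~\b\pL+\b(?=( \pL+\b)?)~u';
preg_match_all($pattern, 'Take POOL Get', $matches, PREG_SET_ORDER);

$seen = [];
foreach ($matches as $m) {
    // $m[0] is the current word, $m[1] the following word with its leading space
    $seen[] = [$m[0], $m[1] ?? ''];
}
print_r($seen);
// $seen holds: ['Take', ' POOL'], ['POOL', ' Get'], ['Get', '']
```

Every word is visited once as $m[0], and every word except the first also shows up as the lookahead capture of its predecessor.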
It's very simple:
- the boolean variable is set to false at the beginning.
- when the boolean is false and $m[0].$m[1] in lowercase exists in the dict keys, set the boolean to true and return the dict value; otherwise return $m[0].
- when the boolean is true, set it to false and return an empty string.
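The steps above can be sketched as follows. This is one possible reading, not the answerer's exact code, and it makes two assumptions. First, the pattern gains an optional leading space (group 1) and a group around the word (group 2), so that the space in front of a skipped second word is removed with it; the lookahead capture therefore becomes group 3. Second, copyCase is a simplified, hypothetical stand-in for the dedicated case function; replacePairs is likewise a made-up name.

```php
<?php
// Simplified stand-in for the case helper: upper-case each replacement
// character whose counterpart in $source is upper case.
function copyCase(string $source, string $replacement): string
{
    $out = '';
    for ($i = 0, $len = mb_strlen($replacement); $i < $len; ++$i) {
        $src = mb_substr($source, $i, 1);
        $chr = mb_substr($replacement, $i, 1);
        $out .= ($src !== '' && $src !== mb_strtolower($src)) ? mb_strtoupper($chr) : $chr;
    }
    return $out;
}

function replacePairs(string $text, array $dict): string
{
    // Group 1: optional leading space, group 2: current word,
    // group 3: next word with its leading space (only inside the lookahead)
    $pattern = '~( ?)\b(\pL+)\b(?=( \pL+\b)?)~u';

    return preg_replace_callback($pattern, function ($m) use ($dict) {
        static $skip = false; // true when the previous match started a pair

        if ($skip) {          // second word of a pair: already replaced
            $skip = false;
            return '';
        }
        $pair = $m[2] . ($m[3] ?? ''); // $m[3] is absent for the last word
        $key  = mb_strtolower($pair);
        if (isset($m[3]) && isset($dict[$key])) {
            $skip = true;
            return $m[1] . copyCase($pair, $dict[$key]);
        }
        return $m[0];         // nothing changes
    }, $text);
}

$array = [
    'take pool'   => 'pool take',
    'get book'    => 'set word',
    'узбек точик' => 'ӯзбек тоҷик',
];
echo replacePairs('Take POOL Get BooK УзБеК ТоЧИК', $array), PHP_EOL;
```

For the English part of the sample, "Take POOL Get BooK" becomes "Pool TAKE Set WorD". Without the leading-space group, each replaced pair would leave a doubled space behind, because the space before the skipped word is not part of any match.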
Advantage: you don't have to worry about the dict size. Using the same idea, you can even extend the algorithm to more words with few changes, or handle a dict whose keys have different numbers of words.
Advice: when you are tempted to change the backtracking limit or to build a giant alternation, don't do it. It only means your approach isn't the right one.
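One way such an extension could look is sketched below, under stated assumptions: the boolean becomes a counter of words still to skip, the lookahead captures up to $maxWords - 1 following words, and the longest phrase is tried first. The names replacePhrases and $maxWords are hypothetical, an optional leading space is matched so it disappears with skipped words, and case handling is left out to keep the sketch short.

```php
<?php
function replacePhrases(string $text, array $dict, int $maxWords = 3): string
{
    // Group 1: optional leading space, group 2: current word,
    // group 3: up to $maxWords - 1 following words (lookahead only)
    $pattern = '~( ?)\b(\pL+)\b(?=((?: \pL+\b){0,' . ($maxWords - 1) . '}))~u';

    return preg_replace_callback($pattern, function ($m) use ($dict) {
        static $skip = 0; // number of upcoming words already replaced

        if ($skip > 0) {
            --$skip;
            return '';
        }
        $following = $m[3] === '' ? [] : explode(' ', ltrim($m[3]));

        // Try the longest phrase first, then shorter ones, down to one word
        for ($n = count($following); $n >= 0; --$n) {
            $words = array_merge([$m[2]], array_slice($following, 0, $n));
            $key   = mb_strtolower(implode(' ', $words));
            if (isset($dict[$key])) {
                $skip = $n;
                return $m[1] . $dict[$key];
            }
        }
        return $m[0]; // nothing changes
    }, $text);
}

$dict = [
    'make'          => 'take',
    'take pool'     => 'pool take',
    'one two three' => 'x y z',
];
echo replacePhrases('make one two three take pool make', $dict), PHP_EOL;
// prints: take x y z pool take take
```

The dict is still consulted with plain isset() lookups, so its size has no effect on the pattern.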
Source: https://stackoverflow.com/questions/46978603/how-to-make-the-function-work-for-pair-words