I have code that compares the output with the values of the array, and only completes the operation for words that are in the array:
First code (just an example):
My previous method was incredibly inefficient. I didn't realize how much data you were processing, but if we are upwards of 4000 lines, then efficiency is vital (I think my brain was stuck on strtr()-related processing because of your previous question(s)). This is my new/improved solution, which I expect to leave my previous solution in the dust.
Code: (Demo)
$myVar="My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!";
echo "$myVar\n";
$myWords=array(
    array("is","é"),
    array("on","no"),
    array("that","aquela"),
    array("sister","irmã"),
    array("my","minha"),
    array("myth","mito"),
    array("he","ele"),
    array("good","bom"),
    array("ace","perito"),
    array("i","eu") // notice I must be lowercase
);
$translations=array_combine(array_column($myWords,0),array_column($myWords,1)); // or skip this step and just declare $myWords as key-value pairs
// length sorting is not necessary: with \b on both sides, only a whole-word alternative can complete the match
// preg_quote() and \Q\E are not used because dealing with words only (no danger of misinterpretation by regex)
$pattern='/\b(?:'.implode('|',array_keys($translations)).')\b/i'; // non-capturing group; an atomic group (?>...) would lock in "my" and never try "myth"
/* echo $pattern;
makes: /\b(?:is|on|that|sister|my|myth|he|good|ace|i)\b/i
demo: https://regex101.com/r/DXTtDf/1
*/
$translated=preg_replace_callback(
    $pattern,
    function($m)use($translations){ // bring $translations (lookup) array to function
        $encoding='UTF-8'; // default setting
        $key=mb_strtolower($m[0],$encoding); // standardize keys' case for lookup accessibility
        if(ctype_lower($m[0])){ // treat as all lower
            return $translations[$m[0]];
        }elseif(mb_strlen($m[0],$encoding)>1 && ctype_upper($m[0])){ // treat as all uppercase
            return mb_strtoupper($translations[$key],$encoding);
        }else{ // treat as only first character uppercase
            return mb_strtoupper(mb_substr($translations[$key],0,1,$encoding),$encoding) // uppercase first
                .mb_substr($translations[$key],1,mb_strlen($translations[$key],$encoding)-1,$encoding); // append remaining lowercase
        }
    },
    $myVar);
echo $translated;
Output:
My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!
Minha irmã alannis É not Aquela blonde, here é a bom place. Eu know Ariane é not MINHA IRMÃ!
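The three branches inside the callback amount to a small case-transfer rule: copy the letter-case shape of the matched word onto its translation. As a minimal sketch, that rule can be pulled out into a standalone helper. The name `matchCase` is hypothetical (it is not part of the code above), and the sketch assumes the mbstring extension is loaded and that the matched source words are plain ASCII, as the keys in `$myWords` are.

```php
<?php
// Hypothetical helper (not in the original code): apply the case shape of $source to $replacement.
function matchCase($source, $replacement, $encoding = 'UTF-8')
{
    if (ctype_lower($source)) {
        // all lowercase: the translation is already lowercase, return it unchanged
        return $replacement;
    }
    if (mb_strlen($source, $encoding) > 1 && ctype_upper($source)) {
        // more than one character and all uppercase: uppercase the whole translation
        return mb_strtoupper($replacement, $encoding);
    }
    // anything else (including a single uppercase letter such as "I"): uppercase the first character only
    return mb_strtoupper(mb_substr($replacement, 0, 1, $encoding), $encoding)
        . mb_substr($replacement, 1, null, $encoding);
}

echo matchCase('sister', 'irmã'), "\n"; // irmã
echo matchCase('SISTER', 'irmã'), "\n"; // IRMÃ
echo matchCase('Is', 'é'), "\n";        // É
echo matchCase('I', 'eu'), "\n";        // Eu
```

The single-character check is what keeps "I" from being treated as an all-caps word, so it becomes "Eu" rather than "EU".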
This method:
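- makes 1 pass through $myVar, not 1 pass for every subarray of $myWords.
- does not require sorting the lookup array ($myWords/$translations).
- does not require regex escaping (preg_quote()) or making pattern components literal (\Q..\E) because only words are being translated.
- declares an $encoding value for stability / maintainability / re-usability.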
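One subtlety is worth checking when adapting the pattern to other word lists. Inside an atomic group `(?>...)`, the regex engine keeps the first alternative that matches and will not go back to try a later one. If a shorter key is listed before a longer key that shares its prefix (`my` before `myth`), the trailing `\b` fails after the shorter key and the longer one is never attempted. A plain non-capturing group `(?:...)` does not depend on the order, and sorting the keys longest-first is another way out. A minimal sketch comparing the three (the variable names and the two-word list are made up for illustration):

```php
<?php
$keys = array('my', 'myth'); // shorter key listed first, same relative order as in $myWords

$atomic     = '/\b(?>' . implode('|', $keys) . ')\b/i'; // atomic group
$nonCapture = '/\b(?:' . implode('|', $keys) . ')\b/i'; // non-capturing group

// Alternative fix: keep the atomic group but put longer keys first.
$sorted = $keys;
usort($sorted, function ($a, $b) {
    return strlen($b) - strlen($a); // descending by length
});
$sortedAtomic = '/\b(?>' . implode('|', $sorted) . ')\b/i';

var_dump(preg_match($atomic, 'myth'));       // int(0): "my" is locked in, then \b fails
var_dump(preg_match($nonCapture, 'myth'));   // int(1): engine backtracks and tries "myth"
var_dump(preg_match($sortedAtomic, 'myth')); // int(1): "myth" is tried before "my"
```

With word boundaries on both sides, the non-capturing form is the simpler choice, since it needs no sorting step; any speed difference between the two group types is likely to be small for a list of this size.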