Exchanging values of a variable with values of an array, but under a condition

难免孤独 2021-01-14 11:32

I have code that compares the output with the values of the array, and only completes the operation for words that are in the array:

First code (just an example):

2 answers
  •  半阙折子戏
    2021-01-14 12:08

    My previous method was incredibly inefficient. I didn't realize how much data you were processing, but if we are upwards of 4000 lines, then efficiency is vital (I think my brain was stuck thinking about strtr()-related processing based on your previous question(s)). This is my new/improved solution, which I expect to leave my previous solution in the dust.

    Code: (Demo)

    $myVar="My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!";
    echo "$myVar\n";
    
    $myWords=array(
        array("is","é"),
        array("on","no"),
        array("that","aquela"),
        array("sister","irmã"), 
        array("my","minha"),
        array("myth","mito"),
        array("he","ele"),
        array("good","bom"),
        array("ace","perito"),
        array("i","eu")  // notice I must be lowercase
    );
    $translations=array_combine(array_column($myWords,0),array_column($myWords,1));  // or skip this step and just declare $myWords as key-value pairs
    
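    // length sorting is skipped here; note that the atomic group commits to the first alternative that matches,
    // so a key that is a prefix of a later key ("my" before "myth") hides the longer key unless keys are sorted longest-first
    // preg_quote() and \Q\E are not used because dealing with words only (no danger of misinterpretation by regex)
    
    $pattern='/\b(?>'.implode('|',array_keys($translations)).')\b/i';  // atomic group is slightly faster (no backtracking)
    /* echo $pattern;
       makes: /\b(?>is|on|that|sister|my|myth|he|good|ace|i)\b/i
       demo: https://regex101.com/r/DXTtDf/1
    */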
    $translated=preg_replace_callback(
        $pattern,
        function($m)use($translations){  // bring $translations (lookup) array to function
            $encoding='UTF-8';  // default setting
            $key=mb_strtolower($m[0],$encoding);  // standardize keys' case for lookup accessibility
            if(ctype_lower($m[0])){ // treat as all lower
                return $translations[$m[0]];
            }elseif(mb_strlen($m[0],$encoding)>1 && ctype_upper($m[0])){  // treat as all uppercase
                return mb_strtoupper($translations[$key],$encoding);
            }else{  // treat as only first character uppercase
                return mb_strtoupper(mb_substr($translations[$key],0,1,$encoding),$encoding)  // uppercase first
                      .mb_substr($translations[$key],1,mb_strlen($translations[$key],$encoding)-1,$encoding);  // append remaining lowercase
            }
        },
        $myVar);
    
    echo $translated;
    

    Output:

    My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!
    Minha irmã alannis É not Aquela blonde, here é a bom place. Eu know Ariane é not MINHA IRMÃ!
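
As the comment next to array_combine() suggests, the lookup can also be declared as key-value pairs from the start. A minimal sketch with a shortened word list (the three-word list is only for illustration; the variable names follow the answer):

```php
<?php
// Nested-pair form, as in the answer (shortened for illustration).
$myWords = [
    ['is', 'é'],
    ['sister', 'irmã'],
    ['my', 'minha'],
];
$combined = array_combine(array_column($myWords, 0), array_column($myWords, 1));

// Equivalent lookup declared directly as lowercase English key => translation.
$translations = [
    'is'     => 'é',
    'sister' => 'irmã',
    'my'     => 'minha',
];

var_dump($combined === $translations); // bool(true)

// Same pattern construction as in the answer, without the array_combine() step.
$pattern = '/\b(?>' . implode('|', array_keys($translations)) . ')\b/i';
echo $pattern, "\n"; // /\b(?>is|sister|my)\b/i
```

Declaring the pairs directly saves one step per run and makes it harder for a key and its translation to get out of sync.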
    

    This method:

    • does only 1 pass through $myVar, not 1 pass for every subarray of $myWords.
    • does not bother with sorting the lookup array ($myWords/$translations).
    • does not bother with regex escaping (preg_quote()) or making pattern components literal (\Q..\E) because only words are being translated.
    • uses word boundaries so that only complete word matches are replaced.
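    • uses an atomic group as a micro-optimization that denies backtracking. This stays accurate only while no key is a prefix of a later key: as written, "my" precedes "myth", so "myth" is never replaced unless the keys are sorted longest-first.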
    • declares an $encoding value for stability / maintainability / re-usability.
    • matches case-insensitively but replaces case-sensitively. If the English match is:
      1. All lowercase, so is the replacement
      2. All uppercase (and longer than a single character), so is the replacement
      3. Anything else (capitalized, or a single uppercase character such as "I"), only the first character of the replacement is uppercased
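
One caveat on the atomic group is worth checking in isolation: because it commits to the first alternative that matches, a key that is a prefix of a later key (here "my" before "myth") hides the longer key. The following is a minimal sketch of that effect and of one possible fix, sorting the keys longest-first. The helper buildPattern() is hypothetical and not part of the answer above, and the two-word lookup is only for illustration.

```php
<?php
// Hypothetical helper (not from the answer): builds the alternation pattern,
// optionally sorting the keys longest-first before joining them.
function buildPattern(array $translations, bool $longestFirst): string
{
    $keys = array_keys($translations);
    if ($longestFirst) {
        // Longer keys first, so "myth" is tried before its prefix "my".
        usort($keys, function ($a, $b) {
            return strlen($b) - strlen($a);
        });
    }
    // preg_quote() changes nothing for plain words, but keeps the helper
    // safe if a key ever contains a regex metacharacter.
    $quoted = array_map(function ($k) {
        return preg_quote($k, '/');
    }, $keys);
    return '/\b(?>' . implode('|', $quoted) . ')\b/i';
}

$translations = ['my' => 'minha', 'myth' => 'mito'];
$replace = function ($m) use ($translations) {
    return $translations[strtolower($m[0])]; // keys are plain ASCII here
};

$text = 'my myth';
$unsorted = preg_replace_callback(buildPattern($translations, false), $replace, $text);
$sorted   = preg_replace_callback(buildPattern($translations, true), $replace, $text);

echo $unsorted, "\n"; // minha myth  ("myth" is left untranslated)
echo $sorted, "\n";   // minha mito
```

Swapping the atomic group for a plain non-capturing group, (?:...), would be another way to get the same result without sorting, at the cost of allowing backtracking.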
