How to convert hex codes in JSON data using PHP

清歌不尽 2020-12-22 07:52

I have some content that is generated by the Drupal CMS that contains strings like:

\"... \\n Proficient knowledge of \\x3cstrong\\x3emedical\\x3c/strong\\x3         


        
3 Answers
  • 2020-12-22 08:07

    Ok, this will do it:

    /**
     * Converts all \xXX hex escape sequences back into the bytes
     * (here, ASCII characters) they encode.
     *
     * @param string $input   String which includes some \xXX escape sequences
     * @return string
     */
    function convertUTF8Units($input) {
        $output = $input;
        $len = strlen($input) - 4;

        for ($i = 0; $i <= $len; $i++) {
            $part = substr($input, $i, 4);
            $hex = substr($part, 2);

            // Only "\x" followed by two hex digits counts as an escape.
            if (substr($part, 0, 2) === "\\x" && ctype_xdigit($hex)) {
                // Decode just the two hex digits, not the "\x" prefix.
                $raw = hex2bin($hex);
                // Literal replacement, so no regex escaping is needed.
                $output = str_replace($part, $raw, $output);
            }
        }
        return $output;
    }

    /**
     * Function to convert a string of hex digits back to the bytes it encodes.
     * Taken from http://devcorner.georgievi.net/pages/programming/php/hex2bin-php.
     * PHP 5.4+ ships a built-in hex2bin(), so this is only defined as a fallback.
     *
     * @param string $hex_string   Hex digits, e.g. "3c"; whitespace is skipped
     * @return string
     */
    if (!function_exists('hex2bin')) {
        define('HEX2BIN_WS', " \t\n\r");
        function hex2bin($hex_string) {
            $pos = 0;
            $result = '';
            while ($pos < strlen($hex_string)) {
                if (strpos(HEX2BIN_WS, $hex_string[$pos]) !== false) {
                    $pos++;
                } else {
                    $code = hexdec(substr($hex_string, $pos, 2));
                    $pos = $pos + 2;
                    $result .= chr($code);
                }
            }
            return $result;
        }
    }
    

    I'm a little fuzzy on exactly what I'm converting to what, though; all I'm sure about is that it passes all the JSON validators now. While pursuing this, the terms UTF-8, UTF-8 units, binary, hex values, and ASCII characters have all come up. I can't actually articulate the difference between them, nor can I definitively say what the input, conversions, or output of these functions are.

    Can anyone walk me through what my code is doing? :P
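
    As a rough illustration of what such a conversion does, the sketch below takes one escape sequence, which is four ASCII characters in the raw text, and turns it into the single byte those hex digits name. The variable names are made up for this example, and the JSON strings are tiny stand-ins rather than the real Drupal output.

```php
<?php
// The escape as it appears in the raw text: four separate ASCII characters.
// Single quotes are used so that PHP does not interpret \x itself.
$escaped = '\x3c';

// Take the two hex digits, read them as a number, and emit that one byte.
$byte = chr(hexdec(substr($escaped, 2)));

echo strlen($escaped), ' -> ', strlen($byte), "\n";    // 4 -> 1
echo bin2hex($escaped), ' -> ', bin2hex($byte), "\n";  // 5c783363 -> 3c
echo $byte, "\n";                                      // <

// \x is not a valid JSON escape, so a JSON parser rejects the raw form.
var_dump(json_decode('"\x3c"')); // NULL
var_dump(json_decode('"<"'));    // string(1) "<"
```

    In other words, the input is plain ASCII text that merely describes a byte in hexadecimal, and the output is that byte itself; for values up to 0x7F the byte is also a valid ASCII and UTF-8 character.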

  • 2020-12-22 08:24

    What about:

    echo iconv('ASCII', 'UTF-8', "Proficient knowledge of \x3cstrong\x3emedical\x3c/strong\x3e terminology");
    // returns Proficient knowledge of <strong>medical</strong> terminology
    $jsonString = "... \n Yes \n \n \n The \x3cstrong\x3eMedical\x3c/strong\x3e Assistant performs patient screening care under the direction of the \x3cstrong\x3eMedical\x3c/strong\x3e Director/On-site provider including, but not limited to, EKG’s. ...";
    $jsonString = str_replace(array('’'), array("'"), $jsonString);
    echo iconv('ASCII', 'UTF8//IGNORE//TRANSLIT', nl2br($jsonString));
    // returns ... <br>Yes <br><br><br>The <strong>Medical</strong> Assistant performs patient screening care under the direction of the <strong>Medical</strong> Director/On-site provider including, but not limited to, EKG's. ...
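
    One caveat worth checking before relying on this: in a double-quoted PHP literal, the PHP parser itself decodes \x3c, so the snippet above may already hold "<" before iconv() runs. Text read from a file or an HTTP response behaves like the single-quoted case. A small sketch of the difference, with illustrative variable names:

```php
<?php
// Double quotes: PHP turns \x3c into "<" while parsing the literal.
$parsed = "\x3cstrong\x3e";

// Single quotes: the backslash sequences stay as literal text, which is
// closer to what arrives from a file or a CMS export.
$literal = '\x3cstrong\x3e';

var_dump($parsed === '<strong>');  // bool(true)
var_dump(strlen($literal));        // int(14)

// iconv() only re-encodes characters; the escape text passes through as-is.
var_dump(iconv('ASCII', 'UTF-8', $literal) === $literal); // bool(true)
```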
    
  • 2020-12-22 08:32

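    \x usually introduces a hexadecimal byte escape, while \u is for Unicode. Your question has nothing to do with Unicode or Unicode code points.

    It is safe to use chr() because \xFF is 255 at most, which fits in a single byte, the range chr() handles. (Strictly speaking, ASCII covers only 0 to 127; the values in your data, such as \x3c and \x3e, fall inside it.)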
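
    To make the byte range concrete, here is a minimal sketch using only built-in functions. The values 0x3c and 0xff are chosen for illustration: the first is in the ASCII range, the second is a single byte but not ASCII.

```php
<?php
// chr() maps an integer 0..255 to a one-byte string.
$lt = chr(hexdec('3c'));  // 0x3C = 60, inside ASCII (0..127)
$hi = chr(hexdec('ff'));  // 0xFF = 255, one byte but outside ASCII

var_dump($lt);            // string(1) "<"
var_dump(strlen($hi));    // int(1)
var_dump(ord($hi));       // int(255)

// A lone byte above 0x7F is not valid UTF-8, so json_encode() rejects it.
var_dump(json_encode($hi));                       // bool(false)
var_dump(json_last_error() === JSON_ERROR_UTF8);  // bool(true)
```

    So chr() is fine for escapes like \x3c, but if the data ever contains escapes above \x7f, the decoded bytes would need to form valid UTF-8 before json_encode() will accept them.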

    function weird_answer_to_weird_question($string)
    {
        return preg_replace_callback('#\\\\x([[:xdigit:]]{2})#ism', function($matches)
        {
            return chr(hexdec($matches[1]));
        },
        $string);
    }
    

    Output:

    "... \n Proficient knowledge of medical terminology; typing skills at 40 wpm. Excellent communication and ... which involves access to sensitive and/or confidential medical information. Must demonstrate leadership skills in decision making and ..."

    P.S.

    You must also run $string = str_replace('\n', "\n", $string); or similar, because otherwise json_encode() will double-escape the literal \n sequences. Thanks to @netcoder for pointing it out.
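
    Putting the two steps together might look like the sketch below. The function name decodeHexEscapes and the sample string are hypothetical, made up for this example; the regex is the same idea as in the function above.

```php
<?php
// Hypothetical helper: replaces each literal \xXX with the byte it names.
function decodeHexEscapes(string $s): string
{
    return preg_replace_callback('#\\\\x([[:xdigit:]]{2})#i', function ($m) {
        return chr(hexdec($m[1]));
    }, $s);
}

// Made-up sample with a literal backslash-n and literal \x escapes.
$raw = 'Line one\nKnowledge of \x3cstrong\x3emedical\x3c/strong\x3e terms';

$decoded = decodeHexEscapes($raw);
// Without this step the backslash itself is escaped again: \\n
echo json_encode($decoded), "\n";
// "Line one\\nKnowledge of <strong>medical<\/strong> terms"

// Turn the two-character sequence \n into a real newline first.
$fixed = str_replace('\n', "\n", $decoded);
echo json_encode($fixed), "\n";
// "Line one\nKnowledge of <strong>medical<\/strong> terms"
```

    Note that json_encode() escapes "/" as "\/" by default, which is still valid JSON; the JSON_UNESCAPED_SLASHES flag turns that off.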
