I have some content generated by the Drupal CMS that contains strings like:
\"... \\n Proficient knowledge of \\x3cstrong\\x3emedical\\x3c/strong\\x3
Ok, this will do it:
/**
* Converts all UTF-8 Units ( \xXX ) back into ascii characters.
*
* @param string $input String which includes some UTF-8 units
* @return string
*/
function convertUTF8Units($input) {
$part = "";
$output = $input;
$len = strlen($input)-4;
for($i=0; $i<=$len; $i++) {
$part = substr($input, $i, 4);
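// Match only a literal backslash, "x", and two hex digits.
if (substr($part, 0, 2) === "\\x" && ctype_xdigit(substr($part, 2))) {
// Decode just the two hex digits; no trim(), so \x20 and \x0a survive.
$raw = hex2bin(substr($part, 2));
// Literal replacement avoids regex escaping and "$" backreference pitfalls.
$output = str_replace($part, $raw, $output);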
}
}
return $output;
}
/**
* Function to convert a hex code back to ascii string. Taken from
* http://devcorner.georgievi.net/pages/programming/php/hex2bin-php.
*
* @param string $hex_string String of hex digit pairs, e.g. 3c3e (whitespace between pairs is skipped)
* @return string
*/
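// PHP 5.4+ ships a built-in hex2bin(); only define this fallback when it is missing.
if (!function_exists('hex2bin')) {
define('HEX2BIN_WS', " \t\n\r");
function hex2bin($hex_string) {
$pos = 0;
$result = '';
while ($pos < strlen($hex_string)) {
// Skip whitespace between hex pairs (the $str{$i} syntax was removed in PHP 8).
if (strpos(HEX2BIN_WS, $hex_string[$pos]) !== FALSE) {
$pos++;
}
else {
// Two hex digits -> integer 0..255 -> the single byte with that value.
$code = hexdec(substr($hex_string, $pos, 2));
$pos = $pos + 2;
$result .= chr($code);
}
}
return $result;
}
}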
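To make the input and output concrete, here is a minimal sketch of what I believe the same conversion looks like in a single step. The name decodeHexEscapes and the sample string are made up for illustration and are not part of the code above. It assumes the input contains the literal four-character sequences (backslash, x, two hex digits), not already-decoded characters.

```php
<?php
// Hypothetical helper (not from the code above): replace every literal
// "\xXX" sequence (backslash, "x", two hex digits) with the byte it names.
function decodeHexEscapes(string $input): string
{
    return preg_replace_callback(
        '/\\\\x([0-9A-Fa-f]{2})/',
        function (array $m): string {
            // e.g. "3c" -> 60 -> "<"
            return chr(hexdec($m[1]));
        },
        $input
    );
}

// Single quotes keep the backslashes literal, so this string really holds
// the 4-character sequences \x3c and \x3e, not "<" and ">".
$sample = '\x3cb\x3ebold\x3c/b\x3e';
echo decodeHexEscapes($sample), "\n";
// prints: <b>bold</b>
```

So, as far as I can tell, the input is plain text in which some characters are spelled out as hex escapes, and the output is the same text with each escape replaced by the one byte whose value is that hex number.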
I'm a little fuzzy on exactly what I'm converting to what, though; all I'm sure about is that the output passes all the JSON validators now. While working on this, the terms UTF-8, UTF-8 units, binary, hex values, and ASCII characters have all come up. I can't actually articulate the difference between them, nor can I definitively say what the input, intermediate conversions, or output of these functions are.
Can anyone walk me through what my code is doing? :P