Question
I have some content that is generated by the Drupal CMS that contains strings like:
"... \n Proficient knowledge of \x3cstrong\x3emedical\x3c/strong\x3e terminology; typing skills at 40 wpm. Excellent communication and ... which involves access to sensitive and/or confidential \x3cstrong\x3emedical\x3c/strong\x3e information. Must demonstrate leadership skills in decision making and ..."
I'm trying to transfer this data as JSON, but it doesn't validate. I think that's because escapes like \x3c need to be in the longer six-character \uXXXX format (see http://tools.ietf.org/html/rfc4627, section 2.5, Strings).
Is this actually the problem? And if so, is there a good way to convert the format?
EDIT: Here are two full samples of the JSON that fail validation:
{ "results": [ { "link": "http://dev.careersatnmc.org/content/clinical-information-clerk-patient-financial-services-11-12", "type": "", "title": "Clinical Information Clerk, Patient Financial Services, #11-12", "user": "", "date": "1337699702", "node": "", "extra": "", "score": 1.9532660466727E+25, "snippet": "... \n Proficient knowledge of \x3cstrong\x3emedical\x3c/strong\x3e terminology; typing skills at 40 wpm. Excellent communication and ... which involves access to sensitive and/or confidential \x3cstrong\x3emedical\x3c/strong\x3e information. Must demonstrate leadership skills in decision making and ..." }, { "link": "http://dev.careersatnmc.org/content/medical-assistant-northwestern-walk-clinic-11-44", "type": "", "title": "Medical Assistant, Northwestern Walk-In Clinic, #11-44", "user": "", "date": "1334178982", "node": "", "extra": "", "score": 1.6696042412062E+25, "snippet": "... \n Yes \n \n \n The \x3cstrong\x3eMedical\x3c/strong\x3e Assistant performs patient screening care under the direction of the \x3cstrong\x3eMedical\x3c/strong\x3e Director/On-site provider including, but not limited to, EKG’s. ..." }, { "link": "http://dev.careersatnmc.org/nursing-jobs", "type": "", "title": "Nursing Opportunities at Northwestern", "user": "", "date": "1333132723", "node": "", "extra": "", "score": 1.5935361158907E+25, "snippet": "... environment for caregivers. Here at Northwestern \x3cstrong\x3eMedical\x3c/strong\x3e Center, in addition to being a destination of choice for patients, we ..." }, { "link": "http://dev.careersatnmc.org/nursing-careers/rn/registered-nurse-float-pool-11-106", "type": "", "title": "Registered Nurse, Float Pool #11-106 ", "user": "", "date": "1333040298", "node": "", "extra": "", "score": 1.5869853268872E+25, "snippet": "... safe nursing care in a timely manner to patients on the \x3cstrong\x3eMedical\x3c/strong\x3e Surgical Unit and Intensive Care Units with a high degree of ... 
Float Pool RN will be required to rotate to both ICU and \x3cstrong\x3eMedical\x3c/strong\x3e Surgical Units based on patient census and staffing need. These ..." }, { "link": "http://dev.careersatnmc.org/content/medical-assistant-northwestern-walk-clinic-11-68", "type": "", "title": "Medical Assistant, Northwestern Walk-In Clinic, #11-68", "user": "", "date": "1327941682", "node": "", "extra": "", "score": 1.2643954777586E+25, "snippet": "... \n Yes \n \n \n The \x3cstrong\x3eMedical\x3c/strong\x3e Assistant performs patient screening care under the direction of the \x3cstrong\x3eMedical\x3c/strong\x3e Director/On-site provider including, but not limited to, EKG’s. ..." }, { "link": "http://dev.careersatnmc.org/content/clinical-support-associate-diagnostic-imaging-10-126", "type": "", "title": "Clinical Support Associate, Diagnostic Imaging, #10-126", "user": "", "date": "1327936594", "node": "", "extra": "", "score": 1.2641087846662E+25, "snippet": "... \n Three years experience in a \x3cstrong\x3emedical\x3c/strong\x3e office required. Prior clerical work experience in a \x3cstrong\x3emedical\x3c/strong\x3e office, knowledge of \x3cstrong\x3emedical\x3c/strong\x3e terminology, typing skills required. ..." }, { "link": "http://dev.careersatnmc.org/content/licensed-practical-nurse-cardiology-11-61", "type": "", "title": "Licensed Practical Nurse, Cardiology, #11-61", "user": "", "date": "1327443988", "node": "", "extra": "", "score": 1.2366575548271E+25, "snippet": "... \n Previous experience with electronic \x3cstrong\x3emedical\x3c/strong\x3e records preferred. \n \n \n \n Special Skills / ..." }, { "link": "http://dev.careersatnmc.org/equal-opportunity-policy", "type": "", "title": "Equal Opportunity", "user": "", "date": "1319564835", "node": "", "extra": "", "score": 8.704398538793E+24, "snippet": " Northwestern \x3cstrong\x3eMedical\x3c/strong\x3e Center is an equal opportunity employer that is committed to fair and ..." 
}, { "link": "http://dev.careersatnmc.org/NMC-Hospital-Video", "type": "", "title": "NMC Hospital Video", "user": "", "date": "1317216552", "node": "", "extra": "", "score": 7.8394368227485E+24, "snippet": "... more about what it\x26#39;s like to work at Northwestern \x3cstrong\x3eMedical\x3c/strong\x3e Center from some of the hospital\x26#39;s providers. \x26nbsp; \n \n ..." }, { "link": "http://dev.careersatnmc.org/overview", "type": "", "title": "About NMC", "user": "", "date": "1305051468", "node": "", "extra": "", "score": 4.5584239764666E+24, "snippet": "... environment for caregivers.\x26nbsp; Here at Northwestern \x3cstrong\x3eMedical\x3c/strong\x3e Center, in addition to being a destination of choice for patients, we ..." } ], "total": "36" }
{ "results": [ { "link": "http://dev.northwesternmedicalcenter.org/courtyard-cafe", "type": "", "title": "The Courtyard Café", "user": "", "date": "1341844260", "node": "", "extra": "", "score": 0.54264448532277, "snippet": " Meals \u0026amp; Snacks \n The NMC Courtyard Café serves a wide variety of options, whether you need a full meal or just a snack or drink.\u0026nbsp; There are always healthy options available to choose from in the Courtyard Café during hours of operation. \n T ..." }, { "link": "http://dev.northwesternmedicalcenter.org/overview", "type": "", "title": "Welcome to Northwestern Medical Center!", "user": "", "date": "1308682802", "node": "", "extra": "", "score": 0.54083665338769, "snippet": " Northwestern \u003cstrong\u003eMedical\u003c/strong\u003e Center is a\u0026nbsp;vibrant, not-for-profit, primary care hospital nestled ... we pride ourselves on bringing a broad range of high-tech \u003cstrong\u003emedical\u003c/strong\u003e equipment \u0026amp; services to our region. Thanks to that balance and the ..." }, { "link": "http://dev.northwesternmedicalcenter.org/stories-nmc", "type": "", "title": "Stories at NMC", "user": "", "date": "1340734687", "node": "", "extra": "", "score": 0.51676585442723, "snippet": "... Birth Center nurses and the IT folks at Northwestern \u003cstrong\u003eMedical\u003c/strong\u003e Center, to experience the birth of his daughter Payton while on duty in ..." 
}, { "link": "http://dev.northwesternmedicalcenter.org/medical-executive-committee", "type": "", "title": "Medical Executive Committee", "user": "", "date": "1306856292", "node": "", "extra": "", "score": 0.41599960274235, "snippet": " \u003cstrong\u003eMedical\u003c/strong\u003e Executive Committee \n The NMC \u003cstrong\u003eMedical\u003c/strong\u003e Staff is made up of more than 75 active staff physicians and more than 200 other physicians, dentists, and \u003cstrong\u003emedical\u003c/strong\u003e providers who have privileges at the hospital.\u0026nbsp; The \u003cstrong\u003eMedical\u003c/strong\u003e Staff ..." }, { "link": "http://dev.northwesternmedicalcenter.org/medical-cardiology", "type": "", "title": "Medical Cardiology", "user": "", "date": "1327606268", "node": "", "extra": "", "score": 0.40720084861885, "snippet": " ..." }, { "link": "http://dev.northwesternmedicalcenter.org/news-and-updates/dr-lowrey-sullivan-named-chief-medical-officer", "type": "", "title": "Dr. Lowrey Sullivan Named Chief Medical Officer", "user": "", "date": "1326989520", "node": "", "extra": "", "score": 0.40509813494658, "snippet": "... that Dr. Sullivan has accepted the position of Chief \u003cstrong\u003eMedical\u003c/strong\u003e Officer,\u0026rdquo; said Jill Bowen, NMC\u0026rsquo;s Chief Executive ... Having a physician who already has the respect of our \u003cstrong\u003emedical\u003c/strong\u003e staff provides a strong foundation for the success of this ... his Bachelors degree from Middlebury College and his \u003cstrong\u003eMedical\u003c/strong\u003e Degree from the University of Vermont.\u0026nbsp; He did his Internship and ..." }, { "link": "http://dev.northwesternmedicalcenter.org/nmc.overview-video", "type": "", "title": "NMC Overview Video", "user": "", "date": "1327331110", "node": "", "extra": "", "score": 0.33907030714933, "snippet": " View the video below to learn more about St. Alban\u0026rsquo;s lifestyle offerings. 
The city has much to offer and its central location between Burlington and Montreal makes it a great place to enjoy the pace and intimacy of a small town with access to bi ..." }, { "link": "http://dev.northwesternmedicalcenter.org/nmc-overview-video", "type": "", "title": "NMC Overview Video", "user": "", "date": "1327331316", "node": "", "extra": "", "score": 0.33905170147781, "snippet": " View the video below to learn more about St. Alban\u0026rsquo;s lifestyle offerings. The city has much to offer and its central location between Burlington and Montreal makes it a great place to enjoy the pace and intimacy of a small town with access to bi ..." }, { "link": "http://dev.northwesternmedicalcenter.org/news-and-updates/test-story", "type": "", "title": "Test Story", "user": "", "date": "1326989380", "node": "", "extra": "", "score": 0.33538503005686, "snippet": " Story Details \n Full Story:\u0026nbsp; \n \n \n Wolf cred veniam sunt. Nesciunt PBR four loko blog american apparel labore. Sint reprehenderit american apparel nihil, mcsweeney\u0026#39;s freegan voluptate velit al ..." }, { "link": "http://dev.northwesternmedicalcenter.org/news-and-updates/nmc-laboratory-featured-video", "type": "", "title": "NMC Laboratory Featured in a Video", "user": "", "date": "1326989494", "node": "", "extra": "", "score": 0.33522577107044, "snippet": " Story Details \n Full Story:\u0026nbsp; \n \n \n This electronic approach, which is being used as a model throughout the state, is quicker, more efficient, more accurate, and less costly way of sharing informat ..." } ], "total": "236" }
Answer 1:
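\x introduces a two-digit hexadecimal escape, while \u introduces a four-digit Unicode escape. JSON only allows the \u form, which is why \x3c fails validation. The sequences in your data are plain ASCII characters written in hex, so nothing Unicode-specific is needed to decode them.
chr() accepts values from 0 to 255, so it can handle anything up to \xFF. Note that only values up to \x7F are ASCII; a byte above that is not valid UTF-8 on its own, and json_encode() would reject it. The sequences in your samples (\x3c, \x3e, \x26) are all within the ASCII range.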
function weird_answer_to_weird_question($string)
{
    // Match a literal backslash, "x", and two hex digits; replace them with the byte they encode.
    return preg_replace_callback('#\\\\x([[:xdigit:]]{2})#ism', function($matches)
    {
        return chr(hexdec($matches[1]));
    },
    $string);
}
Output:
"... \n Proficient knowledge of <strong>medical</strong> terminology; typing skills at 40 wpm. Excellent communication and ... which involves access to sensitive and/or confidential <strong>medical</strong> information. Must demonstrate leadership skills in decision making and ..."
P.S.
You must also run $string = str_replace('\n', "\n", $string); or similar, because json_encode() will otherwise double-encode the literal \n. Thanks to @netcoder for pointing this out.
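If the goal is only to make the raw JSON text validate, a variation on the same idea is to rewrite each \xXX as the six-character \u00XX form the question mentions, rather than decoding it to a byte. This is a minimal sketch, not a tested drop-in: the function name hex_escapes_to_unicode_escapes is made up for illustration, and it assumes the text never contains an escaped backslash directly followed by xNN (such as \\x3c).

```php
<?php
// Hypothetical helper: rewrite \xXX escapes as JSON-legal \u00XX escapes.
// Assumption: the input has no escaped backslash directly before "xNN".
function hex_escapes_to_unicode_escapes(string $json): string
{
    // In a single-quoted PHP string, '\\\\' is two backslashes, which the
    // regex engine reads as one literal backslash.
    return preg_replace_callback(
        '/\\\\x([0-9A-Fa-f]{2})/',
        function (array $m): string {
            return '\\u00' . $m[1];   // e.g. \x3c becomes \u003c
        },
        $json
    );
}

// Single quotes keep the backslashes, as in text received from the CMS.
$raw = '{"snippet": "\x3cstrong\x3emedical\x3c/strong\x3e"}';

var_dump(json_decode($raw));          // NULL, because \x is not a JSON escape

$fixed = hex_escapes_to_unicode_escapes($raw);
$data  = json_decode($fixed, true);
echo $data['snippet'], "\n";          // <strong>medical</strong>
```

Because the rewrite happens on the JSON text itself, existing escapes such as \n are left alone and no re-encoding step is needed.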
Answer 2:
What about:
echo iconv('ASCII', 'UTF-8', "Proficient knowledge of \x3cstrong\x3emedical\x3c/strong\x3e terminology");
// returns Proficient knowledge of <strong>medical</strong> terminology
$jsonString = "... \n Yes \n \n \n The \x3cstrong\x3eMedical\x3c/strong\x3e Assistant performs patient screening care under the direction of the \x3cstrong\x3eMedical\x3c/strong\x3e Director/On-site provider including, but not limited to, EKG’s. ...";
$jsonString = str_replace(array('’'), array("'"), $jsonString);
echo iconv('ASCII', 'UTF8//IGNORE//TRANSLIT', nl2br($jsonString));
// returns ... <br>Yes <br><br><br>The <strong>Medical</strong> Assistant performs patient screening care under the direction of the <strong>Medical</strong> Director/On-site provider including, but not limited to, EKG's. ...
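One thing worth checking with this approach: PHP itself decodes \x3c inside a double-quoted string literal before iconv() ever sees it, so the examples above may not reflect what happens with text read from a file or an HTTP response, where the backslash is a real character. A small sketch of the difference (the variable names are only for illustration):

```php
<?php
$literal = "\x3cstrong\x3e";   // double quotes: PHP decodes \x3c and \x3e at parse time
$fromCms = '\x3cstrong\x3e';   // single quotes: backslashes kept, like text received from the CMS

echo $literal, "\n";           // <strong>
echo $fromCms, "\n";           // \x3cstrong\x3e
echo strlen($literal), ' ', strlen($fromCms), "\n";   // 8 14

// iconv() converts between character encodings; it does not interpret escape sequences.
echo iconv('ASCII', 'UTF-8', $fromCms), "\n";         // \x3cstrong\x3e
```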
Answer 3:
Ok, this will do it:
/**
 * Converts all \xXX escape sequences back into the ASCII characters they encode.
 *
 * @param string $input String which includes some \xXX sequences
 * @return string
 */
function convertUTF8Units($input) {
    $output = $input;
    $len = strlen($input) - 4;
    for ($i = 0; $i <= $len; $i++) {
        $part = substr($input, $i, 4);
        $digits = substr($part, 2);
        // Only treat the four characters as an escape if they are "\x" plus two hex digits.
        if (substr($part, 0, 2) === "\\x" && ctype_xdigit($digits)) {
            $raw = hex2bin_compat($digits);
            // Plain string replacement avoids regex-escaping problems in $part and $raw.
            $output = str_replace($part, $raw, $output);
        }
    }
    return $output;
}

define('HEX2BIN_WS', " \t\n\r");

/**
 * Function to convert a string of hex digits back to a binary string. Taken from
 * http://devcorner.georgievi.net/pages/programming/php/hex2bin-php.
 * Renamed from hex2bin() because PHP 5.4 and later ship a built-in function
 * with that name, and redeclaring it is a fatal error.
 *
 * @param string $hex_string Hex digits, e.g. 3c (whitespace is skipped)
 * @return string
 */
function hex2bin_compat($hex_string) {
    $pos = 0;
    $result = '';
    while ($pos < strlen($hex_string)) {
        // Use [] for string offsets; the {} form was removed in PHP 8.
        if (strpos(HEX2BIN_WS, $hex_string[$pos]) !== FALSE) {
            $pos++;
        }
        else {
            $code = hexdec(substr($hex_string, $pos, 2));
            $pos = $pos + 2;
            $result .= chr($code);
        }
    }
    return $result;
}
I'm a little fuzzy on exactly what I'm converting to what, though; all I'm sure about is that it passes all the JSON validators now. While pursuing this, UTF-8, "UTF-8 units", binary somethings, hex values, and ASCII characters have all come up. I can't actually articulate the difference, nor can I definitively say what the input, conversions, or output of these functions are.
Can anyone walk me through what my code is doing? :P
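As a partial walk-through, the conversion for a single sequence can be traced step by step. The sketch below assumes the input is the four-character text \x3c (a backslash, an x, and two hex digits) and uses only built-in PHP functions; the variable names are illustrative.

```php
<?php
$escape = '\x3c';                 // four ASCII characters: \ x 3 c
$digits = substr($escape, 2);     // "3c", the hexadecimal digits
$code   = hexdec($digits);        // 60, the numeric value of hex 3c
$char   = chr($code);             // "<", the single byte with value 60

printf("%s -> %s -> %d -> %s\n", $escape, $digits, $code, $char);
// prints: \x3c -> 3c -> 60 -> <
```

Byte values 0 to 127 mean the same thing in ASCII and in UTF-8, so for sequences in that range the result is plain ASCII text that is also valid UTF-8, which is what JSON expects.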
Source: https://stackoverflow.com/questions/11692110/how-to-convert-hex-codes-in-json-data-using-php