How to handle user input of invalid UTF-8 characters?

Backend Unresolved 9 2036
小鲜肉 2020-11-29 17:26

I'm looking for a general strategy/advice on how to handle invalid UTF-8 input from users.

Even though my webapp uses UTF-8, somehow some users enter invalid chara

9 Answers
  •  鱼传尺愫
    2020-11-29 18:23

    I recommend simply not letting garbage in. Don't rely on custom functions, which can bog your system down. Instead, design an alphabet of acceptable characters and walk the submitted data against it, byte by byte, as if it were an array. Push acceptable characters onto a new string and omit the unacceptable ones. The data you then store in your database is data triggered by the user, but not actually user-supplied data.
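The byte-by-byte walk described above can be sketched in a few lines. This is a minimal illustration, not the answer's own listing: the helper name `filter_by_alphabet` and the alphabet (ASCII letters, digits, and a space) are assumptions chosen for the example. Because it works on bytes, every byte of a multi-byte UTF-8 character is dropped unless it appears in the alphabet.

```php
<?php
// Hypothetical helper: keep only the bytes that appear in $alphabet.
function filter_by_alphabet(string $input, string $alphabet): string
{
    // Flip the alphabet into a lookup table so each byte is checked in O(1).
    $allowed = array_flip(str_split($alphabet));
    $clean = '';
    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        if (isset($allowed[$input[$i]])) {
            $clean .= $input[$i]; // acceptable byte: copy it over
        }
        // unacceptable byte: omitted
    }
    return $clean;
}

// Assumed alphabet for illustration: ASCII letters, digits, and a space.
$alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ';

// "\xC3\x28" is an ill-formed UTF-8 sequence; "!" is simply not in the alphabet.
echo filter_by_alphabet("Hello\xC3\x28 World 42!", $alphabet), "\n"; // prints "Hello World 42"
```

A lookup table keeps the walk linear in the input length, whereas calling `in_array` for every byte scans the whole alphabet each time.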

    EDIT #4: Replacing bad character with entity: �

    EDIT #3: Updated Sept 22 2010 @ 1:32pm. Reason: the returned string is now UTF-8, plus I used the test file you provided as proof.

    <?php
    // The start of this listing, which built $alpha (the array of accepted
    // characters), is missing from the source.
    // ASSUMPTION for illustration only: printable ASCII plus tab, LF and CR.
    $alpha = array_merge(array("\t", "\n", "\r"), array_map('chr', range(32, 126)));

    // Debug dump of the alphabet (its opening line is also missing):
    //     print ord($val);
    //     print '<br>';
    // }
    // print '<br>';

    // Test case #1
    // $str = 'afsjdfhasjhdgljhasdlfy42we875y342q8957y2wkjrgSAHKDJgfcv kzXnxbnSXbcv '.chr(160).chr(127).chr(126);
    // $string = teststr($alpha, $str);
    // print $string;
    // print '<br>';

    // Test case #2
    // $str = ''.'©?™???';
    // $string = teststr($alpha, $str);
    // print $string;
    // print '<br>';

    // $str = '©';
    // $string = teststr($alpha, $str);
    // print $string;
    // print '<br>';

    // Run the filter over the UTF-8 test file (needs allow_url_fopen).
    $file = 'http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt';
    $testfile = file_get_contents($file);
    $string = teststr($alpha, $testfile);
    print $string;
    print '<br>';

    // Walk $str byte by byte, keeping bytes found in $alpha and replacing
    // every other byte with the replacement character.
    function teststr($alpha, $str)
    {
        $strlen = strlen($str);
        $newstr = '';
        $x = 0; // number of bytes examined

        if ($strlen >= 2) {
            for ($i = 0; $i < $strlen; $i++) {
                $x++;
                if (in_array($str[$i], $alpha, true)) {
                    // passed
                    $newstr .= $str[$i];
                } else {
                    // failed
                    print 'Found out of scope character. (ASCII: ' . ord($str[$i]) . ')';
                    print '<br>';
                    $newstr .= '�';
                }
            }
        } elseif ($strlen === 1) {
            $x++;
            if (in_array($str, $alpha, true)) {
                // passed
                $newstr = $str;
            } else {
                // failed
                print 'Total character failed to qualify.';
                $newstr = '�';
            }
        } else {
            // empty input: failed to qualify for test
            print 'Non-existent.';
        }

        // Convert only when the result is not already valid UTF-8.
        // mb_convert_encoding() from ISO-8859-1 does what utf8_encode() did;
        // utf8_encode() is deprecated as of PHP 8.2.
        if (mb_detect_encoding($newstr, 'UTF-8', true) !== 'UTF-8') {
            $newstr = mb_convert_encoding($newstr, 'UTF-8', 'ISO-8859-1');
        }

        // test encoding:
        if (mb_detect_encoding($newstr, 'UTF-8', true) === 'UTF-8') {
            print 'UTF-8 :D<br>';
        } else {
            print 'ENCODED: ' . mb_detect_encoding($newstr, 'UTF-8', true) . '<br>';
        }

        return $newstr . ' (scope: ' . $x . ', ' . $strlen . ')';
    }
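The listing above confirms its result with `mb_detect_encoding`. As a complement, here is a minimal sketch of checking and repairing a string with the mbstring functions alone. The helper name `ensure_utf8` is hypothetical, and the sketch assumes the mbstring extension and PHP 7.2 or later (for `mb_scrub`). How many substitute characters one ill-formed sequence produces may differ between PHP versions, so the example only checks that the result is valid.

```php
<?php
// Hypothetical helper: return $input unchanged when it is well-formed UTF-8,
// otherwise replace ill-formed byte sequences with U+FFFD.
function ensure_utf8(string $input): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already well-formed, leave untouched
    }
    $previous = mb_substitute_character(); // remember the global setting
    mb_substitute_character(0xFFFD);       // substitute with U+FFFD
    $clean = mb_scrub($input, 'UTF-8');    // replace ill-formed sequences
    mb_substitute_character($previous);    // restore the global setting
    return $clean;
}

$fixed = ensure_utf8("abc\xFFdef");        // "\xFF" can never occur in UTF-8
var_dump(mb_check_encoding($fixed, 'UTF-8')); // bool(true)
```

Unlike the alphabet walk, this keeps every valid character, including multi-byte ones, and only touches bytes that cannot be decoded.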
