preg_match unicode parsing
Question: I want to match a subset of Unicode/UTF-8 characters (marked in yellow here: http://solomon.ie/unicode/). From my research I came up with this:

    // ensure it's valid Unicode / get rid of invalid UTF-8 chars
    $text = iconv("UTF-8", "UTF-8//IGNORE", $text);

    // and just allow basic English-ish chars through - no controls, Chinese, etc.
    $match_list = "\x{09}\x{0a}\x{0d}\x{20}-\x{7e}"; // basic ASCII chars plus CR, LF and TAB
    $match_list .= "\x{a1}-\x{ff}"; // extended Latin-1 chars excluding control
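For illustration only, here is a minimal sketch of how a character list like this could be plugged into preg_match and preg_replace, assuming PHP 7.0+ (for the "\u{...}" string escapes). The helper names is_allowed and strip_disallowed are hypothetical. The \x{...} escapes are kept in single quotes so that PCRE, not PHP, interprets them (the double-quoted strings above also work, because PHP leaves "\x{" untouched), and the u modifier makes them denote code points rather than bytes. With u, PCRE also rejects malformed UTF-8 instead of silently cleaning it.

```php
<?php
// Whitelist copied from the question: TAB, LF, CR, printable ASCII (U+0020..U+007E)
// and U+00A1..U+00FF. Single quotes leave the \x{..} escapes for PCRE to interpret.
$match_list = '\x{09}\x{0a}\x{0d}\x{20}-\x{7e}';
$match_list .= '\x{a1}-\x{ff}';

// Hypothetical helper: true only if every character of $text is in the whitelist.
// preg_match() returns false (not 0) when $text is not valid UTF-8 under /u,
// so the strict === 1 comparison also rejects malformed input.
function is_allowed(string $text, string $match_list): bool
{
    return preg_match('/\A[' . $match_list . ']*\z/u', $text) === 1;
}

// Hypothetical helper: delete every run of characters outside the whitelist.
// Returns null if $text is not valid UTF-8.
function strip_disallowed(string $text, string $match_list): ?string
{
    return preg_replace('/[^' . $match_list . ']+/u', '', $text);
}

$sample = "Caf\u{e9} \u{4e2d}\u{6587}\tok\u{7}";  // é, two CJK chars, TAB, BEL
var_dump(is_allowed($sample, $match_list));          // bool(false)
var_dump(strip_disallowed($sample, $match_list));    // the CJK chars and BEL are removed
var_dump(is_allowed("Caf\u{e9}\tok", $match_list));  // bool(true)
```

Anchoring the class with \A and \z (rather than ^ and $) avoids accepting a string that merely ends in a trailing newline after the allowed part.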