UTF-8 is now the de facto standard encoding for web applications, but it is not PHP's native encoding (native Unicode support was planned for PHP 6.0, which was never released). Most servers are set up for ISO-8859-1 by default.
The web server may be configured to send an inappropriate charset header, so it is recommended to override it at the application level. For instance:
header('Content-Type: text/html; charset=utf-8');
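The override above only works if it runs before any output is sent. A minimal sketch of guarding for that, where the function name send_utf8_content_type is hypothetical and not part of PHP:

```php
<?php
// Hypothetical helper: sends the Content-Type header with a UTF-8 charset.
// Returns false if output has already started and the header cannot be changed.
function send_utf8_content_type(string $mime = 'text/html'): bool
{
    if (headers_sent()) {
        return false;
    }
    header('Content-Type: ' . $mime . '; charset=utf-8');
    return true;
}

// Call this before echoing anything.
$sent = send_utf8_content_type();
```

Checking the return value makes it easier to notice when stray output (for example, whitespace before the opening PHP tag) has already locked the headers.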
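Also declare the encoding in the HTML itself with a meta content-type tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">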
Use htmlspecialchars() instead of htmlentities(): escaping the HTML special characters is sufficient when the page is UTF-8, and htmlentities() did not assume UTF-8 by default before PHP 5.4.
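As an illustration, the encoding can be passed explicitly so the result does not depend on the PHP version or configuration. The wrapper name escape_html is an assumption made for this sketch:

```php
<?php
// Hypothetical wrapper: escapes text for HTML output, always treating it as UTF-8.
function escape_html(string $text): string
{
    // ENT_QUOTES escapes both double and single quotes, so the result
    // is also safe inside quoted attribute values.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// "\xC5\xBE" is the UTF-8 byte sequence for "ž"; it passes through unchanged.
$escaped = escape_html("<b>\xC5\xBE & co</b>");
```

Only the markup characters are replaced; non-ASCII characters stay as UTF-8 bytes, which is why htmlspecialchars() is enough here.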
For regular expressions, use the u modifier so that the pattern and the subject are treated as UTF-8. For example:
preg_match('/ž{3,5}/u', $string, $matches);
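To see why the modifier matters, here is a small sketch comparing the same pattern with and without it. The variable names are illustrative only, and the character is written as explicit bytes so the example does not depend on the file's own encoding:

```php
<?php
$z = "\xC5\xBE";                // UTF-8 bytes of "ž" (U+017E)
$subject = str_repeat($z, 3);   // "žžž"

// Without u, PCRE works on bytes: {3} applies only to the last byte of "ž",
// so the pattern looks for C5 BE BE BE and finds nothing.
$withoutU = preg_match('/' . $z . '{3}/', $subject);

// With u, "ž" is one character and {3} repeats the whole character.
$withU = preg_match('/' . $z . '{3}/u', $subject);
```

Here $withoutU is 0 and $withU is 1, even though the pattern text is otherwise identical.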
The same modifier gives a reliable way to check whether a given string is valid UTF-8:
if (@preg_match('//u', $string) === false) {
    // NOT valid!
} else {
    // Valid!
}
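The check above can be wrapped in a reusable function. This is a sketch only; the name is_valid_utf8 is hypothetical and not a built-in:

```php
<?php
// Hypothetical helper wrapping the check from the text.
function is_valid_utf8(string $string): bool
{
    // With the u modifier, PCRE validates the subject as UTF-8 before matching;
    // preg_match() returns false when that validation fails.
    return @preg_match('//u', $string) !== false;
}

$ok  = is_valid_utf8("\xC5\xBE");   // complete two-byte sequence for "ž"
$bad = is_valid_utf8("\xC5");       // truncated sequence
```

If the mbstring extension is available, mb_check_encoding($string, 'UTF-8') is an alternative for the same check.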
If you use a database, always set the appropriate connection encoding right after the connection is made. Example for MySQL:
mysql_set_charset('utf8', $link);
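The mysql_* functions were removed in PHP 7.0, so on current versions the same step would be done through mysqli or PDO. A minimal sketch for PDO, assuming the charset is set through the DSN. The helper name utf8_mysql_dsn and the credentials in the usage comment are placeholders:

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that fixes the connection charset.
// utf8mb4 is MySQL's full UTF-8 charset; MySQL's "utf8" stores at most 3 bytes per character.
function utf8_mysql_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

$dsn = utf8_mysql_dsn('localhost', 'app');

// Usage with placeholder credentials (not executed here):
// $pdo = new PDO($dsn, 'user', 'secret');
//
// With mysqli, the equivalent call after connecting is:
// mysqli_set_charset($link, 'utf8mb4');
```

Setting the charset at connection time, rather than with a later query, keeps the client library's escaping functions consistent with the connection encoding.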
Also check that the columns in the database use UTF-8. It is not always required, but it is recommended.