I have a form with a textarea. Users enter a block of text which is stored in a database.
Occasionally a user will paste text from Word containing smart quotes or em
If you were looking to escape these characters for the web while preserving their appearance, so your strings will appear like this: “It’s nice!” rather than "It's boring"...
You can do this by using your own custom htmlEncode function in place of PHP's htmlentities():
$trans_tbl = false;
function htmlEncode($text) {
global $trans_tbl;
// create translation table once
if(!$trans_tbl) {
// start with the default set of conversions and add more.
$trans_tbl = get_html_translation_table(HTML_ENTITIES);
$trans_tbl[chr(130)] = '‚'; // Single Low-9 Quotation Mark
$trans_tbl[chr(131)] = 'ƒ'; // Latin Small Letter F With Hook
$trans_tbl[chr(132)] = '„'; // Double Low-9 Quotation Mark
$trans_tbl[chr(133)] = '…'; // Horizontal Ellipsis
$trans_tbl[chr(134)] = '†'; // Dagger
$trans_tbl[chr(135)] = '‡'; // Double Dagger
$trans_tbl[chr(136)] = 'ˆ'; // Modifier Letter Circumflex Accent
$trans_tbl[chr(137)] = '‰'; // Per Mille Sign
$trans_tbl[chr(138)] = 'Š'; // Latin Capital Letter S With Caron
$trans_tbl[chr(139)] = '‹'; // Single Left-Pointing Angle Quotation Mark
$trans_tbl[chr(140)] = 'Œ'; // Latin Capital Ligature OE
// smart single/ double quotes (from MS)
$trans_tbl[chr(145)] = '‘';
$trans_tbl[chr(146)] = '’';
$trans_tbl[chr(147)] = '“';
$trans_tbl[chr(148)] = '”';
$trans_tbl[chr(149)] = '•'; // Bullet
$trans_tbl[chr(150)] = '–'; // En Dash
$trans_tbl[chr(151)] = '—'; // Em Dash
$trans_tbl[chr(152)] = '˜'; // Small Tilde
$trans_tbl[chr(153)] = '™'; // Trade Mark Sign
$trans_tbl[chr(154)] = 'š'; // Latin Small Letter S With Caron
$trans_tbl[chr(155)] = '›'; // Single Right-Pointing Angle Quotation Mark
$trans_tbl[chr(156)] = 'œ'; // Latin Small Ligature OE
$trans_tbl[chr(159)] = 'Ÿ'; // Latin Capital Letter Y With Diaeresis
ksort($trans_tbl);
}
// escape HTML
return strtr($text, $trans_tbl);
}