问题
I am creating a RSS feed file for my application in which I want to remove HTML tags, which is done by strip_tags
. But strip_tags
is not removing HTML special code chars:
& ©
etc.
Please tell me any function which I can use to remove these special code chars from my string.
回答1:
Either decode them using html_entity_decode
or remove them using preg_replace
:
$Content = preg_replace("/&#?[a-z0-9]+;/i","",$Content);
(From here)
EDIT: Alternative according to Jacco's comment
might be nice to replace the '+' with {2,8} or something. This will limit the chance of replacing entire sentences when an unencoded '&' is present.
$Content = preg_replace("/&#?[a-z0-9]{2,8};/i","",$Content);
回答2:
In addition to the good answers above, PHP also has a built-in filter function that is quite useful: filter-var.
To remove HMTL characters, use:
$cleanString = filter_var($dirtyString, FILTER_SANITIZE_STRING);
More info:
- function.filter-var
- filter_sanitize_string
回答3:
You may want take a look at htmlentities() and html_entity_decode() here
$orig = "I'll \"walk\" the <b>dog</b> now";
$a = htmlentities($orig);
$b = html_entity_decode($a);
echo $a; // I'll "walk" the <b>dog</b> now
echo $b; // I'll "walk" the <b>dog</b> now
回答4:
This might work well to remove special characters.
$modifiedString = preg_replace("/[^a-zA-Z0-9_.-\s]/", "", $content);
回答5:
A plain vanilla strings way to do it without engaging the preg regex engine:
function remEntities($str) {
if(substr_count($str, '&') && substr_count($str, ';')) {
// Find amper
$amp_pos = strpos($str, '&');
//Find the ;
$semi_pos = strpos($str, ';');
// Only if the ; is after the &
if($semi_pos > $amp_pos) {
//is a HTML entity, try to remove
$tmp = substr($str, 0, $amp_pos);
$tmp = $tmp. substr($str, $semi_pos + 1, strlen($str));
$str = $tmp;
//Has another entity in it?
if(substr_count($str, '&') && substr_count($str, ';'))
$str = remEntities($tmp);
}
}
return $str;
}
回答6:
What I have done was to use: html_entity_decode
, then use strip_tags
to removed them.
回答7:
try this
<?php
$str = "\x8F!!!";
// Outputs an empty string
echo htmlentities($str, ENT_QUOTES, "UTF-8");
// Outputs "!!!"
echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
?>
回答8:
It looks like what you really want is:
function xmlEntities($string) {
$translationTable = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
foreach ($translationTable as $char => $entity) {
$from[] = $entity;
$to[] = '&#'.ord($char).';';
}
return str_replace($from, $to, $string);
}
It replaces the named-entities with their number-equivalent.
回答9:
<?php
function strip_only($str, $tags, $stripContent = false) {
$content = '';
if(!is_array($tags)) {
$tags = (strpos($str, '>') !== false
? explode('>', str_replace('<', '', $tags))
: array($tags));
if(end($tags) == '') array_pop($tags);
}
foreach($tags as $tag) {
if ($stripContent)
$content = '(.+</'.$tag.'[^>]*>|)';
$str = preg_replace('#</?'.$tag.'[^>]*>'.$content.'#is', '', $str);
}
return $str;
}
$str = '<font color="red">red</font> text';
$tags = 'font';
$a = strip_only($str, $tags); // red text
$b = strip_only($str, $tags, true); // text
?>
回答10:
The function I used to perform the task, joining the upgrade made by schnaader is:
mysql_real_escape_string(
preg_replace_callback("/&#?[a-z0-9]+;/i", function($m) {
return mb_convert_encoding($m[1], "UTF-8", "HTML-ENTITIES");
}, strip_tags($row['cuerpo'])))
This function removes every html tag and html symbol, converted in UTF-8 ready to save in MySQL
回答11:
If you want to convert the HTML special characters and not just remove them as well as strip things down and prepare for plain text this was the solution that worked for me...
function htmlToPlainText($str){
$str = str_replace(' ', ' ', $str);
$str = html_entity_decode($str, ENT_QUOTES | ENT_COMPAT , 'UTF-8');
$str = html_entity_decode($str, ENT_HTML5, 'UTF-8');
$str = html_entity_decode($str);
$str = htmlspecialchars_decode($str);
$str = strip_tags($str);
return $str;
}
$string = '<p>this is ( ) a test</p>
<div>Yes this is! & does it get "processed"? </div>'
htmlToPlainText($string);
// "this is ( ) a test. Yes this is! & does it get processed?"`
html_entity_decode w/ ENT_QUOTES | ENT_XML1 converts things like '
htmlspecialchars_decode converts things like &
html_entity_decode converts things like '<
and strip_tags removes any HTML tags left over.
EDIT - Added str_replace(' ', ' ', $str); and several other html_entity_decode() as continued testing has shown a need for them.
回答12:
You can try htmlspecialchars_decode($string)
. It works for me.
http://www.w3schools.com/php/func_string_htmlspecialchars_decode.asp
回答13:
$string = "äáčé";
$convert = Array(
'ä'=>'a',
'Ä'=>'A',
'á'=>'a',
'Á'=>'A',
'à'=>'a',
'À'=>'A',
'ã'=>'a',
'Ã'=>'A',
'â'=>'a',
'Â'=>'A',
'č'=>'c',
'Č'=>'C',
'ć'=>'c',
'Ć'=>'C',
'ď'=>'d',
'Ď'=>'D',
'ě'=>'e',
'Ě'=>'E',
'é'=>'e',
'É'=>'E',
'ë'=>'e',
);
$string = strtr($string , $convert );
echo $string; //aace
来源:https://stackoverflow.com/questions/657643/how-to-remove-html-special-chars