How to remove html special chars?

穿精又带淫゛_ 提交于 2019-11-26 04:12:21

问题


I am creating a RSS feed file for my application in which I want to remove HTML tags, which is done by strip_tags. But strip_tags is not removing HTML special code chars:

  & © 

etc.

Please tell me any function which I can use to remove these special code chars from my string.


回答1:


Either decode them using html_entity_decode or remove them using preg_replace:

$Content = preg_replace("/&#?[a-z0-9]+;/i","",$Content); 

(From here)

EDIT: Alternative according to Jacco's comment

might be nice to replace the '+' with {2,8} or something. This will limit the chance of replacing entire sentences when an unencoded '&' is present.

$Content = preg_replace("/&#?[a-z0-9]{2,8};/i","",$Content); 



回答2:


In addition to the good answers above, PHP also has a built-in filter function that is quite useful: filter-var.

To remove HMTL characters, use:

$cleanString = filter_var($dirtyString, FILTER_SANITIZE_STRING);

More info:

  1. function.filter-var
  2. filter_sanitize_string



回答3:


You may want take a look at htmlentities() and html_entity_decode() here

$orig = "I'll \"walk\" the <b>dog</b> now";

$a = htmlentities($orig);

$b = html_entity_decode($a);

echo $a; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now

echo $b; // I'll "walk" the <b>dog</b> now



回答4:


This might work well to remove special characters.

$modifiedString = preg_replace("/[^a-zA-Z0-9_.-\s]/", "", $content); 



回答5:


A plain vanilla strings way to do it without engaging the preg regex engine:

function remEntities($str) {
  if(substr_count($str, '&') && substr_count($str, ';')) {
    // Find amper
    $amp_pos = strpos($str, '&');
    //Find the ;
    $semi_pos = strpos($str, ';');
    // Only if the ; is after the &
    if($semi_pos > $amp_pos) {
      //is a HTML entity, try to remove
      $tmp = substr($str, 0, $amp_pos);
      $tmp = $tmp. substr($str, $semi_pos + 1, strlen($str));
      $str = $tmp;
      //Has another entity in it?
      if(substr_count($str, '&') && substr_count($str, ';'))
        $str = remEntities($tmp);
    }
  }
  return $str;
}



回答6:


What I have done was to use: html_entity_decode, then use strip_tags to removed them.




回答7:


try this

<?php
$str = "\x8F!!!";

// Outputs an empty string
echo htmlentities($str, ENT_QUOTES, "UTF-8");

// Outputs "!!!"
echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
?>



回答8:


It looks like what you really want is:

function xmlEntities($string) {
    $translationTable = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);

    foreach ($translationTable as $char => $entity) {
        $from[] = $entity;
        $to[] = '&#'.ord($char).';';
    }
    return str_replace($from, $to, $string);
}

It replaces the named-entities with their number-equivalent.




回答9:


<?php
function strip_only($str, $tags, $stripContent = false) {
    $content = '';
    if(!is_array($tags)) {
        $tags = (strpos($str, '>') !== false
                 ? explode('>', str_replace('<', '', $tags))
                 : array($tags));
        if(end($tags) == '') array_pop($tags);
    }
    foreach($tags as $tag) {
        if ($stripContent)
             $content = '(.+</'.$tag.'[^>]*>|)';
         $str = preg_replace('#</?'.$tag.'[^>]*>'.$content.'#is', '', $str);
    }
    return $str;
}

$str = '<font color="red">red</font> text';
$tags = 'font';
$a = strip_only($str, $tags); // red text
$b = strip_only($str, $tags, true); // text
?> 



回答10:


The function I used to perform the task, joining the upgrade made by schnaader is:

    mysql_real_escape_string(
        preg_replace_callback("/&#?[a-z0-9]+;/i", function($m) { 
            return mb_convert_encoding($m[1], "UTF-8", "HTML-ENTITIES"); 
        }, strip_tags($row['cuerpo'])))

This function removes every html tag and html symbol, converted in UTF-8 ready to save in MySQL




回答11:


If you want to convert the HTML special characters and not just remove them as well as strip things down and prepare for plain text this was the solution that worked for me...

function htmlToPlainText($str){
    $str = str_replace('&nbsp;', ' ', $str);
    $str = html_entity_decode($str, ENT_QUOTES | ENT_COMPAT , 'UTF-8');
    $str = html_entity_decode($str, ENT_HTML5, 'UTF-8');
    $str = html_entity_decode($str);
    $str = htmlspecialchars_decode($str);
    $str = strip_tags($str);

    return $str;
}

$string = '<p>this is (&nbsp;) a test</p>
<div>Yes this is! &amp; does it get "processed"? </div>'

htmlToPlainText($string);
// "this is ( ) a test. Yes this is! & does it get processed?"`

html_entity_decode w/ ENT_QUOTES | ENT_XML1 converts things like &#39; htmlspecialchars_decode converts things like &amp; html_entity_decode converts things like '&lt; and strip_tags removes any HTML tags left over.

EDIT - Added str_replace(' ', ' ', $str); and several other html_entity_decode() as continued testing has shown a need for them.




回答12:


You can try htmlspecialchars_decode($string). It works for me.

http://www.w3schools.com/php/func_string_htmlspecialchars_decode.asp




回答13:


$string = "äáčé";

$convert = Array(
        'ä'=>'a',
        'Ä'=>'A',
        'á'=>'a',
        'Á'=>'A',
        'à'=>'a',
        'À'=>'A',
        'ã'=>'a',
        'Ã'=>'A',
        'â'=>'a',
        'Â'=>'A',
        'č'=>'c',
        'Č'=>'C',
        'ć'=>'c',
        'Ć'=>'C',
        'ď'=>'d',
        'Ď'=>'D',
        'ě'=>'e',
        'Ě'=>'E',
        'é'=>'e',
        'É'=>'E',
        'ë'=>'e',
    );

$string = strtr($string , $convert );

echo $string; //aace


来源:https://stackoverflow.com/questions/657643/how-to-remove-html-special-chars

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!