Converting named HTML entities to numeric HTML entities

半阙折子戏 2020-12-08 08:18

Is there a PHP function to convert named HTML entities into their respective numeric HTML entities?

For example:

$str = "Oggi &egrave; un bel&am


      
      
        
6 Answers

刺人心 (original poster) 2020-12-08 08:48

The answer @hakre provided is the only one that actually solves the problem as posed. Interestingly, none of the other answers work, including the accepted one, which in fact does nothing at all. The others at least do something, but it is the wrong thing. The people who answered seem not to have understood that the author wants to convert named entities into their numeric equivalents.
Below is my contribution, based on a comment from the PHP documentation (https://www.php.net/manual/pt_BR/function.htmlentities.php#106535):
function xmlentities($aString) {
    // Characters left untouched: ASCII letters, digits, whitespace, "_" and "-".
    $validChars = "A-Z0-9a-z\s_-";
    // Holds the lead byte of a two-byte UTF-8 sequence between callback calls.
    $twoChars = null;
    // The pattern has no /u modifier, so it matches one byte at a time.
    // use (&$twoChars) makes $twoChars visible inside the anonymous function;
    // the "&" is required because the callback modifies the variable.
    return preg_replace_callback("/[^$validChars]/", function ($aMatches) use (&$twoChars) {
        $oneChar = $aMatches[0];
        switch ($oneChar) {
            // Direct substitutions for the five entities that XML recognizes.
            // A built-in PHP function could do this, but there are only
            // five characters to replace.
            case "'": return "&apos;";
            case '"': return "&quot;";
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            // Anything else needs special handling to tell whether we are
            // dealing with ISO-8859-1 or UTF-8 characters.
            default:
                // UTF-8 is a compatible extension of ASCII: the first 128
                // characters take one byte. Two-byte characters start with a
                // lead byte from C2 (194) upward; this check accepts C2 to
                // CF (207). Here the lead byte is only stored, and nothing
                // is emitted yet.
                if (194 <= ord($oneChar) && ord($oneChar) <= 207) {
                    $twoChars = $oneChar;
                    return "";
                // $twoChars holds the lead byte from the previous call: append
                // the second byte, convert the pair to ISO-8859-1, reset the
                // control variable, and return the character's ISO-8859-1
                // ordinal as a numeric entity. Note that utf8_decode() is
                // deprecated as of PHP 8.2 and yields "?" for characters
                // outside ISO-8859-1.
                } else if ($twoChars) {
                    $twoChars .= $oneChar;
                    $ansiChar = utf8_decode($twoChars);
                    $twoChars = null;
                    return "&#" . str_pad(ord($ansiChar), 3, "0", STR_PAD_LEFT) . ";";
                // If $aString is already encoded in ISO-8859-1, every
                // character is one byte, so that byte is formatted directly.
                } else {
                    return "&#" . str_pad(ord($oneChar), 3, "0", STR_PAD_LEFT) . ";";
                }
        }
    }, $aString);
}

My version is commented and can handle only "raw" strings, that is, strings without entities (&xxx;). If your string contains named entities, first convert it to its raw form:
$text = "Oggi &egrave; un bel giorno";

$text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, "UTF-8");

$text = xmlentities($text);

echo($text); // Output = Oggi &#232; un bel giorno
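
For characters beyond ISO-8859-1 (the euro sign, for example), the same decode-then-re-encode idea can be sketched without utf8_decode(). This is only an illustration, not part of the original answer: the names utf8_code_point() and named_to_numeric() are hypothetical, the input is assumed to be valid UTF-8, and the sketch escapes just the five XML special characters plus everything non-ASCII.

```php
<?php
// Hypothetical helper (not from the original answer): returns the Unicode
// code point of a single UTF-8 encoded character.
function utf8_code_point(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0]; // plain ASCII
    }
    // Payload bits of the lead byte: 0x1F, 0x0F or 0x07 for 2, 3 or 4 bytes.
    $code = $bytes[0] & (0xFF >> ($count + 1));
    for ($i = 1; $i < $count; $i++) {
        // Each continuation byte contributes its low six bits.
        $code = ($code << 6) | ($bytes[$i] & 0x3F);
    }
    return $code;
}

// Hypothetical helper: converts named entities to decimal numeric entities.
function named_to_numeric(string $html): string
{
    // 1. Turn named (and numeric) entities into raw UTF-8 characters.
    $raw = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // 2. Re-escape the five characters that XML treats as special.
    $escaped = htmlspecialchars($raw, ENT_QUOTES | ENT_XML1, 'UTF-8');
    // 3. With the /u modifier the pattern matches whole characters, not bytes.
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        static function (array $m): string {
            return '&#' . utf8_code_point($m[0]) . ';';
        },
        $escaped
    );
    // preg_replace_callback() returns null on failure (e.g. invalid UTF-8).
    return $result ?? $escaped;
}

echo named_to_numeric("Oggi &egrave; un bel giorno"), "\n"; // Oggi &#232; un bel giorno
echo named_to_numeric("5 &euro; &lt; 6 &euro;"), "\n";      // 5 &#8364; &lt; 6 &#8364;
```

Unlike xmlentities(), this sketch leaves ASCII punctuation other than the five XML special characters unescaped and does not zero-pad the numbers; whether that matters depends on what consumes the output.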

    
             
                                                        
            
            
              
                