PHP & RSS Feeds & Special Characters validation Problem

Submitted by 让人想犯罪 on 2019-12-07 15:20:08

Question


I keep getting the validation warning below. Some of my articles contain special characters, and I was wondering how I should render (or not render) them in my RSS feeds. Should I use htmlentities or not? If so, how?

In addition, interoperability with the widest range of feed readers could be improved by implementing the following recommendations. line 22, column 35: title should not contain HTML: &

PHP code.

<title>' . htmlentities(strip_tags($title), ENT_QUOTES, "UTF-8") . '</title>

Answer 1:


You should use CDATA to escape characters in your XML feeds. This lets you use your raw data without disrupting the XML layout.

Try this:

<title><![CDATA[ YOUR RAW CONTENT]]></title>

Note: do not use htmlentities and strip_tags here. They escape the content for a browser, whereas inside CDATA the raw text is passed through and any feed reader should read it correctly.
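
One caveat with the CDATA approach: a CDATA section ends at the first "]]>", so raw content that happens to contain that sequence would close the section early. A minimal sketch of one way to handle this, assuming the title is already valid UTF-8 text; the helper name cdata_wrap is hypothetical and not part of the original answer:

```php
<?php
// Hypothetical helper: wrap raw text in a CDATA section.
// A CDATA section ends at the first "]]>", so any occurrence inside the
// text is split across two adjacent CDATA sections.
function cdata_wrap(string $text): string
{
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

$title = 'Tom & Jerry <3 PHP';
echo '<title>' . cdata_wrap($title) . '</title>', PHP_EOL;
// prints: <title><![CDATA[Tom & Jerry <3 PHP]]></title>
```

An XML parser concatenates adjacent CDATA sections, so the reader still sees the original text.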

Quote from w3schools:

The term CDATA is used about text data that should not be parsed by the XML parser. Characters like "<" and "&" are illegal in XML elements. "<" will generate an error because the parser interprets it as the start of a new element. "&" will generate an error because the parser interprets it as the start of an character entity. Some text, like JavaScript code, contains a lot of "<" or "&" characters. To avoid errors script code can be defined as CDATA. Everything inside a CDATA section is ignored by the parser. A CDATA section starts with "<![CDATA[" and ends with "]]>":

http://www.w3schools.com/xml/xml_cdata.asp




Answer 2:


/* feedvalidator.org (Feedburner recommends this site to validate your feeds) says: "For the widest interop, the RSS Profile recommends the use of the hexadecimal character reference "&#x26;" to represent "&" and "&#x3C;" to represent "<"." */

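        // find title problems
        // Double quotes are required for the byte escapes: inside single
        // quotes PHP does not expand \x sequences.
        // The two bytes below assume Windows-1252 input.
        $find[] = '<';
        $find[] = "\x92"; // Windows-1252 right single quotation mark
        $find[] = "\x84"; // Windows-1252 low double quotation mark

        // find content problems
        $find_c[] = "\x92";
        $find_c[] = "\x84";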
        $find_c[] = '&nbsp;';

        // replace title
        $replace[] = '&#x3C;';
        $replace[] = '&#39;';
        $replace[] = '&#34;';

        // replace content
        $replace_c[] = '&#39;';
        $replace_c[] = '&#34;';
        $replace_c[] = ' ';

        // We don't want to re-replace "&" characters.  
        // So do this first because of PHP "feature" https://bugs.php.net/bug.php?id=33773
        $title = str_replace('&', '&#x26;', $title); 
        $title = str_replace($find, $replace, $title);
        $post_content = str_replace($find_c, $replace_c, $row[3]);

        // http://productforums.google.com/forum/#!topic/merchant-center/nIVyFrJsjpk
        $link = str_replace('&', '&amp;', $link);

Of course I'm doing some pre-processing before $title, $post_content and $link are added to my database. But this should help solve some common problems to get a valid RSS feed.

Update: Fixed the &#x26;#x26;#x26; "recursion" problem, see https://bugs.php.net/bug.php?id=33773
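
The "recursion" problem mentioned above comes from str_replace() with arrays applying each replacement in turn to the result of the previous one. A minimal sketch of a single-pass alternative using strtr(), assuming the input is plain text that has not been escaped yet; the function name rss_escape_text is hypothetical:

```php
<?php
// Hypothetical helper: escape a plain-text title with the hexadecimal
// character references the RSS Profile recommends for "&" and "<".
function rss_escape_text(string $text): string
{
    // strtr() with an array scans the input once and never re-scans its own
    // output, so the "&" inside "&#x3C;" is not escaped a second time.
    return strtr($text, [
        '&' => '&#x26;',
        '<' => '&#x3C;',
    ]);
}

echo rss_escape_text('Q&A: is 1 < 2?'), PHP_EOL;
// prints: Q&#x26;A: is 1 &#x3C; 2?
```

Because the replacement happens in one pass, the order of the array entries does not matter here, unlike the str_replace() sequence above.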




Answer 3:


Take out the htmlentities(). It's only for HTML files.
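
The reason htmlentities() does not fit here is that it emits named HTML entities such as &eacute; or &nbsp;, while XML itself only predefines &amp;, &lt;, &gt;, &quot; and &apos;. If some escaping is still wanted, one option is to decode any HTML entities back to characters and then escape only what XML requires. This is a sketch under the assumption that the title is UTF-8; the helper name xml_text is hypothetical:

```php
<?php
// Hypothetical helper: turn an HTML fragment into XML-safe plain text.
function xml_text(string $html): string
{
    // Drop markup, then turn HTML entities (e.g. &eacute;, &nbsp;) back into
    // real UTF-8 characters.
    $plain = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Escape only the characters that are special in XML.
    return htmlspecialchars($plain, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo '<title>' . xml_text('<b>Caf&eacute;</b> & Bar') . '</title>', PHP_EOL;
// prints: <title>Café &amp; Bar</title>
```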



Source: https://stackoverflow.com/questions/4702835/php-rss-feeds-special-characters-validation-problem
