PHP DOMDocument : How to parse xml/rss Tags with CUSTOM field names?

Submitted by 怎甘沉沦 on 2019-12-21 23:40:41

Question


I have the below RSS to parse, something like:

<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:x-wr="http://www.w3.org/2002/12/cal/prod/Apple_Comp_628d9d8459c556fa#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:x-example="http://www.example.com/rss/x-example" xmlns:x-microsoft="http://schemas.microsoft.com/x-microsoft" xmlns:xCal="urn:ietf:params:xml:ns:xcal" version="2.0">
    <channel>
        <item>
            <title>About Apples</title>
            <author>David K. Lowie</title>
            <x-trumba:customfield name="description">This is the description about apples</xCal:customfield>
            <x-trumba:customfield name="category">Fruits,Food,Apple</xCal:customfield>
        </item>
        <item>
            <title>About Oranges</title>
            <author>Marry L. Jones</title>
            <x-trumba:customfield name="description">This is the description about oranges</xCal:customfield>
            <x-trumba:customfield name="category">Fruits,Food,Orange</xCal:customfield>
        </item>
    </channel>
</rss>

In PHP, I only know how to read the first two nodes, like this:

$rss = new DOMDocument();
$rss->load( "http://www.example.com/books.rss" );

foreach( $rss->getElementsByTagName("item") as $node ) {
    echo $node->getElementsByTagName("title")->item(0)->nodeValue;
    echo $node->getElementsByTagName("author")->item(0)->nodeValue;
}

But these nodes are the problem:

<x-trumba:customfield name="description">This is the description about apples</xCal:customfield>
<x-trumba:customfield name="category">Fruits,Food,Apple</xCal:customfield>

Please help: how can I parse the last nodes, such as <x-trumba:customfield name="description">?

(I can't change the RSS source since it's not under my control.)

Please kindly help.


Answer 1:


Your XML is not well-formed: the 'x-trumba' prefix is not defined, and the closing tags of those elements use the 'xCal' prefix, which refers to urn:ietf:params:xml:ns:xcal.

Replacing the prefix of the opening tags with 'xCal' and fixing the closing tags of the 'author' elements makes the XML well-formed.
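
Since the feed itself cannot be edited, the same repair could be applied to the downloaded text before parsing. The following is only a rough sketch: repairFeed() is a hypothetical helper, the sample string is a shortened copy of the feed from the question, and plain string replacement is assumed to be safe because the two defects always appear in exactly this form.

```php
<?php
// Hypothetical helper (not part of the original answer): repairs the two
// defects described above with plain string operations.
function repairFeed(string $xml): string
{
    // Opening tags get the declared 'xCal' prefix; the closing tags already use it.
    $xml = str_replace('<x-trumba:customfield', '<xCal:customfield', $xml);
    // Close <author> with </author> instead of </title>.
    return preg_replace('~<author>([^<]*)</title>~', '<author>$1</author>', $xml) ?? $xml;
}

// Shortened copy of the broken feed from the question.
$broken = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:xCal="urn:ietf:params:xml:ns:xcal" version="2.0">
    <channel>
        <item>
            <title>About Apples</title>
            <author>David K. Lowie</title>
            <x-trumba:customfield name="description">This is the description about apples</xCal:customfield>
        </item>
    </channel>
</rss>
XML;

$rss = new DOMDocument();
// loadXML() returns false (and emits warnings) if the text is still not well-formed.
$loaded = $rss->loadXML(repairFeed($broken));
$itemCount = $rss->getElementsByTagName('item')->length;
var_dump($loaded);
```

For a live feed, the string would come from something like file_get_contents() on the feed URL instead of the inline sample. If the publisher ever changes the markup, the replacements above would need to be revisited.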

Then you can register the xCal namespace and use XPath to fetch the custom field contents:

$rss = new DOMDocument();
$rss->load( "http://www.example.com/books.rss" );
$xpath = new DOMXpath($rss);
$xpath->registerNamespace('x', 'urn:ietf:params:xml:ns:xcal');

foreach( $xpath->evaluate("//item") as $item ) {
    echo $xpath->evaluate('string(title)', $item), "\n";
    echo $xpath->evaluate('string(x:customfield[@name="description"])', $item), "\n";
}

Output:

About Apples
This is the description about apples
About Oranges
This is the description about oranges

The XPath expression uses a condition ([@name="description"]) to filter the customfield element nodes.
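
The same idea extends to the other custom fields. As a hedged sketch, the code below loops over every customfield element of an item and keys the values by their name attribute instead of writing one expression per field. The inline XML is a well-formed, shortened copy of the feed, and treating "category" as a comma-separated list is an assumption based on the sample data.

```php
<?php
// Well-formed, shortened copy of the feed (prefixes and closing tags already repaired).
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:xCal="urn:ietf:params:xml:ns:xcal" version="2.0">
    <channel>
        <item>
            <title>About Apples</title>
            <author>David K. Lowie</author>
            <xCal:customfield name="description">This is the description about apples</xCal:customfield>
            <xCal:customfield name="category">Fruits,Food,Apple</xCal:customfield>
        </item>
    </channel>
</rss>
XML;

$rss = new DOMDocument();
$rss->loadXML($xml);
$xpath = new DOMXPath($rss);
// The prefix 'x' is local to these XPath expressions; only the namespace URI must match.
$xpath->registerNamespace('x', 'urn:ietf:params:xml:ns:xcal');

$items = [];
foreach ($xpath->evaluate('//item') as $item) {
    $fields = [];
    // Collect every custom field of this item, keyed by its name attribute.
    foreach ($xpath->evaluate('x:customfield', $item) as $field) {
        $fields[$field->getAttribute('name')] = $field->textContent;
    }
    // Assumption: the category field holds a comma-separated list.
    if (isset($fields['category'])) {
        $fields['category'] = explode(',', $fields['category']);
    }
    $items[] = $fields;
}
print_r($items);
```

Note that the prefix registered for XPath ('x') does not have to match the prefix used in the document ('xCal'); the lookup is done by namespace URI.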



Source: https://stackoverflow.com/questions/38096183/php-domdocument-how-to-parse-xml-rss-tags-with-custom-field-names
