How do I use xpath on nodes with a prefix but without a namespace?

断了今生、忘了曾经 submitted on 2019-12-22 05:54:15

Question


I have an XML file that I need to parse. I have no control over the format of the file and cannot change it.

The file uses a prefix (call it a), but never declares a namespace for that prefix. I can't figure out how to use XPath to query for nodes with the a prefix.

Here are the contents of the XML document:

<?xml version="1.0" encoding="UTF-8"?>

<a:root>
  <a:thing>stuff0</a:thing>
  <a:thing>stuff1</a:thing>
  <a:thing>stuff2</a:thing>
  <a:thing>stuff3</a:thing>
  <a:thing>stuff4</a:thing>
  <a:thing>stuff5</a:thing>
  <a:thing>stuff6</a:thing>
  <a:thing>stuff7</a:thing>
  <a:thing>stuff8</a:thing>
  <a:thing>stuff9</a:thing>
</a:root>

I am using Nokogiri to query the document:

require 'nokogiri'

doc = Nokogiri::XML(File.read('text.xml'))
things = doc.xpath('//a:thing')

This fails with the following error:

Nokogiri::XML::XPath::SyntaxError: Undefined namespace prefix: //a:thing

From my research, I found that I can pass a namespace URI for the prefix to the xpath method:

things = doc.xpath('//a:thing', a: 'nobody knows')

This returns an empty NodeSet.

What would be the best way for me to get the nodes that I need?


Answer 1:


The problem is that the XML document never declares a namespace for the prefix. As a result, Nokogiri treats the whole string "a:root" as the node name, instead of treating "a" as a namespace prefix and "root" as the node name:

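require 'nokogiri'

# The XML declaration must be the very first thing in the string; an indented
# %Q{} literal puts whitespace before it, which is not well-formed XML.
# A squiggly heredoc strips the indentation.
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <a:root>
    <a:thing>stuff0</a:thing>
    <a:thing>stuff1</a:thing>
  </a:root>
XML
doc = Nokogiri::XML(xml)
puts doc.at_xpath('*').node_name
#=> "a:root"
p doc.at_xpath('*').namespace
#=> nil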

Solution 1 - Specify node name with colon

One solution is to search for nodes whose name is "a:thing". You cannot use //a:thing, because XPath treats the "a" as a namespace prefix. You can get around this with //*[name()="a:thing"], which compares the full element name as a string:

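require 'nokogiri'

# Squiggly heredoc: the XML declaration stays at the very start of the string.
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <a:root>
    <a:thing>stuff0</a:thing>
    <a:thing>stuff1</a:thing>
  </a:root>
XML
doc = Nokogiri::XML(xml)
# name() returns the full element name, colon included.
things = doc.xpath('//*[name()="a:thing"]')
puts things
#=> <a:thing>stuff0</a:thing>
#=> <a:thing>stuff1</a:thing>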

Solution 2 - Modify the XML document to define the namespace

An alternative is to modify the XML in memory, before parsing, so that it properly declares the namespace. The document then behaves with namespaces as expected:

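require 'nokogiri'

# Squiggly heredoc: the XML declaration stays at the very start of the string.
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <a:root>
    <a:thing>stuff0</a:thing>
    <a:thing>stuff1</a:thing>
  </a:root>
XML
# Declare the prefix on the root element before parsing ("foo" is an arbitrary URI).
xml = xml.gsub('<a:root>', '<a:root xmlns:a="foo">')
doc = Nokogiri::XML(xml)
things = doc.xpath('//a:thing')
puts things
#=> <a:thing>stuff0</a:thing>
#=> <a:thing>stuff1</a:thing>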


Source: https://stackoverflow.com/questions/20004081/how-do-i-use-xpath-on-nodes-with-a-prefix-but-without-a-namespace
