Get the values of attributes with namespace, using Nokogiri

Submitted by 我是研究僧i on 2020-01-14 02:22:46

Question


I'm using Nokogiri to parse a document.xml file extracted from a .docx file, and I need to get the values of namespaced attributes such as "w:val".

This is a sample of the source XML:

<w:document>
  <w:body>
    <w:p w:rsidR="004D5F21" w:rsidRPr="00820E0B" w:rsidRDefault="00301D39" pcut:cut="true">
      <w:pPr>
        <w:jc w:val="center"/>
      </w:pPr>
  </w:body>
</w:document>

This is a sample of the code:

require 'nokogiri'

doc = Nokogiri::XML(File.open(path))
doc.search('//w:jc').each do |n|
  puts n['//w:val']
end

Nothing is printed to the console except empty lines. How can I get the values of the attributes?


Answer 1:


require 'nokogiri'

doc = Nokogiri::XML(File.open(path))
doc.xpath('//jc').each do |n|
  puts n.attr('val')
end

That should work. Don't forget to look at the docs: http://nokogiri.org/tutorials/searching_a_xml_html_document.html#fn:1




Answer 2:


The sample document is missing its namespace declarations, and Nokogiri isn't happy about it. If you check your document's errors method, you'll see something like:

puts doc.errors
Namespace prefix w on document is not defined
Namespace prefix w on body is not defined
Namespace prefix w for rsidR on p is not defined
Namespace prefix w for rsidRPr on p is not defined
Namespace prefix w for rsidRDefault on p is not defined
Namespace prefix pcut for cut on p is not defined
Namespace prefix w on p is not defined
Namespace prefix w on pPr is not defined
Namespace prefix w for val on jc is not defined
Namespace prefix w on jc is not defined
Opening and ending tag mismatch: p line 3 and body
Opening and ending tag mismatch: body line 2 and document
Premature end of data in tag document line 1

By using Nokogiri's CSS accessors rather than XPath, you can sidestep the namespace issues:

puts doc.at('jc')['val']

will output:

center

If you need to iterate over multiple jc nodes, use search (or one of its aliases or similar methods, such as css), as you did before.




Answer 3:


Try this:

require 'nokogiri'

doc = Nokogiri::XML(File.open(path))
doc.search('jc').each do |n|
  puts n['val']
end

Also, yes, read this: http://nokogiri.org/tutorials/searching_a_xml_html_document.html#fn:1



Source: https://stackoverflow.com/questions/8535509/get-the-values-of-attributes-with-namespace-using-nokogiri
