Nokogiri html parsing question

血红的双手。 Submitted on 2019-12-06 02:47:16

Question


I'm having trouble figuring out why I can't get keywords to parse properly through nokogiri. In the following example, I have the a href link text functionality working properly but cannot figure out how to pull the keywords.

This is the code I have thus far:

.....

doc = Nokogiri::HTML(open("http://www.cnn.com"))
doc.xpath('//a/@href').each do |node|
# doc.xpath("//meta[@name='Keywords']").each do |node|
  puts node.text

....

This successfully prints every href value on the page, but when I switch to the keywords query it prints nothing. I've tried several variations of this with no luck. I assume that the ".text" call on node is wrong, but I'm not sure.

My apologies for how rough this code is, I'm doing my best to learn here.


Answer 1:


You're correct, the problem is text. The text method returns the text between an element's opening tag and its closing tag. Since meta tags are empty elements, this gives you an empty string. You want the value of the "content" attribute instead.

doc.xpath("//meta[@name='Keywords']/@content").each do |attr|
  puts attr.value
end

Since you expect only one meta tag with the name "Keywords", you don't actually need to loop through the results, but can take the first item directly like this:

puts doc.xpath("//meta[@name='Keywords']/@content").first.value

Note, however, that this will raise an error if there is no meta tag with the name "Keywords": first returns nil in that case, and calling value on nil fails. For that reason the first option might be preferable.



Source: https://stackoverflow.com/questions/3442237/nokogiri-html-parsing-question
