How do I remove HTTP links with ActiveSupport's “starts_with” using Nokogiri?

佐手、 submitted on 2019-12-11 17:55:21

Question


When I try this:

item.css("a").each do |a|
  if !a.starts_with? 'http://'
     a.replace a.content
  end
end

I get:

NoMethodError: undefined method 'starts_with?' for #<Nokogiri::XML::Element:0x1b48a60> 

EDIT:

I'm sure there is a cleaner way, but this seems to work:

item.css("a").each do |a|
  unless a["href"].blank?
    if !a["href"].starts_with? 'http://' 
      a.replace a.content
    end
  end
end

Answer 1:


The problem is that you're calling starts_with? on an object that doesn't implement it.

item.css("a").each do |a|

yields XML nodes as a. Those are Nokogiri objects, not strings. What you want is the text of just the part you need to check. Because that part is an attribute of the node, you can read it like this:

a['href']

So, you want to use something like this:

item.css("a").each do |a|
  # to_s turns a missing href (nil) into "" so starts_with? is always called on a String
  if !a['href'].to_s.starts_with?('http://')
     a.replace(a.content)
  end
end

The downside to this is you have to walk through every <a> tag in the document, which can be slow on a big page with lots of links.

An alternative is to use XPath's starts-with function. Start with this sample document:

require 'nokogiri'

item = Nokogiri::HTML('<a href="doesnt_start_with">foo</a><a href="http://bar">bar</a>')
puts item.to_html

which outputs:

>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
>> <html><body>
>> <a href="doesnt_start_with">foo</a><a href="http://bar">bar</a>
>> </body></html>

Here's how to do it using XPath:

item.search('//a[not(starts-with(@href, "http://"))]').each do |a|
  a.replace(a.content)
end
puts item.to_html

Which outputs:

>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
>> <html><body>foo<a href="http://bar">bar</a>
>> </body></html>

The advantage of using XPath to find the nodes is that the matching runs in compiled C (libxml2) rather than in Ruby.




Answer 2:


Shouldn't that method be start_with?
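
Both spellings exist: String#start_with? is core Ruby, and ActiveSupport adds starts_with? as an alias on String. Neither is defined on a Nokogiri element, so renaming the call alone wouldn't fix the error in the question. A small plain-Ruby sketch of the distinction, where a bare Object stands in for the element purely for illustration:

```ruby
href = 'http://bar'

# String#start_with? is part of core Ruby; no ActiveSupport required.
plain_match = href.start_with?('http://')

# It also accepts several prefixes and returns true if any of them matches.
any_match = 'https://bar'.start_with?('http://', 'https://')

# The original error came from the receiver: a non-String object
# (a plain Object stands in for the Nokogiri element here) has no such method.
element_stand_in = Object.new
responds = element_stand_in.respond_to?(:start_with?)

puts plain_match, any_match, responds
```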



Source: https://stackoverflow.com/questions/5922041/how-do-i-remove-http-links-with-activesupports-starts-with-using-nokogiri
