validate presence of url in html text

Posted by 大城市里の小女人 on 2019-12-23 01:37:09

Question


I would like to detect and filter out HTML text sent from a form when it contains one or more URLs.

For example, I send this HTML from a form:

RESOURCES<br></u></b><a target="_blank" rel="nofollow" href="http://stackoverflow.com/users/778094/hyperrjas">http://stackoverflow.com/users/778094/hyperrjas</a>&nbsp;<br><a target="_blank" rel="nofollow" href="https://github.com/hyperrjas">https://github.com/hyperrjas</a>&nbsp;<br><a target="_blank" rel="nofollow" href="http://www.linkedin.com/pub/juan-ardila-serrano/11/2a7/62">http://www.linkedin.com/pub/juan-ardila-serrano/11/2a7/62</a>&nbsp;<br>

I don't want to allow one or more URLs in the HTML text. The validation could be something like:

validate :no_urls

def no_urls
  if text_contains_url
    errors.add(:url, I18n.t("mongoid.errors.models.profile.attributes.url.urls_are_not_allowed_in_this_text", url: url))
  end
end

How can I check whether HTML text contains one or more URLs?


Answer 1:


You can use Ruby's built-in URI module, which can already extract URIs from text.

require "uri"

links = URI.extract("your text goes here http://example.com mailto:test@example.com foo bar and more...")
links # => ["http://example.com", "mailto:test@example.com"]
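
As a quick check against markup like the sample in the question, the sketch below runs URI.extract over an HTML fragment. The helper name urls_in and the fragment are made up for illustration, and the scheme list passed as the second argument is an assumption about which links matter. Newer Ruby versions mark URI.extract as obsolete, although it is still available.

```ruby
require "uri"

# Hypothetical helper: returns the distinct http(s) URLs found in an HTML string.
def urls_in(html)
  URI.extract(html.to_s, %w[http https]).uniq
end

# Illustrative fragment, shaped like the form input in the question.
html = '<a target="_blank" rel="nofollow" href="https://github.com/hyperrjas">' \
       'https://github.com/hyperrjas</a>&nbsp;<br>'

found = urls_in(html)
puts found.inspect
puts "contains URLs" unless found.empty?
```

Because the href value and the link text are the same URL, uniq keeps the list short when the result is used in an error message.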

So you could modify your validation as follows:

validate :no_urls

# Validation methods are called without arguments; `text` is assumed
# to be the attribute that holds the submitted HTML.
def no_urls
  links = URI.extract(text.to_s)
  unless links.empty?
    errors.add(:url, I18n.t("mongoid.errors.models.profile.attributes.url.urls_are_not_allowed_in_this_text", url: url))
  end
end



Answer 2:


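You could use a regex to find strings that look like URLs, e.g. something like this: /https?:\/\/\S+/ (a pattern anchored with ^, such as /^http:\/\/.*/, only matches a URL at the start of a line).

But if you want to detect HTML tags such as <a>, you should look into libraries for parsing HTML.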
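
A minimal sketch of the regex approach, assuming only http and https links need to be caught. The constant URL_PATTERN and the method contains_url? are illustrative names, not part of any library. The pattern is unanchored so it can find a URL anywhere in the text, and it stops at whitespace, quotes, and angle brackets so that surrounding markup is not swallowed.

```ruby
# Illustrative pattern: "http://" or "https://" followed by characters
# that are not whitespace, quotes, or angle brackets.
URL_PATTERN = %r{https?://[^\s"'<>]+}i

# Hypothetical helper: true if the text contains at least one match.
def contains_url?(text)
  text.to_s.match?(URL_PATTERN)
end

puts contains_url?('<a href="http://example.com/page">link</a>')
puts contains_url?("<b>RESOURCES</b><br>")

# scan returns every match, which is handy for reporting what was found.
puts "see https://example.org now".scan(URL_PATTERN).inspect
```

A pattern like this is deliberately loose. It will miss links written without a scheme, such as www.example.com.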

Nokogiri is one such library.




Answer 3:


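Mattherick's answer only works if the string does not contain a colon (":"), because URI.extract without a scheme list can treat any word followed by a colon as a URI.

With Ruby 1.9.3, the right thing to do is to pass a second parameter to URI.extract that restricts the schemes it matches, which fixes this problem.

Also, if the text contains an email address as plain text, this code does not catch it. A fix for this problem is: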
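
To illustrate, here is a small sketch of the difference the second argument makes. The sample string is invented, and the scheme list is an assumption about which links should count.

```ruby
require "uri"

text = "Note: my profile is at http://example.com/me"

# Without a scheme list, a "word:" sequence can be reported as a URI.
puts URI.extract(text).inspect

# With a scheme list, only URIs that use one of those schemes are returned.
puts URI.extract(text, %w[http https mailto]).inspect
```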

html_text  = "html text with email address e.g. info@test.com"
email_address = html_text.match(/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i)[0]
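
One caveat with the snippet above: match returns nil when the text has no email address, so calling [0] on the result raises NoMethodError. A nil-safe variant, sketched here with the illustrative constant name EMAIL_PATTERN, uses String#[] with the same regular expression.

```ruby
# Same pattern as above; the constant name is only for illustration.
EMAIL_PATTERN = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i

with_email    = "html text with email address e.g. info@test.com"
without_email = "html text with no address"

# String#[] returns the matched substring, or nil when nothing matches.
puts with_email[EMAIL_PATTERN].inspect
puts without_email[EMAIL_PATTERN].inspect
```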

So this is the code that works correctly for me:

def no_urls
  # Attribute names to check (placeholders); %w takes no commas.
  whitelist = %w(attr1 attr2 attr3 attr4)
  attributes.select { |key, _value| whitelist.include?(key) }.each do |key, value|
    text = value.to_s
    # Restrict extraction to these schemes so that a bare colon is not matched.
    links = URI.extract(text, %w(http https mailto))
    # Plain-text email addresses have no scheme, so match them separately.
    email_address = text[/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i].to_s
    unless links.empty? && email_address.empty?
      logger.info links.first.inspect
      errors.add(key, I18n.t("mongoid.errors.models.cv.attributes.no_urls"))
    end
  end
end

Regards!



Source: https://stackoverflow.com/questions/16234613/validate-presence-of-url-in-html-text
