Is there a better HTML escaping and unescaping tool than CGI for Ruby?

Submitted by ε祈祈猫儿з on 2019-12-03 11:11:09

Question


CGI.escapeHTML is pretty bad, but CGI.unescapeHTML is completely borked. For example:

require 'cgi'

CGI.unescapeHTML('&#8230;')
# => "…"                    # correct - an ellipsis

CGI.unescapeHTML('&hellip;')
# => "&hellip;"             # should be "…"

CGI.unescapeHTML('&#162;')
# => "\242"                 # correct - a cent

CGI.unescapeHTML('&cent;')
# => "&cent;"               # should be "\242"

CGI.escapeHTML("…")
# => "…"                    # should be "&#8230;"

It appears that unescapeHTML knows about all of the numeric codes plus &amp;, &lt;, &gt;, and &quot;, while escapeHTML only knows about those last four; it doesn't emit any of the numeric codes. I understand that escaping doesn't generally need to be as robust, since HTML allows the literal versions of most characters except the four that CGI.escapeHTML knows about. But unescaping should really be better.
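For reference, the decoding being asked for can be sketched with the standard library alone. This is a minimal illustration, not a replacement for a real entity table: `unescape_html`, `codepoint_to_char`, `NAMED_ENTITIES` and `ENTITY_PATTERN` are hypothetical names, and the table holds only a handful of the named references HTML defines. The sketch decodes decimal (`&#8230;`) and hexadecimal (`&#x2026;`) references plus any name in the table, and leaves unknown or invalid references untouched.

```ruby
# Minimal sketch of an HTML unescaper. NAMED_ENTITIES is a hypothetical,
# deliberately tiny table; HTML defines far more named references.
NAMED_ENTITIES = {
  "amp" => "&", "lt" => "<", "gt" => ">", "quot" => '"', "apos" => "'",
  "hellip" => "\u2026", "cent" => "\u00A2", "nbsp" => "\u00A0"
}.freeze

# Matches &#8230; (decimal), &#x2026; (hex) and &name; (named) references.
ENTITY_PATTERN = /&(?:#(?<dec>\d+)|#[xX](?<hex>\h+)|(?<name>[A-Za-z][A-Za-z0-9]*));/

# Turns a code point into a UTF-8 character, or returns the original
# reference unchanged if the code point is a surrogate or out of range.
def codepoint_to_char(code, reference)
  code.chr(Encoding::UTF_8)
rescue RangeError
  reference
end

def unescape_html(text)
  text.gsub(ENTITY_PATTERN) do |reference|
    match = Regexp.last_match
    if match[:dec]
      codepoint_to_char(match[:dec].to_i, reference)
    elsif match[:hex]
      codepoint_to_char(match[:hex].to_i(16), reference)
    else
      # Names missing from the table are left exactly as written.
      NAMED_ENTITIES.fetch(match[:name], reference)
    end
  end
end

unescape_html("&#8230; &hellip; &#x2026;")  # => "… … …"
```

Because `gsub` makes a single left-to-right pass, `&amp;hellip;` decodes to the literal text `&hellip;` rather than to an ellipsis, which is the correct result for double-escaped input. Legacy forms such as a named reference without its trailing semicolon are not handled.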

Is there a better tool out there, at least for unescaping?


Answer 1:


The htmlentities gem should do the trick:

require 'rubygems'
require 'htmlentities'

coder = HTMLEntities.new

coder.decode('&#8230;') # => "…"
coder.decode('&hellip;') # => "…"
coder.decode('&#162;') # => "¢"
coder.decode('&cent;') # => "¢"
coder.encode("…", :named) # => "&hellip;"
coder.encode("…", :decimal) # => "&#8230;"
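If adding a gem is not an option, decimal-style escaping can be approximated in plain Ruby. This sketch assumes valid UTF-8 input. `escape_html_decimal` and `BASIC_ESCAPES` are hypothetical names, and the sketch escapes only the four characters the question mentions (not `'`) before turning every non-ASCII character into a decimal reference.

```ruby
# Minimal sketch: escape the four characters CGI.escapeHTML handles and
# write every non-ASCII character as a decimal reference (assumes valid UTF-8).
BASIC_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;"
}.freeze

def escape_html_decimal(text)
  # Match either one of the four special characters or any non-ASCII character.
  text.gsub(/[&<>"]|[^\x00-\x7F]/) do |char|
    # Special characters use their named form; everything else becomes &#N;.
    BASIC_ESCAPES.fetch(char) { format("&#%d;", char.ord) }
  end
end

escape_html_decimal("\u2026 & \u00A2")  # => "&#8230; &amp; &#162;"
```

Writing non-ASCII characters as numeric references keeps the output pure ASCII, so it survives being served or stored under the wrong character encoding, at the cost of being harder to read.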



Answer 2:


require 'rubygems'
require 'hpricot'

Hpricot('…', :xhtml_strict => true).to_plain_text

You might have to fiddle around with the character encoding, though.
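Encoding also explains the "\242" result in the question. That octal escape is the single byte 0xA2, which is the cent sign only when the bytes are read as ISO-8859-1 (Latin-1); on its own it is not valid UTF-8. Here is a minimal sketch of converting such a byte string, assuming the input really is Latin-1 (`latin1_to_utf8` is a hypothetical helper name):

```ruby
# Hypothetical helper: reinterpret raw bytes as ISO-8859-1 (Latin-1),
# where every byte maps to one character, then transcode to UTF-8.
def latin1_to_utf8(bytes)
  String.new(bytes, encoding: Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

legacy = "\242"              # octal escape: the single byte 0xA2
legacy.valid_encoding?       # => false in a UTF-8 source file

cent = latin1_to_utf8(legacy)
cent == "\u00A2"             # => true (CENT SIGN)
cent.bytes                   # => [194, 162], i.e. 0xC2 0xA2
```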



Source: https://stackoverflow.com/questions/378847/is-there-a-better-html-escaping-and-unescaping-tool-than-cgi-for-ruby
