How to use RubyGem Sanitize transformers to sanitize an unordered list into a comma-separated list?

痞子三分冷 submitted on 2019-12-23 04:23:03

Question


Is anyone familiar with the RubyGem Sanitize who can provide an example of building a "Transformer" to convert

"<ul><li>a</li><li>b</li><li>c</li></ul>" 

into

"a,b, and c"

?


Answer 1:


IMO transformers are not for pulling out data like this:

Transformers allow you to filter and modify nodes using your own custom logic [...]

This is not what you're trying to do; you're trying to pull data out of nodes, and transform it. In your example, you're not doing the same thing to each element: you're sometimes appending a comma, sometimes appending a comma and the word "and".

In order to do that, you either need to save state and post-process, or look ahead in the node stream to see if you're visiting the last node. I don't know of a trivial way to do that with Sanitize's transformers, so this example saves state and post-processes.

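require 'sanitize'
items = []  # state collected while Sanitize walks the nodes
s = "<ul><li>some space</li><li>more stuff with spaces</li><li>last one</li></ul>"
# Transformer used only for its side effect: record the text of every text node.
save_li = lambda do |env|
  node = env[:node]
  items << node.text.strip if node.text?
end
# Sanitize.clean is the pre-3.0 entry point; Sanitize 3.0+ names it Sanitize.fragment.
Sanitize.clean(s, :transformers => save_li)
# => "  some space  more stuff with spaces  last one  "
# Post-process the saved items into the final string.
output = "#{items[0..-2].join(", ")}, and #{items[-1]}"
# => "some space, more stuff with spaces, and last one"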

IMO this example is an abuse of transformers because it's run only for its side effect: the transformer does nothing other than look for text nodes.

If one of the list items has embedded HTML, the naive approach no longer works, and you need to start knowing more Nokogiri anyway:

items = []
s = "<ul><li>some space</li><li>item <b>with</b> html</li><li>c</li></ul>"
# Collect the full text content of each <li>, including text inside nested tags.
save_li = lambda do |env|
  node = env[:node]
  items << node.content if node.name == "li"
end
Sanitize.clean(s, :transformers => save_li)
# => "  some space  item with html  c  "
output = "#{items[0..-2].join(", ")}, and #{items[-1]}"
# => "some space, item with html, and c"

This approach relies on Sanitize's default behavior of whitelisting nothing. The <b> tags are still visited by the save_li lambda, but they're stripped from the output. This can cause problems in a variety of circumstances.
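
One more caveat about the post-processing step: both snippets build the final string with the same interpolation, which assumes the list has at least three items. With a single item it yields ", and a", and with two items "a, and b". A minimal sketch of a more forgiving join follows; to_series is a hypothetical helper name of my own, not part of Sanitize or Nokogiri, and it only uses plain Ruby.

```ruby
# Hypothetical helper (not part of Sanitize): joins the collected list items
# into an English series, using a serial comma only for three or more items.
def to_series(items)
  case items.length
  when 0 then ""
  when 1 then items.first
  when 2 then items.join(" and ")
  else "#{items[0..-2].join(", ")}, and #{items.last}"
  end
end

puts to_series(["some space", "more stuff with spaces", "last one"])
# => some space, more stuff with spaces, and last one
puts to_series(["a", "b"])
# => a and b
puts to_series(["a"])
# => a
```

With this in place, the last line of either snippet could become output = to_series(items).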



Source: https://stackoverflow.com/questions/8614636/how-to-use-rubygem-sanitize-transformers-to-sanitize-an-unordered-list-into-a-co
