HTML to Plain Text with Ruby?

无人共我 2020-12-15 18:03

Is there anything out there to convert HTML to plain text (maybe a Nokogiri script)? Something that would keep the line breaks, but that's about it.

If I write som

9 Answers
  •  没有蜡笔的小新
    2020-12-15 18:41

    I'm using the sanitize gem.

    require 'sanitize'  # gem install sanitize
    (" " + Sanitize.clean(html).gsub("\n", "\n\n").strip).gsub(/^ /, "\t")
    

    It does drop hyperlinks, though: the link text is kept but the URLs are lost, which may be an issue for some applications. I'm doing NLP text analysis, so this suits my needs perfectly.
