HTML to Plain Text with Ruby?

无人共我 2020-12-15 18:03

Is there anything out there to convert HTML to plain text (maybe a Nokogiri script)? Something that would keep the line breaks, but that's about it.

If I write som

9 Answers
  •  情深已故
    2020-12-15 18:36

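    require 'open-uri'
    require 'nokogiri'

    url = 'http://en.wikipedia.org/wiki/Wolfram_language'
    # Use URI.open: on Ruby 3.0+ Kernel#open no longer accepts URLs
    doc = Nokogiri::HTML(URI.open(url))

    # Collect the text of every paragraph and top-level heading
    text = ''
    doc.css('p,h1').each do |e|
      text << e.content
    end

    puts text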
    

    This extracts just the desired text from a webpage (most of the time). If, for example, you also want to include links, add `a` to the CSS selector passed to `doc.css` (e.g. 'p,h1,a').
