I'd like to open a web page with Nokogiri, extract all the words that a user sees when visiting the page in a browser, and analyze the word frequency.
What is t
Update: since Ruby 2.7 there is a new Enumerable method, tally, that counts occurrences.
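As a minimal sketch of tally on its own, assuming Ruby 2.7 or later (the sample sentence and variable names are made up for illustration):

```ruby
# Hypothetical sample text; any string works here.
text = "The cat saw the other cat"

# Split into word tokens, normalize case, then count each distinct word.
counts = text.scan(/\w+/).map(&:downcase).tally

# Sort by descending count to surface the most frequent words.
top = counts.sort_by { |_word, n| -n }.first(2)
```

tally returns a Hash that maps each element to its count, so sorting its pairs by count gives a simple frequency ranking.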
Bug in the chosen answer: html.at('body').inner_text concatenates the text of all the nodes without inserting spaces between them. For example, a document with "this" and "text" in adjacent elements will result in "thistext".
Better, based on this answer: collect the individual text nodes and join them with spaces.
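require 'nokogiri'
require 'open-uri'
html = Nokogiri::HTML(URI.open('http://stackoverflow.com/questions/6129357'))
# Join the text nodes with spaces so words from adjacent elements stay separate
text = html.xpath('.//text() | text()').map(&:inner_text).join(' ')
occurrences = text.scan(/\w+/).map(&:downcase).tally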