Evaluate JavaScript in Ruby

别说谁变了你拦得住时间么 submitted on 2019-12-10 13:46:57

Question


I tried to get the HTML code of a web page, but the page contains some JavaScript code that generates data I need.

require 'net/http'

http = Net::HTTP.new('localhost')
path = '/files.php'

# POST request -> logging in
data = ''
headers = {
  'Referer' => 'http://localhost:8080/files.php',
  'User-Agent' => 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0',
  'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language' => 'es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3',
  'Connection' => 'keep-alive',
  'Cookie' => ''
}

resp = http.post(path, data, headers)

puts resp.body

But this only returns the HTML without evaluating the JavaScript. I would like to get the final HTML after the page's JavaScript has been evaluated.


Answer 1:


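Assumption: your JavaScript lives in a single <script> tag on the page; otherwise you'll have to parse the page looking for each piece of JS you want. The gem you want is called "therubyracer"; it embeds Google's V8 JavaScript engine into Ruby. Keep in mind that V8 only executes JavaScript: it provides no DOM, so a script that reads or modifies the document or window objects will raise an error instead of producing the final HTML.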

Go to your command line and install therubyracer with

 gem install therubyracer

then:

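 require 'net/http'
 require 'v8'

 # Same request as in the question
 http = Net::HTTP.new('localhost')
 path = '/files.php'
 data = ''
 headers = {
   'Referer' => 'http://localhost:8080/files.php',
   'User-Agent' => 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0',
   'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
   'Accept-Language' => 'es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3',
   'Connection' => 'keep-alive',
   'Cookie' => ''
 }

 resp = http.post(path, data, headers)
 html = resp.body

 # Take the contents of the first <script>...</script> element:
 # skip past the '>' that closes the opening tag and stop before '</script>'.
 start = html.index('>', html.index('<script')) + 1
 stop  = html.index('</script>', start)
 js = html[start...stop]

 # Run the script in an embedded V8 context; eval returns the value of the last expression.
 cxt = V8::Context.new
 result = cxt.eval(js)
 puts result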
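The answer above assumes a single script tag. If the page has several inline scripts, one rough way to collect them all is a regular expression. This is a minimal sketch, assuming the scripts are plain inline <script> elements; extract_inline_scripts is a hypothetical helper, and a regex is only an approximation of real HTML parsing (it can be fooled by "</script>" inside a JS string or an HTML comment).

```ruby
# Hypothetical helper: collect the bodies of all inline <script> elements, in page order.
def extract_inline_scripts(html)
  # /m lets '.' match newlines; /i makes the tag match case-insensitive.
  bodies = html.scan(%r{<script\b[^>]*>(.*?)</script\s*>}mi).flatten
  # Drop empty bodies, e.g. <script src="..."></script> that only loads an external file.
  bodies.map(&:strip).reject(&:empty?)
end

page = <<~HTML
  <html><head>
    <script src="/lib.js"></script>
    <script type="text/javascript">
      var a = 1;
    </script>
  </head><body>
    <SCRIPT>var b = a + 1;</SCRIPT>
  </body></html>
HTML

p extract_inline_scripts(page)
# prints ["var a = 1;", "var b = a + 1;"]
```

Each extracted string could then be passed to cxt.eval in turn, using the same V8::Context, so that later scripts see the globals defined by earlier ones. External scripts (the src="..." ones) would have to be downloaded separately.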



Answer 2:


Scraping pages that depend on JavaScript is hard. To do it reliably, you basically need to fully emulate a browser.

Fortunately, there are gems out there that do exactly that. You could use Capybara with a JavaScript-capable driver like Selenium. For example (adapted from this blog post):

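# Requires the capybara and selenium-webdriver gems plus Firefox (the default :selenium browser).
require "capybara"
require "capybara/dsl"

Capybara.run_server = false          # scrape a remote site instead of starting a local app server
Capybara.current_driver = :selenium
Capybara.app_host = "http://www.google.com/"

class Scraper
  include Capybara::DSL

  def scrape
    visit('/')
    fill_in "q", :with => "Capybara"
    click_button "Google Search"
    all(:xpath, "//li[@class='g']/h3/a").each { |a| puts a[:href] }
  end
end

Scraper.new.scrape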

There are alternative JavaScript drivers if Selenium isn't your cup of tea (it literally automates your browser, e.g. Firefox, rather than implementing a separate "headless" browser of its own). For headless browser drivers, see, for example, capybara-webkit or poltergeist.



Source: https://stackoverflow.com/questions/14186399/evaluate-javascript-on-ruby
