How to save complete html page with frames/iframes included?

Posted by 谁都会走 on 2019-12-11 05:16:53

Question


While web scraping, I want to save the current page's HTML to a file for later debugging. browser.html helps in most cases, but when the page contains an iframe/frame, its content is not returned by browser.html; I have to get it separately with something like browser.iframe.html. There are also cases where an iframe is nested inside another iframe. I can find every frame recursively and save its content, but separate files won't be very useful because I don't know the exact structure of the page.

For example I have the following page:

<!DOCTYPE html>
<html>
<head>
</head>
  <frameset cols="50%,20%,30%">
     <frame name="left" src="/html/left_frame.htm" />
     <frame name="right" src="/html/right_frame.htm" />
     <noframes>
       <body>
          Your browser does not support frames.
       </body>
     </noframes>
     <frame src="http://example.com"/>
  </frameset>
</html>

I want to save it to a file using Watir. Any ideas?


Answer 1:


Frames act much like completely separate web pages. While you can see a frame's content in the rendered document and in the DOM, that content is not technically part of the page's HTML. You can see this in the browser: right-click the main document and view its source, then compare that with what you get by right-clicking content inside a frame and viewing its source.

To write all the HTML out to files, you will likely need a method that writes out the HTML of a frame, looks for other frames, and calls itself recursively on any frames found inside.
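A minimal sketch of that recursive approach follows. It is an illustration under assumptions, not a tested Watir recipe: the method names collect_frame_html and render_frame_dump are hypothetical, and the code assumes that the browser and every frame element respond to html, frames and iframes, as the question's use of browser.html and browser.iframe.html suggests. Each document is labelled with its path in the frame tree, so a single output file keeps the page structure.

```ruby
# Collect the HTML of a browsing context and, recursively, of every frame
# nested inside it. `context` is anything that responds to #html, #frames and
# #iframes (assumed here: a Watir browser or a Watir frame element).
# Returns an array of [path, html] pairs in depth-first order.
def collect_frame_html(context, path = 'top', out = [])
  out << [path, context.html]
  { 'frame' => context.frames, 'iframe' => context.iframes }.each do |kind, collection|
    collection.each_with_index do |child, index|
      # The path records where the child sits in the frame tree.
      collect_frame_html(child, "#{path}/#{kind}[#{index}]", out)
    end
  end
  out
end

# Join the collected documents into one text dump, with a header comment
# naming the frame path before each document.
def render_frame_dump(pairs)
  pairs.map { |path, html| "<!-- ==== #{path} ==== -->\n#{html}\n" }.join("\n")
end

# Possible usage with Watir (assumes the watir gem and a browser driver):
#   require 'watir'
#   browser = Watir::Browser.new
#   browser.goto 'http://example.com'
#   File.write('page_dump.html', render_frame_dump(collect_frame_html(browser)))
```

Writing everything into one file with path headers such as top/frame[0]/iframe[1] is one way to address the concern in the question that separate files lose the page structure; separate files named after the same paths would work as well.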

Alternatively, look at a gem like Nokogiri, which is designed to parse HTML; it might have better methods for this sort of thing, or existing examples of how to do what you want.



Source: https://stackoverflow.com/questions/28476029/how-to-save-complete-html-page-with-frames-iframes-included
