Retrieving a web page including embedded objects

Submitted by 依然范特西╮ on 2019-12-08 02:17:01

Question


I'd like to fetch a web page including images, flash animations and other embedded objects. What's a straightforward way of achieving this?


Answer 1:


See the article "Writing a Web Crawler in the Java Programming Language": http://java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/




Answer 2:


Use an open source HTML parser such as HTMLCleaner (http://java-source.net/open-source/html-parsers/htmlcleaner) or CyberNekoHtml (http://java-source.net/open-source/html-parsers/nekohtml).

Once you have used a parser to create a representation of the DOM of the web page, you can then load/download images and other embedded objects that exist in the DOM by performing queries on the DOM and extracting relevant src attributes from the HTML elements.
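As a rough sketch of that idea, the snippet below collects the src attribute of every element and resolves it against the page URL. It uses the HTML parser bundled with the JDK (javax.swing.text.html) as a stand-in so that it runs without extra libraries; it is not the HTMLCleaner or NekoHTML API. The class name EmbeddedObjectLister, the method extractSources, and the example.com URLs are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of any library mentioned above.
class EmbeddedObjectLister {

    // Returns the absolute URL of every src attribute found in the given HTML.
    // baseUrl is the address the page was fetched from; relative src values
    // are resolved against it. A malformed src value makes URI.resolve throw
    // IllegalArgumentException, which this sketch does not handle.
    static List<String> extractSources(String html, String baseUrl) throws IOException {
        final URI base = URI.create(baseUrl);
        final List<String> sources = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(attrs);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Empty elements such as <img> are reported here.
                collect(attrs);
            }

            private void collect(MutableAttributeSet attrs) {
                Object src = attrs.getAttribute(HTML.Attribute.SRC);
                if (src != null) {
                    sources.add(base.resolve(src.toString().trim()).toString());
                }
            }
        };

        // The third argument tells the parser to ignore charset declarations in the page.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return sources;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>"
                + "<img src=\"logo.png\">"
                + "<img src=\"/images/banner.gif\">"
                + "</body></html>";
        for (String url : extractSources(html, "http://example.com/site/index.html")) {
            System.out.println(url);
        }
    }
}
```

Each returned URL can then be downloaded separately, for example with java.net.URL.openStream() and java.nio.file.Files.copy(). Note that this only looks at src; elements such as object (data attribute) or link (href attribute) would need their own handling.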




Answer 3:


Try Web-Harvest.



Source: https://stackoverflow.com/questions/2664404/retrieving-a-web-page-including-embedded-objects
