Web Crawling (Ajax/JavaScript-enabled pages) using Java

滥情空心 2020-12-09 06:21

I am very new to web crawling. I am using crawler4j to crawl websites and collect the required information from them. My problem

3 answers
  •  慢半拍i (original poster)
    2020-12-09 06:41

    I found a solution for crawling dynamic web pages using Aperture and Selenium WebDriver.
    Aperture is a crawling framework, and Selenium is a browser-automation and testing tool that can return the rendered page, i.e. the DOM you see in the browser's Inspect Element view.
    
    1. Decompile the Aperture core JAR with a decompiler tool and create a simple web-crawling Java program from it. (https://svn.code.sf.net/p/aperture/code/aperture/trunk/)
    2. Download the Selenium WebDriver JAR files and add them to your program.
    3. Go to the CreatedDataObjec() method in org.semanticdesktop.aperture.accessor.http.HttpAccessor (in the decompiled Aperture source).
    Add the code below:
    
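       // Requires org.openqa.selenium.WebDriver, org.openqa.selenium.firefox.FirefoxDriver,
       // java.io.ByteArrayInputStream and java.nio.charset.StandardCharsets.
       WebDriver driver = new FirefoxDriver();
       try {
           driver.get(uri.toString());            // load the page so the browser runs its JavaScript
           String str = driver.getPageSource();   // page source as reported by the browser after loading
           stream = new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_8));
       } finally {
           driver.quit();                         // close the browser and end the WebDriver session
       }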
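
    The rendering step can also be kept apart from the Aperture code, which makes it possible to try it out without starting a browser. What follows is only a rough sketch under my own assumptions: RenderedPageStream, toStream and the pageSourceFetcher parameter are hypothetical names that belong to neither Aperture nor Selenium, and encoding the page as UTF-8 is my choice rather than anything Aperture requires.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical helper, not part of Aperture or Selenium.
final class RenderedPageStream {
    private RenderedPageStream() {}

    /**
     * Fetches the page source for the given URL through the supplied function
     * and exposes it as a stream of UTF-8 bytes.
     *
     * @param url               address of the page to fetch
     * @param pageSourceFetcher maps a URL to the HTML of that page; in a real
     *                          crawler this would wrap the WebDriver calls
     */
    static InputStream toStream(String url, Function<String, String> pageSourceFetcher) {
        String html = pageSourceFetcher.apply(url);
        if (html == null) {
            throw new IllegalStateException("No page source returned for " + url);
        }
        // Use an explicit charset so the bytes do not depend on the platform default.
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }
}
```

    With Selenium, the function passed in would create the FirefoxDriver, call driver.get(url), return driver.getPageSource() and quit the driver, so the accessor itself only sees a URL going in and a stream coming out.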
    
