phantomjs | 易学教程

Selenium Webdriver: HTML renders differently for Firefox vs. PhantomJS

I am using Selenium WebDriver in Node.js to do a Google search. When I set the browser to Firefox on my local machine, the Google results page renders as expected; it is the same as what I see when I do the search by hand. Now I'm trying to do the same on my Heroku server. I can't seem to get Firefox on the server, so I'm using PhantomJS. It completes the Google search, but some data is missing from the page (I presume it is added later by JavaScript). How can I make the PhantomJS results page look the same as Firefox's? Can I make PhantomJS appear to be Firefox? var driver = new

Page is not completely loaded/rendered when onLoadFinished fires

Question: I'm using PhantomJS to examine my application. The page I am looking at includes a lot of components and is powered by AngularJS. My test server is slow. In the onLoadFinished event, I capture an image of the web page using render: page.onLoadFinished = function(response) { console.log(response, page.url); if (page.url.indexOf("login") == -1) { page.render('zoot.png'); phantom.exit(); } }; My issue is that zoot.png only includes the site's menu bar. The menu bar's text and static images are rendered

Web scraping - the selenium module

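Introduction to selenium: selenium started out as an automated testing tool; in web scraping it is used mainly because requests cannot execute JavaScript directly. selenium works by driving a real browser and fully simulating browser actions such as navigating, typing, clicking, and scrolling, in order to obtain the page as it looks after rendering. It supports many common browsers: from selenium import webdriver browser=webdriver.Chrome() browser=webdriver.Firefox() browser=webdriver.PhantomJS() browser=webdriver.Safari() browser=webdriver.Edge() Official site: http://selenium-python.readthedocs.io Environment setup 1. To use selenium in Python, first install the module: pip install selenium 2. selenium works by driving a browser to request and render the target page, so you need to download the matching browser driver; Chrome is recommended. Mirror: https://npm.taobao.org/mirrors/chromedriver/ Note that the driver version must match the browser version; you can find the exact version on Chrome's About page. The correspondence between driver and browser versions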

How do I turn off the logging for PhantomJS in Watir-WebDriver?

Question: I see a lot of logging information for PhantomJS in my Ruby (1.8) Watir code, i.e. INFO messages. How do I turn it off? A Google search turned up the Java code but not the Ruby code. Java PhantomJSDriver disable all logs in console PhantomJS is launching GhostDriver... [INFO - 2015-01-27T10:00:00.000Z] GhostDriver - Main - running on port 8910 [INFO - 2015-01-27T10:00:00.000Z] Session [30344df0-a7de-11e4-9220-5bf7aac4a098] - _decorateNewWindow - page.settings: {"XSSAuditingEnabled":false,

PhantomJS Node - page.open - cannot keep track of multiple pages

I'm using Phantom Node to interface node with PhantomJS. I'm trying to open pages in parallel, but the issue is that page.open callback function does not pass back the reference to the page, so I don't have a way to know which page has completed. Relevant Code self.queue[j].page.open.call( self.queue[j].page, rows[i].url, function( status ) { console.log( this ) // <-- returns undefined // So how do I keep track of which pages have finished loading? // The only variable I have available here is `status` }); Full Function Code: SnapEngine.prototype.processSnaps = function( rows, type ) { var
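
One common way around the missing page reference is to create each callback inside a function that closes over the page it belongs to. The sketch below is a minimal illustration in plain JavaScript, not the asker's code: `fakePage` and `openAll` are hypothetical names standing in for the Phantom Node page objects and the queue, and the stub `open` only mimics the `function (status)` callback shape shown in the question.

```javascript
// Hypothetical stand-in for a page object: open() calls back with only a status,
// mirroring the callback shape described in the question.
function fakePage(name) {
  return {
    name: name,
    open: function (url, callback) {
      callback('success');
    }
  };
}

// Open every queued page; each callback closes over its own queue entry,
// so the page that finished is known without relying on `this`.
function openAll(queue, onDone) {
  queue.forEach(function (entry) {
    entry.page.open(entry.url, function (status) {
      onDone(entry.page, entry.url, status);
    });
  });
}

var finished = [];
openAll(
  [
    { page: fakePage('first'), url: 'http://example.com/a' },
    { page: fakePage('second'), url: 'http://example.com/b' }
  ],
  function (page, url, status) {
    finished.push(page.name + ' ' + url + ' ' + status);
  }
);
console.log(finished);
```

Using `forEach` rather than a plain `for` loop with `var` matters here: each iteration gets its own `entry`, so a callback that fires late still sees the page it was created for.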

Cannot pass module functions to Page

I have a module called util with the methods getMutedColor and some others. getMutedColor relies on another called rand in the same module. page.includeJs('https://cdnjs.cloudflare.com/ajax/libs/d3/3.4.10/d3.min.js', function() { var util = require('./util'); var svg = page.evaluate(pageContext.pageExec, data, meta, util); /** ... **/ } I can call util.getMutedColor() just fine within this scope but in my pageContext.pageExec function, util.getMutedColor no longer exists. The util parameter is still an object, but I cannot call any of the exported methods: TypeError: 'undefined' is not a
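
The PhantomJS documentation states that arguments to `page.evaluate` must be JSON-serializable, which is consistent with the methods vanishing from `util`. The sketch below reproduces that effect in plain JavaScript with a JSON round trip; the `util` object here is a hypothetical stand-in with made-up contents, not the asker's module.

```javascript
// Hypothetical stand-in for the util module from the question.
var util = {
  palette: ['#336699', '#993366'],
  rand: function (n) { return Math.floor(Math.random() * n); },
  getMutedColor: function () { return this.palette[this.rand(this.palette.length)]; }
};

// Approximate what happens to an argument that must survive JSON serialization:
// plain data is kept, function-valued properties are dropped.
var received = JSON.parse(JSON.stringify(util));

console.log(typeof util.getMutedColor);     // "function"
console.log(typeof received.getMutedColor); // "undefined"
console.log(received.palette);
```

A usual workaround, under the same assumption, is to pass only data through `page.evaluate` and to define the helper functions inside the evaluated function itself, or to load them into the page with `page.injectJs`.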

Include js file with PhantomJS

Question: In a PhantomJS script, I am trying to load a local JavaScript file that defines an array: var webPage = require('webpage'), page = webPage.create(); injected = page.injectJs('./codes.js'); if (injected) { console.log('injected codes.js'); console.log(myCodes); } phantom.exit(); codes.js: myCodes = new Array(); myCodes[0] = { "stuff": "here" }; // more like this I'd expect the myCodes array to be available. Yet I receive: injected codes.js ReferenceError: Can't find variable: myCodes Answer 1:

PhantomJSDriver works for HTTP but not for HTTPS

public class FooTest { WebDriver driver; @Before public void beforeTest() { DesiredCapabilities capabilities = new DesiredCapabilities(); capabilities.setJavascriptEnabled(true); capabilities.setCapability(CapabilityType.ACCEPT_SSL_CERTS, true); driver = new PhantomJSDriver(capabilities); driver.manage().timeouts().pageLoadTimeout(10, TimeUnit.SECONDS); } @Test public void test() { driver.get("http://www.example.com"); WebElement e = driver.findElement(By.tagName("h1")); System.out.println("TEXT" + e.getAttribute("innerHTML")); assertNotNull(e); driver.quit(); } } Hi, I'm just simply trying

Getting phantomjs, socket.io and gevent-socketio to work together

Question: I am trying to build an application that uses PhantomJS 1.7 (simulating a browser) with a Python back end that fires events and collects data. The problem is that the two processes, PhantomJS and my Python program, need to communicate bidirectionally, but inside page.evaluate I cannot: pass any complex objects such as "fs" (to read from stdin); create a WebSocket to connect to my Python script; or use any other form of inter-process communication, since it is restricted. So my

Phantomjs to scrape webpage function not working

Question: I am using PhantomJS to learn how to scrape a webpage, and so far I have developed the code below. I know that I am able to connect to the site, but I am unable to get any data from the table. Am I on the right track? My goal is to scrape data from the table on this site. I also understand that I need to use includeJs or injectJs to wait for the table to load; otherwise I would be scraping an empty HTML page. I am trying to put these concepts together, but I have been stuck for over 3 days now.