phantomjs | 易学教程

How can I make casperjs repeat a loop until a certain condition is met?

I'm trying to get CasperJS working in the following situation: a web page loads, and within that page data items are loaded via ajax along with a 'read more' button, which in turn loads more data items. I need the script to check repeatedly whether the 'read more' button exists (there are many data items to load); if it does, click it, otherwise continue with the rest of the script and output the full page as a JPEG. I tried the code below, but it doesn't loop as I had hoped: it clicks the button once and then outputs the image, even though more data loads and the button still exists.
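The excerpt stops before the asker's code, so the following is only a sketch of one common pattern for this kind of loop: a step that re-schedules itself for as long as the button exists. `clickWhileExists` is a hypothetical helper name, and the selector and wait time passed to it are assumptions; only the CasperJS methods `then()`, `exists()`, `click()` and `wait()` are used.

```javascript
// Hypothetical helper: keeps clicking `selector` until it no longer exists.
// `casper` is a CasperJS instance; `waitMs` is an assumed fixed pause that
// gives the ajax request time to finish before the next check.
function clickWhileExists(casper, selector, waitMs) {
  casper.then(function () {
    if (!casper.exists(selector)) {
      return; // button is gone: fall through to the remaining steps
    }
    casper.click(selector);
    // Wait, then schedule the same check again.
    casper.wait(waitMs, function () {
      clickWhileExists(casper, selector, waitMs);
    });
  });
}
```

In a script, a call such as `clickWhileExists(casper, '.read-more', 1000)` (with a placeholder selector) would sit after `casper.start(...)` and before the step that captures the image, so the capture only runs once the button has disappeared.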

How to get the height of a full html page in Phantomjs (javascript)?

Hi, I have tried all of these:

    document.body.scrollHeight
    document.body.offsetHeight
    document.documentElement.clientHeight
    document.documentElement.scrollHeight
    document.documentElement.offsetHeight

These work in a normal browser, but in PhantomJS I get the height of the CMD (command-line) window. I want the height so that I can crop a screenshot later in the code, and it must match the page height as viewed in a normal browser. I'm getting 300 pixels, but I want the full HTML page height (which varies depending on the URL). Those values provide the expected values as with
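One common way to combine the five measurements listed in the question is to take the largest of them. The sketch below does only that; `fullPageHeight` is a hypothetical helper name, and whether the result matches a desktop browser in PhantomJS depends on settings (such as the viewport) that the excerpt does not show.

```javascript
// Hypothetical helper: largest of the height measurements named in the question.
// `doc` is a DOM document (or any object with the same shape).
function fullPageHeight(doc) {
  var body = doc.body;
  var html = doc.documentElement;
  return Math.max(
    body.scrollHeight,
    body.offsetHeight,
    html.clientHeight,
    html.scrollHeight,
    html.offsetHeight
  );
}
```

In PhantomJS this logic has to run in the page context, for example inside the function passed to `page.evaluate`, because `document` is not available in the outer script.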

PhantomJs - How to render a multi page PDF

Question: I can create one-page PDFs with PhantomJS, but I can't find in the docs how to create several pages (each page coming from an HTML view) and put them into one PDF. I am using the node-phantom module for Node.js.

Answer 1: You just need to specify a paperSize. Like this, with module "phantom": "0.5.1":

    function(next) {
        phantom.create(function(doc) {
            next(null, doc);
        }, "phantomjs", Math.floor(Math.random()*(65535-49152+1)+49152));
    },
    function(ph, next) {
        ph.createPage(function(doc) {
            next(null, doc);
        });
    },
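The numeric expression passed to `phantom.create` in the answer yields a random integer between 49152 and 65535 inclusive, which is the IANA dynamic/private port range, so it appears to pick a random port. A minimal sketch of the same arithmetic, with `randomPort` as a hypothetical helper name:

```javascript
// Hypothetical helper: random integer in [min, max], both ends inclusive.
// Math.random() is in [0, 1), so the product is in [0, max - min + 1);
// adding min and flooring gives an integer from min to max.
function randomPort(min, max) {
  return Math.floor(Math.random() * (max - min + 1) + min);
}

// Same range as in the answer's expression.
var port = randomPort(49152, 65535);
```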

Setting proxy in RSelenium with PhantomJS

I'm using the RSelenium library with the argument browserName = "phantomjs" in the remoteDriver command; however, I want to run a test where I specify the type of the proxy server. I've seen that proxy authentication is possible in, e.g., Java, with the code shown here:

    ArrayList<String> cliArgsCap = new ArrayList<String>();
    cliArgsCap.add("--proxy=address:port");
    cliArgsCap.add("--proxy-auth=username:password");
    cliArgsCap.add("--proxy-type=http");
    DesiredCapabilities capabilities = DesiredCapabilities.phantomjs();
    capabilities.setCapability( PhantomJSDriverService.PHANTOMJS_CLI

2.2 Crawler request libraries: selenium

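1. Introduction

selenium began as an automated testing tool; crawlers use it mainly because requests cannot execute JavaScript code directly. In essence, selenium drives a browser and fully simulates browser operations such as navigating, typing, clicking, and scrolling down, in order to obtain the page as it looks after rendering. It supports multiple browsers:

    from selenium import webdriver
    browser=webdriver.Chrome()
    browser=webdriver.Firefox()
    browser=webdriver.PhantomJS()
    browser=webdriver.Safari()
    browser=webdriver.Edge()

Official site: http://selenium-python.readthedocs.io

2. Installation

1. Browser with a GUI: selenium + chromedriver
2. Headless browser (PhantomJS is no longer updated): selenium + phantomjs

Just as PhantomJS had fallen into disrepair with no successor in sight, Chrome came to the rescue and once again became the nightmare of anti-crawler teams. Since the stable releases of Chrome 59 / 60, Google has supported headless mode, which means that PhantomJS is no longer the only choice in environments without a GUI: selenium + Chrome headless mode.

3. Basic usage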

Phantom-node module unable to load external resources

I'm working on a Node.js server that renders posted HTML to PDF, PNG, or JPG (see server.js at https://github.com/svenhornberg/pagetox if you want to try it). It works really well and renders complex sites, but it fails as soon as I want to load a simple image. For example, I send the following code to my server:

    <!doctype html>
    <html>
    <head>
    <title>logo</title>
    </head>
    <body>
    <img alt="logo" src="http://upload.wikimedia.org/wikipedia/commons/d/de/Wikipedia_Logo_1.0.png">
    </body>
    </html>

The code should be okay, but the rendered response image does not contain the logo image. As said in the

Phantom JS + Docker: html font-family is not respected when converting from HTML

When I run my phantomjs app in docker, in Node, it works fine (converting HTML to JPEG). However, when I publish it to a docker container, the font names are no longer being respected. This app converts HTML into JPEG, PDF or other media, using html-convert npm, which is a wrapper for phantomjs.

dockerfile:

    FROM node:latest
    WORKDIR /app
    COPY package.json /app
    RUN npm install
    COPY . /app
    CMD node app.js
    EXPOSE 8081

package.json:

    {
      "name": "htmlconverter",
      "version": "1.0.0",
      "description": "",
      "main": "app.js",
      "dependencies": {
        "body-parser": "^1.18.2",
        "ent": "^2.2.0",
        "express": "^4.16.3",

Crawlers - request libraries: selenium

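Introduction (official documentation)

selenium began as an automated testing tool; crawlers use it mainly because requests cannot execute JavaScript code directly. In essence, selenium drives a browser and fully simulates browser operations such as navigating, typing, clicking, and scrolling down, in order to obtain the page as it looks after rendering. It supports multiple browsers:

    from selenium import webdriver
    browser=webdriver.Chrome() # Google Chrome
    browser=webdriver.Firefox() # Firefox
    browser=webdriver.PhantomJS()
    browser=webdriver.Safari()
    browser=webdriver.Edge()

Installation

    >: pip3 install selenium

Browser with a GUI: download chromedriver.exe and place it in the scripts directory of the Python installation path. Note that the latest version is 2.38, not 2.9.
Mirror site in China: http://npm.taobao.org/mirrors/chromedriver/2.38/
For the latest version, see the official site: https://sites.google.com/a/chromium.org/chromedriver/downloads

    # Verify the installation
    C:\Users\Administrator>python3
    Python 3.6.1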

The selenium module

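The selenium module: official documentation at http://selenium-python.readthedocs.io/

Introduction

selenium began as an automated testing tool; crawlers use it mainly because requests cannot execute JavaScript code directly. In essence, selenium drives a browser and fully simulates browser operations such as navigating, typing, clicking, and scrolling down, in order to obtain the page as it looks after rendering. It supports multiple browsers:

    from selenium import webdriver
    browser=webdriver.Chrome()
    browser=webdriver.Firefox()
    browser=webdriver.PhantomJS()
    browser=webdriver.Safari()
    browser=webdriver.Edge()

Installation

Browser with a GUI: selenium + chromedriver

    # Install: selenium + chromedriver
    pip3 install selenium

Download chromedriver.exe and place it in the scripts directory of the Python installation path. Note that the latest version is 2.38, not 2.9.
Mirror site in China: http://npm.taobao.org/mirrors/chromedriver/2.38/
For the latest version, see the official site: https://sites.google.com/a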

CasperJS/PhantomJS failing SSL handshakes on some sites even with --ssl-protocol=any

I've had issues with CasperJS and SSL, but using --ssl-protocol=any has always fixed the problem, as referenced in this answer. In this case, I'm still having issues. I put this in the command line:

    casperjs --ssl-protocol=any --ignore-ssl-errors=true sanity.js

This is sanity.js:

    var casper = require('casper').create({
        verbose: true,
        logLevel: 'debug'
    });
    casper.on("resource.error", function(resourceError){
        console.log('Unable to load resource (#' + resourceError.id + 'URL:' + resourceError.url + ')');
        console.log('Error code: ' + resourceError.errorCode + '. Description: ' + resourceError
