phantomjs

How can I make casperjs repeat a loop until a certain condition is met?

北城以北 submitted on 2019-12-05 21:55:41
I'm trying to get CasperJS working in the following situation: a web page loads, then loads data items via AJAX along with a 'read more' button, which in turn loads more data items. I need the script to repeatedly check whether the 'read more' button exists (there are many data items to load); if it does, click it, otherwise continue with the rest of the script and output the full page as a JPEG. I tried the code below, but it doesn't loop as I had hoped. It clicks the button once and then outputs the image, even though more data loads and the button still exists
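A minimal sketch of one way to loop until the button disappears, assuming the button matches a selector chosen by the caller (for example a hypothetical `a.read-more`) and that a fixed pause is long enough for each AJAX load to finish. `clickWhileExists` is an illustrative name, not part of CasperJS; it relies only on `casper.then`, `casper.exists`, `casper.click` and `casper.wait`. Each round schedules the next one from inside the wait callback, so steps queued afterwards should run only once the button is gone.

```javascript
// Sketch only: keep clicking `selector` until it is no longer in the page.
// `casper` is a CasperJS instance, e.g. require('casper').create().
// `selector` and `pauseMs` are assumptions supplied by the caller.
function clickWhileExists(casper, selector, pauseMs) {
  casper.then(function () {
    if (!casper.exists(selector)) {
      return; // button gone: nothing more is scheduled, later steps proceed
    }
    casper.click(selector);
    // Give the AJAX request time to finish, then check again.
    casper.wait(pauseMs, function () {
      clickWhileExists(casper, selector, pauseMs);
    });
  });
}
```

In a script this could sit between `casper.start(url)` and a final `casper.then` that calls `this.capture(...)`, followed by `casper.run()`. A fixed pause is the simplest choice here; waiting on a page-specific condition would be more robust but depends on markup the excerpt does not show.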

How to get the height of a full html page in Phantomjs (javascript)?

元气小坏坏 submitted on 2019-12-05 21:46:08
Hi, I have tried all of these: document.body.scrollHeight, document.body.offsetHeight, document.documentElement.clientHeight, document.documentElement.scrollHeight, document.documentElement.offsetHeight. These work in a normal browser, but in PhantomJS I get the height of the CMD (command-line) window. I want the height so that I can crop a screenshot later in the code, and it must match the page height as seen in a normal browser. I'm getting 300 pixels, but I want the full HTML page height (which varies depending on the URL). Those values provide the expected values as with
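The excerpt is cut off before any answer, so the following is only a sketch of one common approach, not a confirmed fix for this poster: take the measurement inside the page with `page.evaluate` once loading has finished, then use the result as the clip height. `measureDocumentHeight` and `renderFullHeight` are illustrative names, and the width, initial viewport height and output path are assumptions supplied by the caller.

```javascript
// Runs inside the page (via page.evaluate), so it may only use page globals.
function measureDocumentHeight() {
  var body = document.body;
  var html = document.documentElement;
  return Math.max(
    body.scrollHeight, body.offsetHeight,
    html.clientHeight, html.scrollHeight, html.offsetHeight
  );
}

// Sketch: open `url`, measure the document, and render it at full height.
// `page` is a PhantomJS page from require('webpage').create().
function renderFullHeight(page, url, width, outPath, done) {
  // Set the width before loading so layout is computed at that width;
  // the initial height of 300 is an arbitrary placeholder.
  page.viewportSize = { width: width, height: 300 };
  page.open(url, function (status) {
    if (status !== 'success') {
      return done(new Error('failed to load ' + url));
    }
    var height = page.evaluate(measureDocumentHeight);
    page.clipRect = { top: 0, left: 0, width: width, height: height };
    page.render(outPath);
    done(null, height);
  });
}
```

Taking the maximum of the five properties mirrors the list in the question; if content arrives via AJAX after the load callback fires, the measurement would need to be delayed accordingly.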

PhantomJs - How to render a multi page PDF

无人久伴 submitted on 2019-12-05 21:36:57
Question: I can create one-page PDFs with PhantomJS, but I can't find in the docs how to create several pages (each page coming from an HTML view) and put them into one PDF. I am using the node-phantom module for NodeJS. Answer 1: You just need to specify a paperSize. Like this with module "phantom": "0.5.1": function(next) { phantom.create(function(doc) { next(null, doc); }, "phantomjs", Math.floor(Math.random()*(65535-49152+1)+49152)); }, function(ph, next) { ph.createPage(function(doc) { next(null, doc); }); },
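The answer's code is truncated, so here is a hedged sketch of how the pieces might fit together rather than the answerer's actual method: combine the HTML views into one document with a CSS page break after each view, and pair that with a `paperSize` value so PhantomJS paginates the output. `joinViewsForPdf` is a hypothetical helper, and the A4, portrait and 1cm values are assumptions for illustration.

```javascript
// Sketch: join several HTML fragments into one document, forcing a page
// break after every fragment except the last.
function joinViewsForPdf(views) {
  var sections = views.map(function (html, i) {
    var style = i < views.length - 1 ? ' style="page-break-after: always"' : '';
    return '<div' + style + '>' + html + '</div>';
  });
  return '<!doctype html><html><body>' + sections.join('') + '</body></html>';
}

// A paperSize value of the shape PhantomJS accepts; the specific
// format, orientation and margin are illustrative choices.
var paperSize = { format: 'A4', orientation: 'portrait', margin: '1cm' };
```

In plain PhantomJS the joined HTML could be assigned to `page.content` and the object to `page.paperSize` before calling `page.render` with a `.pdf` path. Node wrappers such as the one mentioned in the question expose these properties differently, so their own documentation should be checked.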

Setting proxy in RSelenium with PhantomJS

让人想犯罪 __ submitted on 2019-12-05 21:18:52
I'm using the RSelenium library with the argument browserName = "phantomjs" in the remoteDriver command, and I want to run a test where I specify the type of the proxy server. I've seen that proxy authentication is possible in, e.g., Java, with code like this: ArrayList<String> cliArgsCap = new ArrayList<String>(); cliArgsCap.add("--proxy=address:port"); cliArgsCap.add("--proxy-auth=username:password"); cliArgsCap.add("--proxy-type=http"); DesiredCapabilities capabilities = DesiredCapabilities.phantomjs(); capabilities.setCapability( PhantomJSDriverService.PHANTOMJS_CLI

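2.2 Crawler request libraries: selenium

﹥>﹥吖頭↗ submitted on 2019-12-05 20:10:21
1. Introduction: selenium was originally an automated testing tool; crawlers use it mainly because requests cannot execute JavaScript directly. In essence, selenium drives a browser and fully simulates browser operations such as navigating, typing, clicking and scrolling down, in order to get the page as it looks after rendering. It supports several browsers: from selenium import webdriver browser=webdriver.Chrome() browser=webdriver.Firefox() browser=webdriver.PhantomJS() browser=webdriver.Safari() browser=webdriver.Edge() Official site: http://selenium-python.readthedocs.io 2. Installation: (1) Browser with a GUI: selenium+chromedriver (2) Headless browser: PhantomJS is no longer updated: selenium+phantomjs. Just as PhantomJS fell into disrepair with no successor, Chrome came to the rescue and once again became the nightmare of anti-crawler teams. Starting with the stable releases of Chrome 59 / 60, Google supports headless mode, which means PhantomJS is no longer the only choice in environments without a GUI: selenium + Chrome headless mode. 3. Basic usage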

Phantom-node module unable to load external resources

醉酒当歌 submitted on 2019-12-05 19:26:20
I'm working on a Node.js server which renders posted HTML to PDF, PNG or JPG (https://github.com/svenhornberg/pagetox, see server.js if you want to try it). It works really well and renders complex sites, except when I want to load a simple image. For example, I am sending the following code to my server: <!doctype html> <html> <head> <title>logo</title> </head> <body> <img alt="logo" src="http://upload.wikimedia.org/wikipedia/commons/d/de/Wikipedia_Logo_1.0.png"> </body> </html> The code should be okay, but the rendered response image does not contain the logo image. As said in the

Phantom JS + Docker: html font-family is not respected when converting from HTML

旧城冷巷雨未停 submitted on 2019-12-05 18:13:16
When I run my phantomjs app in docker, in Node, it works fine (converting HTML to Jpeg). However, when I publish it to a docker container, the font names are no longer being respected. This app converts HTML into jpeg, pdf or other media, using html-convert npm, which is a wrapper for phantomjs dockerfile: FROM node:latest WORKDIR /app COPY package.json /app RUN npm install COPY . /app CMD node app.js EXPOSE 8081 package.json { "name": "htmlconverter", "version": "1.0.0", "description": "", "main": "app.js", "dependencies": { "body-parser": "^1.18.2", "ent": "^2.2.0", "express": "^4.16.3",

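Crawlers - the selenium request library

☆樱花仙子☆ submitted on 2019-12-05 17:38:36
Introduction (official documentation): selenium was originally an automated testing tool; crawlers use it mainly because requests cannot execute JavaScript directly. In essence, selenium drives a browser and fully simulates browser operations such as navigating, typing, clicking and scrolling down, in order to get the page as it looks after rendering. It supports several browsers: from selenium import webdriver browser=webdriver.Chrome() # Google Chrome browser=webdriver.Firefox() # Firefox browser=webdriver.PhantomJS() browser=webdriver.Safari() browser=webdriver.Edge() Installation: >: pip3 install selenium Browser with a GUI: download chromedriver.exe and put it in the scripts directory of your Python installation. Note that the latest version is 2.38, not 2.9. Mirror site in China: http://npm.taobao.org/mirrors/chromedriver/2.38/ For the latest version, see the official site: https://sites.google.com/a/chromium.org/chromedriver/downloads # Verify the installation C:\Users\Administrator>python3 Python 3.6.1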

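The selenium module

纵然是瞬间 submitted on 2019-12-05 17:04:30
The selenium module. Official documentation: http://selenium-python.readthedocs.io/ Introduction: selenium was originally an automated testing tool; crawlers use it mainly because requests cannot execute JavaScript directly. In essence, selenium drives a browser and fully simulates browser operations such as navigating, typing, clicking and scrolling down, in order to get the page as it looks after rendering. It supports several browsers: from selenium import webdriver browser=webdriver.Chrome() browser=webdriver.Firefox() browser=webdriver.PhantomJS() browser=webdriver.Safari() browser=webdriver.Edge() Installation: Browser with a GUI: selenium+chromedriver # Install: selenium+chromedriver pip3 install selenium Download chromedriver.exe and put it in the scripts directory of your Python installation. Note that the latest version is 2.38, not 2.9. Mirror site in China: http://npm.taobao.org/mirrors/chromedriver/2.38/ For the latest version, see the official site: https://sites.google.com/a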

CasperJS/PhantomJS failing SSL handshakes on some sites even with --ssl-protocol=any

巧了我就是萌 submitted on 2019-12-05 16:08:51
I've had issues with CasperJS and SSL, but using --ssl-protocol=any has always fixed the problem, as referenced in this answer. In this case, I'm still having issues. I put this in the command line: casperjs --ssl-protocol=any --ignore-ssl-errors=true sanity.js This is sanity.js: var casper = require('casper').create({ verbose: true, logLevel: 'debug' }); casper.on("resource.error", function(resourceError){ console.log('Unable to load resource (#' + resourceError.id + 'URL:' + resourceError.url + ')'); console.log('Error code: ' + resourceError.errorCode + '. Description: ' + resourceError