phantomjs | 易学教程

(8) Python web crawlers: image lazy loading, selenium, and PhantomJS

Introduction

Today's overview: image lazy loading, selenium, PhantomJS, headless Chrome.
Review: the captcha handling workflow.
Today's details: handling dynamically loaded data.

1. Image lazy loading

Case study: scrape the image data from 站长素材 (http://sc.chinaz.com/).

    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    import requests
    from lxml import etree

    if __name__ == "__main__":
        url = 'http://sc.chinaz.com/tupian/gudianmeinvtupian.html'
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
        }
        # Fetch the page text
        response = requests.get(url=url, headers=headers)
        response.encoding = 'utf-8'
        page_text = response.text
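The excerpt stops once the page text has been fetched. As a rough sketch of the lazy-loading idea, and not the article's own code: lazily loaded pages often keep the real image URL in a placeholder attribute and only copy it into `src` when the image scrolls into view, so a scraper should read the placeholder first. The attribute names in `LAZY_ATTRS`, the `extract_image_urls` helper, and the sample HTML are all assumptions for illustration. Inspect the target page to find the attribute it actually uses.

```python
from html.parser import HTMLParser

# Hypothetical placeholder attribute names; each site picks its own.
LAZY_ATTRS = ("data-src", "data-original", "src2")


class LazyImageParser(HTMLParser):
    """Collect one URL per <img>, preferring lazy-load placeholders over src."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Check placeholder attributes first, then fall back to the plain src.
        for name in LAZY_ATTRS + ("src",):
            value = attr_map.get(name)
            if value:
                self.urls.append(value)
                break


def extract_image_urls(page_text):
    """Return the image URLs found in an HTML string, in document order."""
    parser = LazyImageParser()
    parser.feed(page_text)
    return parser.urls


# Made-up markup standing in for a fetched page.
SAMPLE_HTML = """
<div>
  <img src2="http://example.com/a.jpg" alt="lazy">
  <img src="http://example.com/b.jpg" alt="eager">
</div>
"""

if __name__ == "__main__":
    print(extract_image_urls(SAMPLE_HTML))
```

In a script like the one above, `page_text` from `response.text` could be passed to `extract_image_urls` in place of `SAMPLE_HTML`.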

Phantomjs works but is very slow

I am trying to take a screenshot of a webpage with PhantomJS, following the example that captures espn.com. My code looks like this:

    var page = new WebPage();
    page.open('http://www.espn.com', function (status) {
        page.render('fb.png');
        phantom.exit();
    });

I then go to my PhantomJS directory in a terminal or command prompt and run:

    phantomjs shotty.js

Everything runs fine, but it takes 6-8 seconds to produce the output image. Is that normal? Is there a faster way to do this so that it completes in a second or less? I am using CentOS and

Using the 'webpage' Phantom module in node.js

I am trying to wrap a PhantomJS script in a node.js process. The phantom script takes a URL from the command-line arguments and outputs a PDF (much like the rasterize.js example included with the phantom install). The phantom script works fine, but my employer wants a node script if possible. No problem: I can use the node-phantom node module to wrap it. But now I've hit a stumbling block. My phantom script has: var page = require('webpage').create(); So node.js tries to find a module called 'webpage', but the 'webpage' module is built into the phantom

PhantomJS / Javascript: write to file instead of to console

From PhantomJS, how do I write to a log instead of to the console? The examples at https://github.com/ariya/phantomjs/wiki/Examples always (in the ones I have looked at) use something like: console.log('some stuff I wrote'); This is not so useful.

The following writes content to a file directly from PhantomJS:

    var fs = require('fs');
    try {
        fs.write("/home/username/sampleFileName.txt", "Message to be written to the file", 'w');
    } catch(e) {
        console.log(e);
    }
    phantom.exit();

The command in the answer by user984003 fails when a warning or exception occurs. And sometimes

How to debug ember-cli tests running in phantomjs

Context: I have an acceptance test for my ember-cli application, and the test passes just fine in Chrome. However, in phantomjs, my test fails -- the UI doesn't get created the same way, and I'm trying to work out why. (I think the test is broken because of https://github.com/ember-cli/ember-cli/issues/1763 , but the general question of how to debug remains) In Chrome, I can use the standard debugging tools on my tests and all is well -- but in phantomjs, I can't get at it with a debugger. I also don't see console.log() messages show up in the output -- all I get is a list of test results in

How to Use CasperJS in node.js?

I would like to use CasperJS in node.js. I referred to the following URLs: https://github.com/sgentle/phantomjs-node and http://casperjs.org/index.html#faq-executable. With their help I have written the following code:

    //DISPLAY=:0 node test2.js
    var phantom = require('phantom');
    console.log('Hello, world!');
    phantom.create(function (ph) {
        ph.casperPath = '/opt/libs/casperjs'
        ph.injectJs('/opt/libs/casperjs/bin/bootstrap.js');
        var casper = require('casper').create();
        casper.start('http://google.fr/');
        casper.thenEvaluate(function (term) { document

phantomjs exit() doesn't terminate the process

I've been using phantom.js on Windows 7 for quite some time now (I think v1.4.0 was the first version I used) and everything was always fine. But for some reason the process no longer terminates properly when calling phantom.exit(), and I have no idea why. The problem started in v1.7.0, from one day to the next: everything worked fine in 1.7.0 and then it stopped. Even after upgrading to 1.8.0 and now 1.9.0 it still doesn't work. The console just hangs. I can't type anything, phantomjs.exe is still listed in the Task Manager's process list, even CTRL+C

Navigating / scraping hashbang links with javascript (phantomjs)

I'm trying to download the HTML of a website that is almost entirely generated by JavaScript, so I need to simulate browser access and have been experimenting with PhantomJS. The problem is that the site uses hashbang URLs and I can't get PhantomJS to process the hashbang: it just keeps loading the homepage. The site is http://www.regulations.gov, and the default takes you to #!home. I've tried the following code to process different hashbangs: if (phantom.state.length === 0) { if (phantom.args.length === 0) { console.log('Usage: loadreg_1.js <some hash>');

How can I send POST data to a phantomjs script

Question: I am working with PHP/CURL and would like to send POST data to my phantomjs script by setting the postfields array below:

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)");
    curl_setopt($ch, CURLOPT_POST, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postFieldArray);
    curl_setopt($ch, CURLOPT

Is there a way to read user input from keyboard for PhantomJS?

I'm using PhantomJS to log in to a website, and the captcha has to be entered manually. How can I save the captcha image to disk and then type the captcha by hand in the PhantomJS console?

I had the same problem. Use the system module in combination with page.render() and pass the value as an argument to page.evaluate:

    page.render('pagewithcatpcha.jpg');
    page.injectJs('http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js');
    var arg1 = consoleRead();
    page.evaluate(function (arg1) {
        $('.yourFormBox').val(arg1);
        $('.yourForm').submit();
    }, arg1);
    function consoleRead() { var system = require(