phantomjs

(8) Python Web Crawlers: Image Lazy Loading, selenium, and PhantomJS

扶醉桌前 submitted on 2019-12-02 19:51:25
Introduction. Today's overview: image lazy loading, selenium, PhantomJS, headless Chrome. Review: the CAPTCHA handling workflow. Today's details: handling dynamically loaded data.
1. Image lazy loading
Case study: scrape the image data from 站长素材 (http://sc.chinaz.com/)
    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    import requests
    from lxml import etree
    if __name__ == "__main__":
        url = 'http://sc.chinaz.com/tupian/gudianmeinvtupian.html'
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
        }
        # Fetch the page text
        response = requests.get(url=url, headers=headers)
        response.encoding = 'utf-8'
        page_text = response.text
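Lazy-loaded pages commonly keep the real image URL in a placeholder attribute and only copy it into `src` through JavaScript when the image scrolls into view, so a plain HTTP fetch sees the placeholder rather than the picture. The following is a minimal sketch of reading that placeholder attribute directly, not the article's own code: the attribute name `data-original`, the helper `extract_image_urls`, and the sample HTML are all assumptions made for illustration, and the real attribute name has to be checked in the page source. It uses the standard-library `html.parser` instead of `lxml` so that it runs without extra packages.

```python
from html.parser import HTMLParser


class LazyImageParser(HTMLParser):
    """Collect image URLs, preferring a lazy-load attribute over src."""

    def __init__(self, lazy_attr="data-original"):
        super().__init__()
        # Hypothetical attribute name; inspect the real page to find the one it uses.
        self.lazy_attr = lazy_attr
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Fall back to src when the lazy-load attribute is missing or empty.
        url = attr_map.get(self.lazy_attr) or attr_map.get("src")
        if url:
            self.urls.append(url)


def extract_image_urls(page_text, lazy_attr="data-original"):
    """Return the image URLs found in page_text, in document order."""
    parser = LazyImageParser(lazy_attr)
    parser.feed(page_text)
    return parser.urls


if __name__ == "__main__":
    # Made-up HTML standing in for a fetched page.
    sample = (
        '<div><img data-original="http://example.com/a.jpg" src="placeholder.gif">'
        '<img src="http://example.com/b.jpg"></div>'
    )
    print(extract_image_urls(sample))
```

In a scraper like the one above, `page_text` from `response.text` would be passed to `extract_image_urls`, with `lazy_attr` set to whatever attribute the target page actually uses.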

Phantomjs works but is very slow

谁说胖子不能爱 submitted on 2019-12-02 19:18:42
I am trying to take a screenshot of a webpage with PhantomJS. Specifically, I am using the example of capturing espn.com from this example. My code looks like this: var page = new WebPage(); page.open('http://www.espn.com', function (status) { page.render('fb.png'); phantom.exit(); }); I then go to my PhantomJS directory with either my terminal or command prompt and run: phantomjs shotty.js Everything runs great, however it takes 6-8 seconds to complete the output image. Is that normal? Is there a faster way to accomplish this so that it completes in a second or less? I am using CentOS and

Using the 'webpage' Phantom module in node.js

筅森魡賤 submitted on 2019-12-02 19:15:57
I am trying to wrap a PhantomJS script in a node.js process. The phantom script grabs a url from the arguments provided on the command line and outputs a pdf (much like the rasterize.js example included with the phantom install). The phantom script I have works fine, it's just my employer wants a node script if possible. No problem, I can use the node-phantom node module to wrap it. But now I've hit a stumbling block, my phantom script has: var page = require('webpage').create(); So, node.js is trying to find a module called 'webpage', the 'webpage' module is built into the phantom

PhantomJS / Javascript: write to file instead of to console

痴心易碎 submitted on 2019-12-02 19:14:04
From PhantomJS, how do I write to a log instead of to the console? In the examples https://github.com/ariya/phantomjs/wiki/Examples , it always (in the ones I have looked at) says something like: console.log('some stuff I wrote'); This is not so useful. The following writes content to a file directly from PhantomJS: var fs = require('fs'); try { fs.write("/home/username/sampleFileName.txt", "Message to be written to the file", 'w'); } catch(e) { console.log(e); } phantom.exit(); The command in the answer by user984003 fails when a warning or exception occurs. And sometimes
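Another way to keep console output in a log is to capture it from the process that launches the script. The sketch below is an illustration under assumptions, not something taken from the thread: the helper name `run_and_log` is hypothetical, and the demo runs the Python interpreter as a stand-in so that it works without PhantomJS installed. It appends both stdout and stderr to the log file, so warnings and error messages end up in the same place as normal output.

```python
import os
import subprocess
import sys
import tempfile


def run_and_log(command, log_path):
    """Run command, append its stdout and stderr to log_path, return the exit code."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        completed = subprocess.run(
            command,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # merge stderr into the same log
            check=False,               # a non-zero exit is reported, not raised
        )
    return completed.returncode


if __name__ == "__main__":
    # Stand-in command; with PhantomJS it would be something like
    # ["phantomjs", "script.js"], where script.js is a hypothetical file name.
    log_path = os.path.join(tempfile.mkdtemp(), "run.log")
    code = run_and_log([sys.executable, "-c", "print('some stuff I wrote')"], log_path)
    print("exit code:", code, "log:", log_path)
```

Returning the exit code rather than raising lets the caller decide how to treat a failed run while the log still holds whatever the script printed.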

How to debug ember-cli tests running in phantomjs

浪子不回头ぞ submitted on 2019-12-02 19:03:08
Context: I have an acceptance test for my ember-cli application, and the test passes just fine in Chrome. However, in phantomjs, my test fails -- the UI doesn't get created the same way, and I'm trying to work out why. (I think the test is broken because of https://github.com/ember-cli/ember-cli/issues/1763 , but the general question of how to debug remains) In Chrome, I can use the standard debugging tools on my tests and all is well -- but in phantomjs, I can't get at it with a debugger. I also don't see console.log() messages show up in the output -- all I get is a list of test results in

How to Use CasperJS in node.js?

旧城冷巷雨未停 submitted on 2019-12-02 18:50:12
I would like to use CasperJS in node.js. I have referred to the following URLs to use CasperJS in node.js: https://github.com/sgentle/phantomjs-node http://casperjs.org/index.html#faq-executable With the help of the above URLs I have written the following code: //DISPLAY=:0 node test2.js var phantom = require('phantom'); console.log('Hello, world!'); phantom.create(function (ph) { ph.casperPath = '/opt/libs/casperjs' ph.injectJs('/opt/libs/casperjs/bin/bootstrap.js'); var casper = require('casper').create(); casper.start('http://google.fr/'); casper.thenEvaluate(function (term) { document

phantomjs exit() doesn't terminate the process

两盒软妹~` submitted on 2019-12-02 18:35:20
I've been using phantom.js on Windows 7 for quite some time now (I think v1.4.0 was the first version I used) and everything was always fine. But for some reason the process is no longer terminated properly when calling phantom.exit() and I have no idea why. The problem started to occur in v1.7.0, from one day to the next. Everything once worked fine in 1.7.0, but then it stopped working. Even after upgrading to 1.8.0 and now 1.9.0 it still doesn't work. The console just hangs. I can't type anything, phantomjs.exe is still listed in the list of processes in the Task Manager, even CTRL+C

Navigating / scraping hashbang links with javascript (phantomjs)

五迷三道 submitted on 2019-12-02 18:35:05
I'm trying to download the HTML of a website that is almost entirely generated by JavaScript. So, I need to simulate browser access and have been playing around with PhantomJS . Problem is, the site uses hashbang URLs and I can't seem to get PhantomJS to process the hashbang -- it just keeps calling up the homepage. The site is http://www.regulations.gov . The default takes you to #!home. I've tried using the following code (from here ) to try and process different hashbangs. if (phantom.state.length === 0) { if (phantom.args.length === 0) { console.log('Usage: loadreg_1.js <some hash>');

How can I send POST data to a phantomjs script

时光怂恿深爱的人放手 submitted on 2019-12-02 18:29:53
Question: I am working with PHP/CURL and would like to send POST data to my phantomjs script, by setting the postfields array below: $ch = curl_init(); curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"); curl_setopt($ch, CURLOPT_POST, TRUE); curl_setopt($ch, CURLOPT_POSTFIELDS, $postFieldArray); curl_setopt($ch, CURLOPT

Is there a way to read user input from keyboard for PhantomJS?

断了今生、忘了曾经 submitted on 2019-12-02 18:29:13
I'm using PhantomJS to log in to a website, and the captcha has to be entered manually. How can I save the captcha image to disk, and then enter the captcha by hand in the PhantomJS console? I had the same problem, just use the system module in combination with a page.render() and some argument passing to page.evaluate. page.render('pagewithcatpcha.jpg'); page.injectJs('http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js'); var arg1 = consoleRead(); page.evaluate(function (arg1) { $('.yourFormBox').val(arg1); $('.yourForm').submit(); }, arg1); function consoleRead() { var system = require(