phantomjs

Writing to filesystem from within phantomjs sandboxed environment

魔方 西西 submitted on 2019-12-24 00:34:49
Question: I need to traverse forms on a site and save intermediate results to files. I'm using PhantomJS's page.evaluate, but I'm having trouble accessing the filesystem from within page.evaluate's sandboxed environment. I have something like this:

    for (var i = 0; i < option1.length; i++) {
        for (var ii = 0; ii < option2.length; ii++) {
            for (var iii = 0; iii < option3.length; iii++) {
                ...
                // I found what I want to save
                fs.write("someFileName", someData);
            }
        }
    }

Obviously, I don't have access to nodejs' fs from
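One possible way around the sandbox, sketched here under assumptions rather than taken from the thread: `page.evaluate` can return JSON-serializable values, so the nested loops can collect their results inside the page and hand them back to the outer PhantomJS script, which does have the `fs` module. The helper name `collectCombinations`, the record shape, the sample option lists, and the URL are all hypothetical. The PhantomJS-only part is guarded so the pure helper can also be loaded by another JavaScript runtime.

```javascript
// Hypothetical helper: builds one record per combination of the three lists.
// It references no outer variables, so it can be handed to page.evaluate as-is.
function collectCombinations(option1, option2, option3) {
  var results = [];
  for (var i = 0; i < option1.length; i++) {
    for (var ii = 0; ii < option2.length; ii++) {
      for (var iii = 0; iii < option3.length; iii++) {
        results.push({ a: option1[i], b: option2[ii], c: option3[iii] });
      }
    }
  }
  return results;
}

// PhantomJS-only part; skipped when the global `phantom` object is absent.
if (typeof phantom !== 'undefined') {
  var fs = require('fs');                  // PhantomJS file system module
  var page = require('webpage').create();

  page.open('http://example.com/form', function (status) {  // placeholder URL
    if (status !== 'success') {
      phantom.exit(1);
      return;
    }
    // Extra arguments are serialized and passed into the sandboxed function;
    // the return value is serialized back to this outer context.
    var data = page.evaluate(collectCombinations, ['x', 'y'], ['1'], ['p', 'q']);
    // The write happens out here, where fs is available.
    fs.write('someFileName', JSON.stringify(data, null, 2), 'w');
    phantom.exit();
  });
}
```

If results have to be written while the loops are still running, calling `window.callPhantom(data)` inside the page and handling it in `page.onCallback` outside is another route worth checking in the PhantomJS documentation.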

The selenium module

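丶灬走出姿态 submitted on 2019-12-24 00:19:16
I. Introduction

Selenium started out as an automated testing tool. Crawlers use it mainly because requests cannot execute JavaScript directly. Selenium works by driving a real browser and fully simulating browser operations such as navigating, typing, clicking, and scrolling, in order to obtain the page as it looks after rendering. It supports several browsers:

    from selenium import webdriver
    browser = webdriver.Chrome()
    browser = webdriver.Firefox()
    browser = webdriver.PhantomJS()
    browser = webdriver.Safari()
    browser = webdriver.Edge()

Official site: http://selenium-python.readthedocs.io

II. Installation

    # Install: selenium + chromedriver
    pip3 install selenium

Download chromedriver.exe and put it in the Scripts directory of the Python installation. Note that the latest version is 2.29, not 2.9.
Mirror site in China: http://npm.taobao.org/mirrors/chromedriver/2.29/
For the latest version, check the official site: https://sites.google.com/a/chromium.org/chromedriver/downloads

    # Verify the installation
    C: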

Downloading a file with CasperJS from POST attachment

南楼画角 submitted on 2019-12-24 00:07:59
Question: I almost have this working, I just can't seem to download the file when it comes up. What am I doing wrong here? When clicking the button "Download Sales Report" a CSV should download, but my console.log() never even fires.

    var casper = require('casper').create();
    casper.start('http://www.waynecountyauditor.org/Reports.aspx?ActiveTab=Sales')
        .waitForText("Accept")
        .thenClick('#ctl00_ContentPlaceHolder1_btnDisclaimerAccept')
        .waitForText("View Sales")
        .thenClick('#ctl00_ContentPlaceHolder1

Footer's contents don't seem to work

你说的曾经没有我的故事 submitted on 2019-12-23 20:58:57
Question: I'm trying to create custom footers like those in the PhantomJS examples: https://github.com/ariya/phantomjs/blob/master/examples/printheaderfooter.js Here is my code:

    var phantom = require('node-phantom');
    phantom.create(function (err, ph) {
        ph.createPage(function (err, page) {
            page.set('paperSize', {
                format: 'A4',
                orientation: 'portrait',
                footer: {
                    contents: ph.callback(function (pageNum, numPages) {
                        if (pageNum == 1) {
                            return "";
                        }
                        return "<h1>Header <span style='float:right'>" + pageNum + " / " +

Django with splinter and phantomjs is painfully slow

强颜欢笑 submitted on 2019-12-23 18:33:56
Question: Today I tried combining Django's LiveServerTestCase with splinter and the phantomjs webdriver. Here's what I do (simplified version):

    from django.test import LiveServerTestCase
    from splinter import Browser

    class Test(LiveServerTestCase):
        def setUp(self):
            # Start a headless PhantomJS browser for each test
            self.browser = Browser('phantomjs')

        def tearDown(self):
            self.browser.quit()

        def test(self):
            self.browser.visit(self.live_server_url)
            self.assertIn("Hello world!", self.browser.title)

Sometimes the tests run fine, even though each test method takes about a second to execute. But sometimes it can randomly take ~100 seconds for

PhantomJS version compatibility with Selenium

我只是一个虾纸丫 submitted on 2019-12-23 17:41:09
Question: I could not use Selenium WebDriver (a.k.a. Selenium 2) 2.53.0 with PhantomJS 1.2.0. Is there any workaround? I had to use WebDriver 2.41.0 instead of the latest version (2.53.0).

Answer 1: phantomjsdriver-1.2.1.jar is provided with Selenium-2.53.0. If phantomjsdriver-1.2.0 does not work with Selenium-2.53.0, you can use phantomjsdriver-1.2.1. The dependency in pom.xml should look like this:

    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-java</artifactId>
        <version>2.53.0

How to manage to exit phantomJS launcher after tests execution?

落爺英雄遲暮 submitted on 2019-12-23 17:38:41
Question: I am writing an Angular 2 app and trying to set up tests on GitLab CI using the PhantomJS launcher. After all tests pass, the PhantomJS launcher remains active forever (http://i.imgur.com/TD7cLdq.png). How can I make it exit after the tests have passed? Here is my package.json:

    {
      "name": "kibernum-dnc-client",
      "version": "0.0.0",
      "license": "MIT",
      "angular-cli": {},
      "scripts": {
        "ng": "ng",
        "start": "ng serve",
        "lint": "tslint \"src/**/*.ts\" --project src/tsconfig.json --type-check && tslint \"e2e
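The excerpt does not show the Karma configuration, so the following is only a guess at one common cause: Karma keeps its server and browser alive when it runs in watch mode. A minimal sketch of the relevant `karma.conf.js` options, assuming the project uses Karma with the karma-phantomjs-launcher plugin; the function name `karmaConfig` is hypothetical, and a real config would contain more settings than these.

```javascript
// Sketch of a karma.conf.js; only the options relevant to exiting are shown.
function karmaConfig(config) {
  config.set({
    browsers: ['PhantomJS'],  // launcher provided by karma-phantomjs-launcher
    autoWatch: false,         // do not watch files and re-run on change
    singleRun: true           // run the suite once, then stop browser and server
  });
}

// Karma loads the configuration function through module.exports.
if (typeof module !== 'undefined') {
  module.exports = karmaConfig;
}
```

The same behaviour can usually be requested on the command line with `karma start --single-run`, which is convenient when only the CI job should exit after one run.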

1 -- Crawler environment setup

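喜夏-厌秋 submitted on 2019-12-23 13:07:13
Requirements: Python 3 or later.

I. Selenium (reposted from https://cuiqingcai.com/5141.html)

Selenium is an automated testing tool that lets us drive a browser to perform specific actions such as clicking and scrolling. For pages rendered by JavaScript, this way of scraping is very effective. The installation process is described below.

1. Related links
Official site: http://www.seleniumhq.org
GitHub: https://github.com/SeleniumHQ/selenium/tree/master/py
PyPI: https://pypi.python.org/pypi/selenium
Official documentation: http://selenium-python.readthedocs.io
Chinese documentation: http://selenium-python-zh.readthedocs.io

2. Installing with pip
Installing directly with pip is recommended. Run the following command:

    sudo pip3 install selenium

This alone is not enough, because Selenium also needs a browser (such as Chrome or Firefox) to work with. The configuration of three browsers, Chrome, Firefox, and PhantomJS, is covered later. Only with a browser available can Selenium be used to scrape pages.

II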

Scraping an infinite scroll page stops without scrolling

半城伤御伤魂 submitted on 2019-12-23 12:14:20
Question: I am currently working with PhantomJS and CasperJS to scrape links from a website. The site uses JavaScript to load results dynamically. The snippet below, however, is not getting me all the results the page contains. What I need is to scroll down to the bottom of the page, see if the spinner shows up (meaning there's more content still to come), wait until the new content has loaded, and then keep scrolling until no more new content is shown. Then store the links with class name .title in an
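The loop described in the question can be sketched roughly as follows. This is an illustration under assumptions, not the asker's snippet: the `.spinner` selector, the URL, the one-second pause, and the helper name `shouldKeepScrolling` are hypothetical, and the sketch assumes the `.title` elements carry an `href` attribute. The CasperJS part is guarded so the decision helper can be loaded on its own.

```javascript
// Hypothetical decision helper: keep going while the spinner is visible
// or the last scroll brought in new items.
function shouldKeepScrolling(previousCount, currentCount, spinnerVisible) {
  return spinnerVisible || currentCount > previousCount;
}

// CasperJS-only part; CasperJS runs on PhantomJS, so `phantom` is defined there.
if (typeof phantom !== 'undefined') {
  var casper = require('casper').create();
  var links = [];

  // Count the result links currently present in the page.
  var countItems = function () {
    return casper.evaluate(function () {
      return document.querySelectorAll('.title').length;
    });
  };

  // Scroll, pause, then either scroll again or collect the links.
  var scrollLoop = function (previousCount) {
    casper.scrollToBottom();
    casper.wait(1000, function () {          // assumed pause for new content
      var currentCount = countItems();
      var spinnerVisible = casper.visible('.spinner');  // assumed selector
      if (shouldKeepScrolling(previousCount, currentCount, spinnerVisible)) {
        scrollLoop(currentCount);
      } else {
        links = casper.getElementsAttribute('.title', 'href');
      }
    });
  };

  casper.start('http://example.com/results', function () {  // placeholder URL
    scrollLoop(0);
  });

  casper.run(function () {
    casper.echo(JSON.stringify(links));
    casper.exit();
  });
}
```

A fixed pause is the simplest choice; waiting on a condition with `casper.waitFor` instead would likely be more robust on slow connections.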

Geb tests pass with Chrome, fail with PhantomJS

假装没事ソ submitted on 2019-12-23 10:28:23
Question: I have noticed that some Geb functional tests pass with Chrome but fail with PhantomJS, holding all other variables constant. This happens mostly with pages that have some kind of asynchronous activity: one call to $(selector).click() triggers an event handler that updates the DOM, and the DOM updates need to complete before calling $(anotherSelector).click(). I can make the PhantomJS tests pass again by aggressively using waitFor, but I don't understand why this would be required with the