phantomjs

Download file via hyperlink in PhantomJS using Selenium

て烟熏妆下的殇ゞ submitted on 2019-12-18 03:46:55

Question: I am using Selenium to perform a click on a hyperlink that is loaded on a certain page. The script works in Google Chrome, but not in PhantomJS. Why is this not working?

    from selenium import webdriver

    driver = webdriver.Chrome()
    # driver = webdriver.PhantomJS(executable_path="/Users/jameslemieux/PythonProjects/phantomjs-1.9.8-macosx/bin/phantomjs")
    driver.get("http://www.youtube-mp3.org/?e=t_exp&r=true#v=hC-T0rC6m7I")
    elem = driver.find_element_by_link_text('Download')
    elem
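One likely cause, offered as a guess: PhantomJS announces itself with its own default user agent, and some sites serve it different markup, so the 'Download' link may never appear in the DOM at all. A minimal sketch of the usual workaround, spoofing a desktop browser through ghostdriver's `phantomjs.page.settings.*` capability keys (the UA string is illustrative, and the commented driver calls need a live PhantomJS binary):

```python
def phantom_caps(user_agent):
    """Desired-capabilities dict for webdriver.PhantomJS();
    ghostdriver forwards 'phantomjs.page.settings.*' keys to the page."""
    return {"phantomjs.page.settings.userAgent": user_agent}

caps = phantom_caps(
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/39.0 Safari/537.36")

# With selenium installed you would then wait for the link explicitly
# instead of assuming it is already clickable (not executed here):
# from selenium import webdriver
# from selenium.webdriver.support.ui import WebDriverWait
# from selenium.webdriver.support import expected_conditions as EC
# from selenium.webdriver.common.by import By
# driver = webdriver.PhantomJS(desired_capabilities=caps)
# driver.get("http://www.youtube-mp3.org/?e=t_exp&r=true#v=hC-T0rC6m7I")
# WebDriverWait(driver, 10).until(
#     EC.element_to_be_clickable((By.LINK_TEXT, "Download"))).click()
```

The explicit wait matters in PhantomJS even when a plain `find_element` works in Chrome, because the headless page often finishes its JavaScript later than the driver returns from `get()`.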

Python selenium screen capture not getting whole page

风格不统一 submitted on 2019-12-18 03:45:38

Question: I am trying to create a generic web crawler that will go to a site and take a screenshot, using Python, Selenium, and PhantomJS. The problem is that the screenshot does not capture all the images on a page. For example, if I go to YouTube, it doesn't capture images below the main page image. (I don't have high enough rep to post a screenshot.) I think this may have something to do with dynamic content, but I have tried the wait functions such as implicitly_wait and set_page_load_timeout
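A sketch of one possible fix, under the assumption that the missing images are lazy-loaded: PhantomJS defaults to a small viewport, so content below the fold may never trigger the page's image loaders, and no amount of waiting helps. Enlarging the window and scrolling through the page in viewport-sized steps before `save_screenshot` often does. Only the offset helper below actually runs; the commented calls need a live driver:

```python
def scroll_offsets(total_height, viewport_height):
    """Y offsets to visit so every slice of the page has been in view once."""
    return list(range(0, total_height, viewport_height))

# With a live driver (not executed here):
# import time
# driver.set_window_size(1366, 768)
# height = driver.execute_script("return document.body.scrollHeight")
# for y in scroll_offsets(height, 768):
#     driver.execute_script("window.scrollTo(0, arguments[0]);", y)
#     time.sleep(0.5)  # give lazy-loaded images a moment to arrive
# driver.execute_script("window.scrollTo(0, 0);")
# driver.save_screenshot("page.png")
```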

PhantomJS returning empty web page (python, Selenium)

佐手、 submitted on 2019-12-18 02:12:00

Question: I am trying to screen-scrape a web site without having to launch an actual browser instance from a Python script (using Selenium). I can do this with Chrome or Firefox (I've tried it and it works), but I want to use PhantomJS so it's headless. The code looks like this:

    import sys
    import traceback
    import time
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    dcap = dict(DesiredCapabilities
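A common culprit for an empty page from PhantomJS, offered as a likely (not certain) diagnosis: PhantomJS 1.x negotiates SSLv3 by default, which many HTTPS sites reject, and the symptom is exactly a blank `page_source`. Relaxing the SSL settings via `service_args`, plus a desktop user agent, is the usual fix. The switches below are real PhantomJS command-line options; the driver call itself is only sketched:

```python
service_args = [
    "--ignore-ssl-errors=true",  # don't abort on certificate problems
    "--ssl-protocol=any",        # allow TLS instead of the SSLv3 default
]
dcap = {
    "phantomjs.page.settings.userAgent":
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/39.0 Safari/537.36",
}

# With selenium and a phantomjs binary available (not executed here):
# from selenium import webdriver
# driver = webdriver.PhantomJS(service_args=service_args,
#                              desired_capabilities=dcap)
# driver.get("https://example.com/")
# print(len(driver.page_source))  # non-trivial length once SSL works
```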

Running Phantomjs using C# to grab snapshot of webpage

本小妞迷上赌 submitted on 2019-12-17 22:33:57

Question: I'm trying to grab snapshots of my own website using PhantomJS; basically, this is to create a "preview image" of user-submitted content. I've installed PhantomJS on the server and have confirmed that running it from the command line against the appropriate pages works fine. However, when I try running it from the website, it does not appear to do anything. I have confirmed that the code is being called, and that PhantomJS is actually running (I've monitored the processes, and can see it appear in

Nodejs Child Process: write to stdin from an already initialised process

雨燕双飞 submitted on 2019-12-17 22:13:16

Question: I am trying to spawn an external process (phantomjs) using Node's child_process, and then send information to that process after it has been initialized. Is that possible? I have the following code:

    var spawn = require('child_process').spawn,
        child = spawn('phantomjs');

    child.stdin.setEncoding = 'utf-8';
    child.stdout.pipe(process.stdout);
    child.stdin.write("console.log('Hello from PhantomJS')");

But the only thing I get on stdout is the initial prompt for the phantomjs console:

    phantomjs>

So it seems

Performant parsing of pages with Node.js and XPath

孤街浪徒 submitted on 2019-12-17 21:56:16

Question: I'm doing some web scraping with Node.js. I'd like to use XPath, since I can generate it semi-automatically with several kinds of GUI tools. The problem is that I cannot find a way to do this efficiently. jsdom is extremely slow: it takes a minute or so to parse a 500 KiB file, with full CPU load and a heavy memory footprint. Popular libraries for HTML parsing (e.g. cheerio) neither support XPath nor expose a W3C-compliant DOM. Efficient HTML parsing is obviously implemented in WebKit, so using phantom or

How to write to CSV file in Javascript

断了今生、忘了曾经 submitted on 2019-12-17 21:33:14

Question: I have a script (using PhantomJS) that tests how long it takes to load a web page. What I am trying to figure out is how to write the measured load time to a .csv file, and, when I re-run the test, have it append another result to the .csv file. Code:

    var page = require('webpage').create(),
        system = require('system'),
        t, address;
    var pageLoadArray = [];
    var csvContents = "";
    fs = require('fs');

    if (system.args.length === 1) {
        console.log('Usage: loadspeed.js <some

CasperJS/ Javascript Selecting Multiple Options

微笑、不失礼 submitted on 2019-12-17 21:31:08

Question: I am trying to scrape a website where this is the generic HTML:

    <select id="xxx" multiple name="zzz">
        <option value="123">xaxaxa</option>
        <option value="124">zazaza</option>
        <option value="125">ajajaj</option>
        <option value="126">azzzsa</option>
    </select>

It is not enclosed by a form, so I tried using the fill() function that CasperJS provides, but that did not work. For single entries I would usually use casper.click() and that would work, but this does not work for multiple entries, even with
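Since the options are not inside a form, `fill()` has nothing to post; a common workaround is to set each option's `selected` flag inside the page context and then fire a `change` event so the page's own scripts react. The `casper.evaluate` part below is a sketch (it needs a live page, and uses the older `createEvent` API that PhantomJS 1.x understands); the value-matching helper is extracted so it can run, and be tested, anywhere:

```javascript
// Sketch for the page context (runs under casper.evaluate, not here):
//
// casper.evaluate(function (values) {
//   var sel = document.getElementById("xxx");
//   Array.prototype.forEach.call(sel.options, function (opt) {
//     opt.selected = values.indexOf(opt.value) !== -1;
//   });
//   var evt = document.createEvent("HTMLEvents");
//   evt.initEvent("change", true, true);
//   sel.dispatchEvent(evt);
// }, ["123", "125"]);

// The selection logic on its own: which option values end up selected?
function pickSelected(options, wanted) {
  return options
    .filter(function (o) { return wanted.indexOf(o.value) !== -1; })
    .map(function (o) { return o.value; });
}
```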

Scraping multiple URLs by looping in PhantomJS

五迷三道 submitted on 2019-12-17 20:30:02

Question: I am using PhantomJS to scrape some websites and then extract the information with R. I am following this tutorial. Everything works fine for a single page, but I couldn't find any simple tutorial on how to automate it for multiple pages. My experiments so far:

    var countries = ["Albania", "Afghanistan"];
    var len = countries.length;
    var name1 = ".html";
    var add1 = "http://www.kluwerarbitration.com/CommonUI/BITs.aspx?country=";
    var country = "";
    var name = "";
    var add = "";
    for (i = 1; i <= len; i++) {
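The usual stumbling block here, stated as a guess about where the loop goes wrong: `page.open` is asynchronous, so a plain `for` loop fires every request at once against the same page object and only the last one survives. Building the URL list is ordinary string work; the scraping should then walk that list recursively, starting each URL only after the previous one finishes. The PhantomJS part is only sketched, and the output filenames are hypothetical:

```javascript
var countries = ["Albania", "Afghanistan"];
var base = "http://www.kluwerarbitration.com/CommonUI/BITs.aspx?country=";
var urls = countries.map(function (c) {
  return base + encodeURIComponent(c);
});

// Sequential-processing sketch for PhantomJS (needs require('webpage')):
//
// var page = require('webpage').create();
// function next() {
//   if (urls.length === 0) { phantom.exit(); return; }
//   var url = urls.shift();
//   page.open(url, function (status) {
//     if (status === 'success') {
//       page.render(url.split('=').pop() + '.png'); // hypothetical name
//     }
//     next(); // only start the next URL once this one has finished
//   });
// }
// next();
console.log(urls.join("\n"));
```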