phantomjs

Download file via hyperlink in PhantomJS using Selenium

て烟熏妆下的殇ゞ submitted on 2019-12-18 03:46:55

Question: I am using Selenium to perform a click on a hyperlink that is loaded on a certain page. The script works in Google Chrome, but not in PhantomJS. Why is this not working?

    from selenium import webdriver

    driver = webdriver.Chrome()
    # driver = webdriver.PhantomJS(executable_path="/Users/jameslemieux/PythonProjects/phantomjs-1.9.8-macosx/bin/phantomjs")
    driver.get("http://www.youtube-mp3.org/?e=t_exp&r=true#v=hC-T0rC6m7I")
    elem = driver.find_element_by_link_text('Download')
    elem
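One likely cause, offered as a guess: PhantomJS announces itself with its own default user agent, and some sites serve it different markup, so the 'Download' link may never appear in the DOM at all. A minimal sketch of the usual workaround, spoofing a desktop browser through ghostdriver's `phantomjs.page.settings.*` capability keys (the UA string is illustrative, and the commented driver calls need a live PhantomJS binary):

```python
def phantom_caps(user_agent):
    """Desired-capabilities dict for webdriver.PhantomJS();
    ghostdriver forwards 'phantomjs.page.settings.*' keys to the page."""
    return {"phantomjs.page.settings.userAgent": user_agent}

caps = phantom_caps(
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/39.0 Safari/537.36")

# With selenium installed you would then wait for the link explicitly
# instead of assuming it is already clickable (not executed here):
# from selenium import webdriver
# from selenium.webdriver.support.ui import WebDriverWait
# from selenium.webdriver.support import expected_conditions as EC
# from selenium.webdriver.common.by import By
# driver = webdriver.PhantomJS(desired_capabilities=caps)
# driver.get("http://www.youtube-mp3.org/?e=t_exp&r=true#v=hC-T0rC6m7I")
# WebDriverWait(driver, 10).until(
#     EC.element_to_be_clickable((By.LINK_TEXT, "Download"))).click()
```

The explicit wait matters in PhantomJS even when a plain `find_element` works in Chrome, because the headless page often finishes its JavaScript later than the driver returns from `get()`.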

Python selenium screen capture not getting whole page

风格不统一 submitted on 2019-12-18 03:45:38

Question: I am trying to create a generic web crawler that will go to a site and take a screenshot, using Python, Selenium, and PhantomJS. The problem is that the screenshot does not capture all the images on a page. For example, if I go to YouTube, it doesn't capture images below the main page image. (I don't have high enough rep to post a screenshot.) I think this may have something to do with dynamic content, but I have tried the wait functions such as implicitly_wait and set_page_load_timeout
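A sketch of one possible fix, under the assumption that the missing images are lazy-loaded: PhantomJS defaults to a small viewport, so content below the fold may never trigger the page's image loaders, and no amount of waiting helps. Enlarging the window and scrolling through the page in viewport-sized steps before `save_screenshot` often does. Only the offset helper below actually runs; the commented calls need a live driver:

```python
def scroll_offsets(total_height, viewport_height):
    """Y offsets to visit so every slice of the page has been in view once."""
    return list(range(0, total_height, viewport_height))

# With a live driver (not executed here):
# import time
# driver.set_window_size(1366, 768)
# height = driver.execute_script("return document.body.scrollHeight")
# for y in scroll_offsets(height, 768):
#     driver.execute_script("window.scrollTo(0, arguments[0]);", y)
#     time.sleep(0.5)  # give lazy-loaded images a moment to arrive
# driver.execute_script("window.scrollTo(0, 0);")
# driver.save_screenshot("page.png")
```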

PhantomJS returning empty web page (python, Selenium)

佐手、 submitted on 2019-12-18 02:12:00

Question: I am trying to screen-scrape a web site without having to launch an actual browser instance from a Python script (using Selenium). I can do this with Chrome or Firefox (I've tried it and it works), but I want to use PhantomJS so it's headless. The code looks like this:

    import sys
    import traceback
    import time
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    dcap = dict(DesiredCapabilities
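A common culprit for an empty page from PhantomJS, offered as a likely (not certain) diagnosis: PhantomJS 1.x negotiates SSLv3 by default, which many HTTPS sites reject, and the symptom is exactly a blank `page_source`. Relaxing the SSL settings via `service_args`, plus a desktop user agent, is the usual fix. The switches below are real PhantomJS command-line options; the driver call itself is only sketched:

```python
service_args = [
    "--ignore-ssl-errors=true",  # don't abort on certificate problems
    "--ssl-protocol=any",        # allow TLS instead of the SSLv3 default
]
dcap = {
    "phantomjs.page.settings.userAgent":
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/39.0 Safari/537.36",
}

# With selenium and a phantomjs binary available (not executed here):
# from selenium import webdriver
# driver = webdriver.PhantomJS(service_args=service_args,
#                              desired_capabilities=dcap)
# driver.get("https://example.com/")
# print(len(driver.page_source))  # non-trivial length once SSL works
```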

Running Phantomjs using C# to grab snapshot of webpage

本小妞迷上赌 submitted on 2019-12-17 22:33:57

Question: I'm trying to grab snapshots of my own website using PhantomJS; basically, this is to create a "preview image" of user-submitted content. I've installed PhantomJS on the server and have confirmed that running it from the command line against the appropriate pages works fine. However, when I try running it from the website, it does not appear to do anything. I have confirmed that the code is being called, and that PhantomJS is actually running (I've monitored the processes, and can see it appear in

Nodejs Child Process: write to stdin from an already initialised process

雨燕双飞 submitted on 2019-12-17 22:13:16

Question: I am trying to spawn an external process (phantomjs) using Node's child_process, and then send information to that process after it has been initialized. Is that possible? I have the following code:

    var spawn = require('child_process').spawn,
        child = spawn('phantomjs');

    child.stdin.setEncoding = 'utf-8';
    child.stdout.pipe(process.stdout);
    child.stdin.write("console.log('Hello from PhantomJS')");

But the only thing I get on stdout is the initial prompt for the phantomjs console:

    phantomjs>

So it seems

Performant parsing of pages with Node.js and XPath

孤街浪徒 submitted on 2019-12-17 21:56:16

Question: I'm doing some web scraping with Node.js. I'd like to use XPath, since I can generate it semi-automatically with several kinds of GUI tools. The problem is that I cannot find a way to do this efficiently. jsdom is extremely slow: it takes a minute or so to parse a 500 KiB file, with full CPU load and a heavy memory footprint. Popular libraries for HTML parsing (e.g. cheerio) neither support XPath nor expose a W3C-compliant DOM. Efficient HTML parsing is obviously implemented in WebKit, so using phantom or

How to write to CSV file in Javascript

断了今生、忘了曾经 submitted on 2019-12-17 21:33:14

Question: I have a script (using PhantomJS) that tests how long it takes to load a web page. What I am trying to figure out is how to write the measured load time to a .csv file, and, when I re-run the test, have it append another result to the .csv file. Code:

    var page = require('webpage').create(),
        system = require('system'),
        t, address;
    var pageLoadArray = [];
    var csvContents = "";
    fs = require('fs');

    if (system.args.length === 1) {
        console.log('Usage: loadspeed.js <some

CasperJS/ Javascript Selecting Multiple Options

微笑、不失礼 submitted on 2019-12-17 21:31:08

Question: I am trying to scrape a website where this is the generic HTML:

    <select id="xxx" multiple name="zzz">
        <option value="123">xaxaxa</option>
        <option value="124">zazaza</option>
        <option value="125">ajajaj</option>
        <option value="126">azzzsa</option>
    </select>

It is not enclosed by a form, so I tried using the fill() function that CasperJS provides, but that did not work. For single entries I would usually use casper.click() and that would work, but this does not work for multiple entries, even with
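Since the options are not inside a form, `fill()` has nothing to post; a common workaround is to set each option's `selected` flag inside the page context and then fire a `change` event so the page's own scripts react. The `casper.evaluate` part below is a sketch (it needs a live page, and uses the older `createEvent` API that PhantomJS 1.x understands); the value-matching helper is extracted so it can run, and be tested, anywhere:

```javascript
// Sketch for the page context (runs under casper.evaluate, not here):
//
// casper.evaluate(function (values) {
//   var sel = document.getElementById("xxx");
//   Array.prototype.forEach.call(sel.options, function (opt) {
//     opt.selected = values.indexOf(opt.value) !== -1;
//   });
//   var evt = document.createEvent("HTMLEvents");
//   evt.initEvent("change", true, true);
//   sel.dispatchEvent(evt);
// }, ["123", "125"]);

// The selection logic on its own: which option values end up selected?
function pickSelected(options, wanted) {
  return options
    .filter(function (o) { return wanted.indexOf(o.value) !== -1; })
    .map(function (o) { return o.value; });
}
```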

Scraping multiple URLs by looping in PhantomJS

五迷三道 submitted on 2019-12-17 20:30:02

Question: I am using PhantomJS to scrape some websites and then extract the information with R. I am following this tutorial. Everything works fine for a single page, but I couldn't find any simple tutorial on how to automate it for multiple pages. My experiments so far:

    var countries = ["Albania", "Afghanistan"];
    var len = countries.length;
    var name1 = ".html";
    var add1 = "http://www.kluwerarbitration.com/CommonUI/BITs.aspx?country=";
    var country = "";
    var name = "";
    var add = "";
    for (i = 1; i <= len; i++) {
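The usual stumbling block here, stated as a guess about where the loop goes wrong: `page.open` is asynchronous, so a plain `for` loop fires every request at once against the same page object and only the last one survives. Building the URL list is ordinary string work; the scraping should then walk that list recursively, starting each URL only after the previous one finishes. The PhantomJS part is only sketched, and the output filenames are hypothetical:

```javascript
var countries = ["Albania", "Afghanistan"];
var base = "http://www.kluwerarbitration.com/CommonUI/BITs.aspx?country=";
var urls = countries.map(function (c) {
  return base + encodeURIComponent(c);
});

// Sequential-processing sketch for PhantomJS (needs require('webpage')):
//
// var page = require('webpage').create();
// function next() {
//   if (urls.length === 0) { phantom.exit(); return; }
//   var url = urls.shift();
//   page.open(url, function (status) {
//     if (status === 'success') {
//       page.render(url.split('=').pop() + '.png'); // hypothetical name
//     }
//     next(); // only start the next URL once this one has finished
//   });
// }
// next();
console.log(urls.join("\n"));
```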