headless-browser

Prevent CSS/other resource download in PhantomJS/Selenium driven by Python

六月ゝ 毕业季﹏ submitted on 2019-11-27 01:23:20
Question: I'm trying to speed up a Selenium/PhantomJS web scraper in Python by preventing the download of CSS and other resources. All I need are the img src and alt attributes. I've found this code:

    page.onResourceRequested = function(requestData, request) {
        if ((/https?:\/\/.+?\.css/gi).test(requestData['url']) || requestData['Content-Type'] == 'text/css') {
            console.log('The url of the request is matching. Aborting: ' + requestData['url']);
            request.abort();
        }
    };

via: How can I control PhantomJS to skip download
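The URL test in that snippet can be pulled out into a small function that is easier to extend and check. This is a minimal sketch, assuming the goal is to skip stylesheets and web fonts. The names `BLOCKED_EXTENSIONS` and `shouldAbortRequest` and the extension list are hypothetical. It matches on the URL only, because the response Content-Type is not known yet when `onResourceRequested` fires. It is written in ES5 (`var`, `indexOf`) so that it could also be pasted into a PhantomJS script, whose older JavaScript engine lacks many newer language features. Since `src` and `alt` are attributes in the page markup, image extensions could be added to the list as well if only those attributes are needed.

```javascript
// Hypothetical list of file extensions to skip: stylesheets and web fonts.
// Extend it as needed (e.g. 'png', 'jpg', 'gif' to skip image files too).
var BLOCKED_EXTENSIONS = ['css', 'woff', 'woff2', 'ttf', 'otf', 'eot'];

// Returns true when the URL's path ends in a blocked extension.
// Query strings and fragments are stripped first, so "style.css?v=3" matches.
// The scheme is not checked, so http:// and https:// URLs are treated alike.
function shouldAbortRequest(url) {
  var path = url.split(/[?#]/)[0];
  var match = /\.([a-z0-9]+)$/i.exec(path);
  return match !== null &&
    BLOCKED_EXTENSIONS.indexOf(match[1].toLowerCase()) !== -1;
}

// Inside a PhantomJS script (not Node.js) it would be wired up like this:
//
// page.onResourceRequested = function (requestData, networkRequest) {
//   if (shouldAbortRequest(requestData.url)) {
//     networkRequest.abort();
//   }
// };

console.log(shouldAbortRequest('https://example.com/site.css?v=3'));
```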

Repeating “Asynchronous Sessions cleanup phase starting NOW”

会有一股神秘感。 submitted on 2019-11-26 23:36:27
Question: When I run my test suite, I find that some of the tests intermittently hang for a very long time (15 minutes to half an hour), with PhantomJS constantly reporting:

    Asynchronous Sessions cleanup phase starting NOW
    Asynchronous Sessions cleanup phase starting NOW
    Asynchronous Sessions cleanup phase starting NOW
    Asynchronous Sessions cleanup phase starting NOW
    Asynchronous Sessions cleanup phase starting NOW
    Asynchronous Sessions cleanup phase starting NOW
    Asynchronous Sessions cleanup phase

Why doesn't Node.js have a native DOM?

落爺英雄遲暮 submitted on 2019-11-26 22:12:56
When I discovered that Node.js was built using the V8 JavaScript engine, I thought: great, web scraping will be easier, as the page will be rendered like in the browser, with a "native" DOM supporting XPath and any AJAX calls on the page executed. Why doesn't it have a native DOM when it uses the same JavaScript engine as Chrome? Why doesn't it have a mode to run JavaScript in retrieved pages? What am I not understanding about JavaScript engines vs. the engine in a web browser? Many thanks!

The DOM is the DOM, and the JavaScript implementation is simply a separate entity. The DOM represents a
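The point that the DOM comes from the host environment rather than from the JavaScript engine can be checked directly in Node.js. This small sketch looks up a few browser host objects and a few language built-ins on `globalThis`. The chosen names are illustrative, and `missingGlobals` is a hypothetical helper.

```javascript
// Objects a browser's host environment adds on top of the JavaScript engine.
const BROWSER_HOST_OBJECTS = ['window', 'document', 'HTMLElement'];

// Objects defined by the language itself, i.e. supplied by the engine (V8).
const ENGINE_BUILTINS = ['JSON', 'Math', 'Promise', 'Array'];

// Returns the names from `names` that are not defined on the global object.
function missingGlobals(names) {
  return names.filter((name) => typeof globalThis[name] === 'undefined');
}

console.log('browser host objects missing here:', missingGlobals(BROWSER_HOST_OBJECTS));
console.log('engine built-ins missing here:', missingGlobals(ENGINE_BUILTINS));
```

Under Node.js, the host objects are all missing while the engine built-ins are present; in a browser console, both lists come back empty. Libraries such as jsdom exist precisely to implement a DOM separately on top of Node.js.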

Python Headless Browser for GAE

一笑奈何 submitted on 2019-11-26 20:45:17
Question: I'm trying to use Angular.js client-side with webapp2 on Google App Engine. To solve the SEO issues, the idea was to use a headless browser to run the JavaScript server-side and serve the resulting HTML to the crawlers. Is there any headless browser for Python that runs on Google App Engine?

Answer 1: This can now be done on App Engine Flex with a custom runtime, so I'm adding this answer since this question is the first thing to pop up in Google. I based this custom runtime off of my other
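The SEO idea in the question (serve headless-rendered HTML to crawlers and the normal client-side app to everyone else) comes down to a routing decision on the User-Agent header. Here is a minimal sketch in JavaScript; the same check translates directly to a webapp2 handler. The crawler pattern and the function names `isCrawler` and `chooseResponse` are hypothetical, and the prerendered HTML is assumed to have been produced elsewhere by a headless browser.

```javascript
// Hypothetical, deliberately short list of crawler User-Agent fragments.
// No "g" flag, so .test() keeps no state between calls.
const CRAWLER_PATTERN = /googlebot|bingbot|yandex|baiduspider|duckduckbot/i;

// True when the User-Agent string looks like a known search-engine crawler.
function isCrawler(userAgent) {
  return typeof userAgent === 'string' && CRAWLER_PATTERN.test(userAgent);
}

// Crawlers get the HTML a headless browser rendered ahead of time;
// regular visitors get the normal Angular app shell.
function chooseResponse(userAgent, prerenderedHtml, appShellHtml) {
  return isCrawler(userAgent) ? prerenderedHtml : appShellHtml;
}

const googlebot = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
console.log(chooseResponse(googlebot, '<html>rendered</html>', '<html>app</html>'));
```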

Limit chrome headless CPU and memory usage

倾然丶 夕夏残阳落幕 submitted on 2019-11-26 19:05:49
I am using Selenium to run headless Chrome with the following command:

    system "LC_ALL=C google-chrome --headless --enable-logging --hide-scrollbars --remote-debugging-port=#{debug_port} --remote-debugging-address=0.0.0.0 --disable-gpu --no-sandbox --ignore-certificate-errors &"

However, it appears that headless Chrome is consuming too much memory and CPU. Does anyone know how we can limit the CPU/memory usage of headless Chrome, or is there some workaround? Thanks in advance.

There has been a lot of discussion about the unpredictable CPU and memory consumption of headless Chrome sessions.
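Whatever limit or workaround is eventually applied, it helps to launch Chrome in a way that keeps a handle on the process. This is a sketch in Node.js, assuming a `google-chrome` binary on PATH. It passes the question's exact flags as an argument array instead of a shell string, sets `LC_ALL=C` through the environment, and returns the `ChildProcess` object, whose `pid` a supervising script could monitor or kill. `chromeArgs` and `launchChrome` are hypothetical names, and the sketch does not itself cap CPU or memory.

```javascript
const { spawn } = require('child_process');

// The same flags as the shell command in the question, as an argv array,
// so no shell and no string interpolation are involved.
function chromeArgs(debugPort) {
  return [
    '--headless',
    '--enable-logging',
    '--hide-scrollbars',
    `--remote-debugging-port=${debugPort}`,
    '--remote-debugging-address=0.0.0.0',
    '--disable-gpu',
    '--no-sandbox',
    '--ignore-certificate-errors',
  ];
}

// Starts Chrome detached so it keeps running in the background (the role of
// the trailing `&`) and returns the ChildProcess handle for later monitoring
// or termination. Assumes `google-chrome` is on PATH, as in the question.
function launchChrome(debugPort) {
  const child = spawn('google-chrome', chromeArgs(debugPort), {
    env: { ...process.env, LC_ALL: 'C' },
    detached: true,
    stdio: 'ignore',
  });
  // Report launch failures (e.g. binary not found) instead of crashing.
  child.on('error', (err) => console.error('failed to start Chrome:', err.message));
  child.unref();
  return child;
}

console.log(chromeArgs(9222).join(' '));
```

Because the flags are built as an array, the debug port is passed as a value rather than spliced into a shell string.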

Which drivers support “no-browser”/“headless” testing?

和自甴很熟 submitted on 2019-11-26 18:38:29
Question: I want to run my Selenium code on a server without opening any browser, but I'm confused about which WebDriver to use on a server that does all the work (downloading some files from a site and storing them on the server).

Answer 1: To execute your test suite through Selenium without opening any browser, you can use any of the following browser clients:
- Headless Chrome: here you can find a working example.
- Headless Firefox: here you can find a working example.
- PhantomJS:
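What makes the first two clients "headless" is a command-line switch, while PhantomJS has no windowed mode at all. The sketch below records that mapping in a lookup table. `HEADLESS_SWITCHES` and `headlessSwitches` are hypothetical names. With Selenium, such switches are typically passed as arguments on the browser's options object (for example, the Chrome or Firefox options) rather than on a raw command line.

```javascript
// Switches that start each browser without a visible window.
// PhantomJS is headless only, so it needs no switch.
const HEADLESS_SWITCHES = {
  chrome: ['--headless'],
  firefox: ['-headless'],
  phantomjs: [],
};

// Returns a copy of the switch list for a browser name (case-insensitive),
// or throws if no headless mode is recorded for it.
function headlessSwitches(browser) {
  const key = browser.toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(HEADLESS_SWITCHES, key)) {
    throw new Error(`no headless mode known for ${browser}`);
  }
  return HEADLESS_SWITCHES[key].slice();
}

for (const name of ['Chrome', 'Firefox', 'PhantomJS']) {
  console.log(name, headlessSwitches(name));
}
```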

Headless browser with full javascript support for java

半世苍凉 submitted on 2019-11-26 18:17:56
Question: I have been using HtmlUnit (the developers did a great job) as a headless browser for some of my previous applications, but the JavaScript support isn't working for some websites that my next application will be accessing. I've heard about QtWebKit bindings for Python, but my application will be in Java. Is there a Java binding for WebKit or QtWebKit? Does anyone know a good headless browser for Java with full JavaScript support?

Answer 1: Nathan Ridley's answer to another similar question is the most

Headless, scriptable Firefox/Webkit on linux? [closed]

 ̄綄美尐妖づ submitted on 2019-11-26 15:15:52
Question: I'm looking to automate some web interactions, namely the periodic download of files from a secure website. This basically involves entering my username/password and navigating to the appropriate URL. I tried simple scripting in Python, followed by more sophisticated scripting, only to discover that this particular website uses an obnoxious JavaScript- and Flash-based login mechanism, rendering my methods useless. I then tried HtmlUnit, but that doesn't seem to work either. I suspect

PHP Headless Browser? [closed]

核能气质少年 submitted on 2019-11-26 11:18:35
Question: Is there a headless browser library for PHP? I'd like something with a JS engine built in. FOSS preferred.

Answer 1: PhantomJS - http://phantomjs.org/

PhantomJS is a headless WebKit with a JavaScript API. It has fast, native support for various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG. You can couple it with something like php-PhantomjsRunner (now deprecated) if you want, or bake your own. Once it is set up and you are ready to start testing with PhantomJS, pick out one of the

Headless Browser for Python (Javascript support REQUIRED!) [closed]

时光总嘲笑我的痴心妄想 submitted on 2019-11-26 10:15:58
I need a headless browser that is fairly easy to use (I am still fairly new to Python and programming in general) and that will allow me to navigate to a page, log into a form that requires JavaScript, and then scrape the resulting web page by searching for results matching certain criteria, clicking check boxes, and clicking to download files. All of this requires JavaScript. I hear a headless browser is what I want. My requirements/preferences are that I be able to run it from Python and, preferably, that the resulting script be compilable by py2exe (I am writing this program for other users