phantomjs

How to download a csv file using PhantomJS

大城市里の小女人 submitted on 2019-12-17 09:51:53
Question: When I browse website A in a normal browser (Chrome) and click a link on it, Chrome immediately downloads a report as a CSV file. When I checked the server's response headers, I got the following:
Cache-Control: private,max-age=31536000
Connection: Keep-Alive
Content-Disposition: attachment; filename="report.csv"
Content-Encoding: gzip
Content-Language: de-DE
Content-Type: text/csv; charset=UTF-8
Date: Wed, 22 Jul 2015 12:44:30 GMT
Expires: Thu, 21 Jul 2016 12:44
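These headers explain what Chrome is doing. Content-Disposition: attachment tells the browser to save the response as report.csv. Content-Encoding: gzip means the body arrives compressed, and the browser decompresses it transparently. A script that fetches the same URL itself has to perform both steps on its own.

The Node.js sketch below covers only those two steps. A few limitations apply:
- The helper names are hypothetical.
- Only the plain filename= parameter is parsed, not the RFC 5987 filename*= form.
- Obtaining the raw bytes (through PhantomJS or some other HTTP client) is deliberately left out.

```javascript
const zlib = require('zlib');

// Hypothetical helper: pull the file name out of a Content-Disposition value
// such as 'attachment; filename="report.csv"'. Only the plain `filename=`
// parameter is handled (quoted or unquoted), not the RFC 5987 `filename*=` form.
function filenameFromContentDisposition(value) {
  const match = /filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(value || '');
  if (!match) return null;
  return match[1] !== undefined ? match[1] : match[2];
}

// Hypothetical helper: turn raw response bytes into text, undoing a gzip
// Content-Encoding the way a browser would. Assumes a UTF-8 body, matching
// the Content-Type header above.
function decodeBody(buffer, contentEncoding) {
  if (/^\s*gzip\s*$/i.test(contentEncoding || '')) {
    return zlib.gunzipSync(buffer).toString('utf8');
  }
  return buffer.toString('utf8');
}
```

For the response above, filenameFromContentDisposition('attachment; filename="report.csv"') gives report.csv, and decodeBody(body, 'gzip') gives the CSV text, which can then be written to disk under that name.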

Understanding the evaluate function in CasperJS

扶醉桌前 submitted on 2019-12-17 09:45:29
Question: I want to understand in which cases I should, or have to, use the evaluate function. I have read the CasperJS API documentation for evaluate, but I'm unsure when I should use it. And what does "DOM context" mean? Can somebody provide an example? Answer 1: The CasperJS documentation has a pretty good description of what casper.evaluate() does. To recap: you pass a function that will be executed in the DOM context (you can also call it the page context). You can pass some
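The key point of that answer is that the function given to evaluate() does not run in your script. It is serialized and executed inside the page, so two consequences follow:
- It cannot see your script's variables.
- Anything passed in or returned has to survive serialization. PhantomJS's documentation gives the rule of thumb that arguments and return values must be JSON-serializable.

The snippet below is a plain Node.js simulation of that boundary, not CasperJS's actual implementation. simulatedEvaluate is a hypothetical helper that rebuilds the function from its source text and copies arguments and the return value through JSON.

```javascript
// Hypothetical helper: a Node.js simulation of an evaluate()-style call.
// The function crosses the boundary as source text; the arguments and the
// return value cross it as JSON, so closures and non-JSON values are lost.
function simulatedEvaluate(fn, ...args) {
  // Rebuild the function from its source text. `new Function` creates it in
  // the global scope, so it cannot see the caller's local variables.
  const rebuilt = new Function('return (' + fn.toString() + ');')();
  // Arguments only survive if they round-trip through JSON.
  const copiedArgs = JSON.parse(JSON.stringify(args));
  const result = rebuilt(...copiedArgs);
  // The return value is copied back the same way.
  return result === undefined ? undefined : JSON.parse(JSON.stringify(result));
}

function demo() {
  const selector = 'h1'; // lives in the outer script, not in the page

  // Wrong: the function refers to `selector` through a closure.
  let closureError = null;
  try {
    simulatedEvaluate(function () { return selector; });
  } catch (e) {
    closureError = e.name;
  }

  // Right: pass the value in explicitly as an argument.
  const passed = simulatedEvaluate(function (sel) { return 'querying ' + sel; }, selector);
  return { closureError: closureError, passed: passed };
}
```

In demo(), the closure version fails with a ReferenceError, while the version that receives selector as an argument works. The same pattern applies to casper.evaluate(fn, arg1, ...).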

save html output of page after execution of the page's javascript

回眸只為那壹抹淺笑 submitted on 2019-12-17 07:10:15
Question: There is a site I am trying to scrape that first loads an HTML/JS page, modifies the form input fields using JS, and then POSTs. How can I get the final HTML output of the POSTed page? I tried to do this with PhantomJS, but it seems to only have an option to render image files. Googling around suggests it should be possible, but I can't figure out how. My attempt: var page = require('webpage').create(); var fs = require('fs'); page.open('https://www.somesite.com/page.aspx', function () { page
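In PhantomJS, page.content holds the page's current HTML, so rendering to an image is not the only option. The difficulty in this case is reading it at the right moment. Because the page POSTs its form, a load finishes once for the initial page and again for the POSTed result.

A minimal sketch follows. It assumes:
- exactly two loads;
- a PhantomJS-style page object with an onLoadFinished(status) callback and a content property.

captureAfterLoads is a hypothetical helper name. The code is written in ES5 so it can also run inside PhantomJS.

```javascript
// Hypothetical helper for a PhantomJS-style page object: it counts
// onLoadFinished events and hands page.content to `onFinal(err, html)` once
// `expectedLoads` loads have succeeded (here: the original page, then the
// page produced by the form POST).
function captureAfterLoads(page, expectedLoads, onFinal) {
  var loads = 0;
  page.onLoadFinished = function (status) {
    loads += 1;
    if (status !== 'success') {
      onFinal(new Error('load ' + loads + ' finished with status ' + status));
      return;
    }
    if (loads === expectedLoads) {
      // page.content reflects the DOM as it is at this moment.
      onFinal(null, page.content);
    }
  };
}
```

In a PhantomJS script, you would register this before calling page.open(url). The callback could then save the result with the fs module, for example fs.write('final.html', html, 'w'), and finish with phantom.exit().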

Phantomjs page.content isn't retrieving the page content

被刻印的时光 ゝ submitted on 2019-12-17 06:56:28
Question: I use PhantomJS to scrape websites that use JavaScript and Ajax to load dynamic content. I have the following code: var page = require('webpage').create(); page.onError = function(msg, trace) { var msgStack = ['ERROR: ' + msg]; if (trace && trace.length) { msgStack.push('TRACE:'); trace.forEach(function(t) { msgStack.push(' -> ' + t.file + ': ' + t.line + (t.function ? ' (in function "' + t.function +'")' : '')); }); } console.error(msgStack.join('\n')); }; page.onConsoleMessage = function
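page.content only reflects what has been rendered at the moment it is read. Content that arrives later via Ajax is therefore missing if the script reads it as soon as the load finishes. A common remedy is to poll for a condition, such as a particular element now existing, before reading page.content. PhantomJS's repository has a waitfor.js example along these lines.

Below is a minimal callback-style sketch. The waitFor signature here is my own, and it is written in ES5 so it should also run in a PhantomJS script.

```javascript
// Poll `testFx` every `intervalMs` milliseconds. Call `onReady` as soon as it
// returns true, or `onTimeout` once `timeoutMs` has passed without success.
function waitFor(testFx, onReady, onTimeout, timeoutMs, intervalMs) {
  var start = Date.now();
  var timer = setInterval(function () {
    if (testFx()) {
      clearInterval(timer);
      onReady();
    } else if (Date.now() - start > timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs);
}
```

In PhantomJS, testFx would typically return the result of a page.evaluate call that checks for an element the Ajax request inserts. onReady would then read page.content.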

How to scroll down with Phantomjs to load dynamic content

老子叫甜甜 submitted on 2019-12-17 03:53:28
Question: I am trying to scrape links from a page that generates content dynamically as the user scrolls down to the bottom (infinite scrolling). I have tried different things with PhantomJS but have not been able to gather links beyond the first page. Let's say the element at the bottom that loads content has the class .has-more-items. It is available until the final content is loaded while scrolling, and then becomes unavailable in the DOM (display:none). Here are the things I have tried: Setting viewportSize to a large
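One way to handle infinite scrolling is a loop:
1. Scroll to the bottom.
2. Wait for the next batch to load.
3. Repeat while the .has-more-items element is still displayed.

An upper bound on rounds stops a stuck page from looping forever.

The sketch below keeps the page-specific parts as injected callbacks, which are hypothetical. In PhantomJS, scrollToBottom might run window.scrollTo(0, document.body.scrollHeight) through page.evaluate, and hasMore might check whether .has-more-items is still visible. The delay is a guess that has to be tuned to how quickly the site loads.

```javascript
// Scroll-and-wait loop for infinite scrolling. The page-specific parts are
// injected: `scrollToBottom()` triggers the next batch, and `hasMore()` reports
// whether the "load more" marker (e.g. .has-more-items) is still displayed.
// `onDone(rounds)` runs when hasMore() is false or after `maxRounds` scrolls.
function scrollUntilExhausted(scrollToBottom, hasMore, onDone, delayMs, maxRounds) {
  var rounds = 0;
  function step() {
    if (!hasMore() || rounds >= maxRounds) {
      onDone(rounds);
      return;
    }
    scrollToBottom();
    rounds += 1;
    // Give the page time to fetch and render the next batch before checking again.
    setTimeout(step, delayMs);
  }
  step();
}
```

Once onDone fires, the links can be collected in a single page.evaluate call, because the DOM now holds every batch.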

PhantomJS web driver stays in memory

扶醉桌前 submitted on 2019-12-17 02:51:36
Question: I am instantiating the PhantomJSDriver in C# with this code: Driver = new PhantomJSDriver(); and cleaning it up with this: Driver.Dispose(); Driver = null; Should the process exit or stay in memory? If it is supposed to stay in memory (it is visible in the Windows 7 Task Manager), can I kill it programmatically? Should I? Answer 1: To answer directly, Driver.Dispose(); shouldn't be used to clean up the WebDriver instance. For a proper cleanup we must use Driver.Quit();. Driver.Dispose(); : I think
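The answer recommends Driver.Quit() over Driver.Dispose(). Quit() (quit() in other language bindings) ends the whole WebDriver session, whereas Close() closes only the current window. Whatever the binding, it helps to guarantee that quit is called even when the scraping code throws.

The sketch below shows that try/finally pattern in JavaScript rather than C#. It assumes a driver object with an async quit() method, such as the WebDriver class in the selenium-webdriver npm package. withDriver is a hypothetical helper name.

```javascript
// Hypothetical helper: run `work(driver)` and always end the WebDriver
// session afterwards, even if `work` throws. Assumes the driver exposes an
// async quit() method, as selenium-webdriver's WebDriver does.
async function withDriver(createDriver, work) {
  const driver = await createDriver();
  try {
    return await work(driver);
  } finally {
    // quit() ends the session; close() would only close the current window.
    await driver.quit();
  }
}
```

In C#, the equivalent is a try/finally block around the scraping code with Driver.Quit() in the finally clause.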

Python+Selenium+Phantomjs data scraping: setting up the environment in practice

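浪尽此生 submitted on 2019-12-16 11:03:14
I'll probably be working on data mining projects later, yet right now I can't even scrape data. How can that be acceptable? I first saw an expert on Zhihu say that selenium can be used to scrape data, and then found the article "The Art of Data Scraping (Part 1): Setting Up a Selenium+Phantomjs Scraping Environment". I followed it but ran into problems. For Python I used ActivePython, which already comes with easy_install and pip installed, so step one, installing Python, went without problems. But step two, pip install selenium, simply would not get through. Probably because of the campus network, the download kept failing. I downloaded selenium-2.33.0.tar.gz from the official site, but after extracting it I didn't know where to put it... As a complete beginner I felt a bit helpless, but after searching online for a long time I finally found out what to do. The first method is simply to copy the selenium folder in the extracted directory (selenium-2.33.0\py\selenium) into Lib\site-packages under the Python installation directory. The second method is to change into the extracted directory and run the setup.py file with the command python setup.py install. The command-line window then prints a lot of output. When it finishes, two new folders appear in the extracted directory, and Lib\site-packages also gains a selenium-2.33.0-py2.7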

npm electron documentation

元气小坏坏 submitted on 2019-12-15 19:50:55
Installing electron
Proxy settings for the install command (C:\Users\colin\.npmrc):
home=https://npm.taobao.org
registry=https://registry.npm.taobao.org/
sass_binary_site=https://npm.taobao.org/mirrors/node-sass/
phantomjs_cdnurl=http://npm.taobao.org/mirrors/phantomjs
electron_mirror=http://npm.taobao.org/mirrors/electron/
Install command: npm install electron -g
Source: oschina Link: https://my.oschina.net/colin86/blog/3143298

Solving out-of-memory problems caused by scraping data with selenium+PhantomJs

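浪尽此生 submitted on 2019-12-15 04:39:16
While scraping data with selenium+PhantomJs, I found the system running slowly. A check of the background processes showed many phantomJs processes that had not been closed. In Java code, calling driver.close() does not guarantee that the phantomJs process gets killed. I used a rather clumsy workaround: write a .sh script and have Java execute it to kill these processes and free up memory:
try { Runtime.getRuntime().exec("script path"); } catch (IOException e) { log.error(e.getMessage(), e); }
The .sh script is as follows:
#!/bin/bash
#defined
ps -ef | grep phantomjs | grep -v grep | cut -c 9-15 | xargs kill -s 9
This script kills every process containing the keyword phantomjs. In testing, memory was effectively freed. There may well be a better way, but this is the only solution I can think of for now. If you have a better one, feel free to leave a comment.
Source: CSDN Author: 凌飞安 Link: https://blog.csdn.net/lingfeian/article/details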
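The script's cut -c 9-15 selects the PID by character position. That only works while the user-name and PID columns of ps -ef happen to fall at those offsets, and a long user name or a wide PID shifts them. A more robust match splits each line on whitespace and takes the second field, which is the PID.

Below is a Node.js sketch of that parsing step. pidsMatching is a hypothetical helper. The resulting PIDs could be passed to process.kill(pid, 'SIGKILL'). On Linux systems with procps, pkill -f phantomjs does the matching and killing in one command. Calling driver.quit() instead of driver.close() may make this cleanup unnecessary in the first place, since quit() ends the whole WebDriver session rather than a single window.

```javascript
// Hypothetical helper: extract PIDs from `ps -ef` output for processes whose
// line contains `needle`, skipping the header row and any `grep` lines
// (the equivalent of `grep -v grep`). Splitting on whitespace avoids the fixed
// character range used by `cut -c 9-15`.
function pidsMatching(psOutput, needle) {
  return psOutput
    .split('\n')
    .slice(1) // the first line is the "UID PID PPID ..." header
    .map(function (line) { return line.trim(); })
    .filter(function (line) {
      return line.length > 0 && line.includes(needle) && !/\bgrep\b/.test(line);
    })
    .map(function (line) { return Number(line.split(/\s+/)[1]); }) // field 2 is the PID
    .filter(Number.isInteger);
}
```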