cutycapt

Multithreaded web page screenshots on Linux: shell

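我与影子孤独终老i, submitted on 2020-05-08 09:20:46
My boss handed me an ad hoc task: take screenshots of a large number of websites (tens of thousands) on a regular schedule and archive them. I first used a Windows solution: webshotcmd.exe plus a batch file. After it had run for a while, I found that a "webshotcmd.exe is not responding" dialog popped up frequently, and someone had to click OK by hand before capturing could continue. In addition, the unregistered version of webshotcmd produces grey images, which are hard to inspect.

After some googling I found two screenshot tools for Linux, cutycapt and phantomjs. In my tests cutycapt was slower but fairly stable, while phantomjs was faster but its process occasionally hung. Weighing the trade-offs, I decided to use cutycapt plus a shell script:

webshot.sh

    #!/bin/bash
    # webshot
    # by caishzh 2013
    WEBSHOTDIR="/data/webshot"
    mkdir -p "$WEBSHOTDIR"
    while read -r LINE
    do
        DISPLAY=:0 cutycapt --url="http://$LINE" --max-wait=90000 --out="$WEBSHOTDIR/$LINE.jpg" >/dev/null 2>&1
    done < domain.txt

The script is simple, so I won't annotate it; domain.txt is the list of sites. For installing and using cutycapt, see here. Running the script takes screenshots correctly, and the image quality is high. But then another problem appeared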

CutyCapt issue with SSL URL

偶尔善良, submitted on 2020-01-03 01:28:07
Question: I am having a problem getting CutyCapt to work with SSL URLs. I have the most recent version of CutyCapt ( CutyCapt.cpp 10 2013-07-14 21:57:37Z ), and it works perfectly with all non-SSL URLs. When I try to grab a URL with SSL using the following command:

    ./xvfb-run ./CutyCapt --min-width=1280 --min-height=720 --max-wait=6000 \
        --url="https://apple.com" --out="testssl.jpg"

I get the following error:

    QPainter::begin: Paint device returned engine == 0, type: 3
    QPainter::setRenderHint: Painter must be

Debugging CutyCapt + Flash

痞子三分冷, submitted on 2019-12-10 17:13:14
Question: I have a system on Ubuntu 12.04 that uses xvfb, CutyCapt, and Adobe Flash to capture a screenshot of an HTML page with embedded Flash. All packages are the Ubuntu 12.04 release packages (nothing custom compiled).

    xvfb-run --server-args="-screen 0, 1024x768x24" cutycapt --url=http://www.270towin.com/2012_election_predictions.php?mapid=mFh --plugins=on --delay=10 --out=test.png

The setup works just fine for capturing Flash. The problem I am having is that the Flash object makes some remote data

CutyCapt issue with SSL URL

僤鯓⒐⒋嵵緔, submitted on 2019-12-07 00:35:33
I am having a problem getting CutyCapt to work with SSL URLs. I have the most recent version of CutyCapt ( CutyCapt.cpp 10 2013-07-14 21:57:37Z ), and it works perfectly with all non-SSL URLs. When I try to grab a URL with SSL using the following command:

    ./xvfb-run ./CutyCapt --min-width=1280 --min-height=720 --max-wait=6000 \
        --url="https://apple.com" --out="testssl.jpg"

I get the following error:

    QPainter::begin: Paint device returned engine == 0, type: 3
    QPainter::setRenderHint: Painter must be active to set rendering hints
    QPainter::setBrush: Painter not active
    QPainter::pen: Painter not active

Convert HTML to an image

﹥>﹥吖頭↗, submitted on 2019-12-05 04:11:30
Question: Duplicate: What is the best way to create a web page thumbnail? I want to display a thumbnail image of an HTML page. How can I do this?
Answer 1: You might want to see if CutyCapt or IECapt are a good fit for your needs.
Answer 2: Provide more information, such as whether you are using PHP, ASP.NET, or a Windows application. If you are using PHP, for example, you can benefit from the variety of open-source extensions that can do the work for you.
Answer 3: See how to Generate a Thumbnail of Any Webpage using PHP. Also have

Convert HTML to an image

陌路散爱, submitted on 2019-12-03 22:11:41
Duplicate: What is the best way to create a web page thumbnail? I want to display a thumbnail image of an HTML page. How can I do this?
You might want to see if CutyCapt or IECapt are a good fit for your needs.
Provide more information, such as whether you are using PHP, ASP.NET, or a Windows application. If you are using PHP, for example, you can benefit from the variety of open-source extensions that can do the work for you.
See how to Generate a Thumbnail of Any Webpage using PHP. Also have a look at WebThumb.
I found several web sites that offer a service that will do this for you using http://www

Multithreaded web page screenshots on Linux: python

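让人想犯罪 __, submitted on 2019-11-30 00:44:47
In the previous post (the shell version), I used multiple shell processes to screenshot a large number of websites, which greatly reduced the total time. Over time, though, a drawback of that approach became apparent: every process is assigned the same number of sites, but some processes work quickly and others slowly, and a few still needed more than 10 hours to finish their share after all the others were done.

How can the existing "equal shares" scheme be changed into one where "the able do more"? I happened to be learning python, which supports multithreading conveniently. After some reading, I implemented an "able do more" multithreaded screenshot scheme with threading+queue:

    #coding:utf-8
    import threading,urllib2
    import datetime,time
    import Queue
    import os

    class Webshot(threading.Thread):
        def __init__(self,queue):
            threading.Thread.__init__(self)
            self.queue=queue
        def run(self):
            while True:
                # If the queue is empty, exit; otherwise take one site from the queue and screenshot it
                if self.queue.empty():
                    break
                host=self.queue.get().strip('\n')
                shotcmd="DISPLAY=:0 cutycapt --url=http: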
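The excerpt above is cut off before the worker finishes, so the following is only a minimal Python 3 sketch of the same shared-queue idea, not the author's actual code. The names `cutycapt_capture`, `worker`, and `run_all` are hypothetical, the thread count of 4 is arbitrary, and the cutycapt flags and output directory are copied from the shell version of the script. One deliberate difference: it uses `get_nowait()` instead of checking `empty()` and then calling `get()`, because another thread could take the last item between those two calls and leave the caller blocked.

```python
import os
import queue
import subprocess
import threading


def cutycapt_capture(host, out_dir="/data/webshot"):
    """Capture one site with cutycapt (assumed to be installed and on PATH)."""
    cmd = [
        "cutycapt",
        "--url=http://" + host,
        "--max-wait=90000",
        "--out=" + os.path.join(out_dir, host + ".jpg"),
    ]
    # DISPLAY=:0 mirrors the original script; adjust for your X server.
    env = dict(os.environ, DISPLAY=":0")
    return subprocess.call(cmd, env=env,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)


def worker(jobs, capture, done, failed):
    """Pull hosts until the queue is drained, so faster threads take more work."""
    while True:
        try:
            host = jobs.get_nowait()
        except queue.Empty:
            return
        try:
            capture(host)
            done.append(host)
        except Exception:
            # Record the failure and keep the thread alive for the next host.
            failed.append(host)


def run_all(hosts, capture=cutycapt_capture, num_threads=4):
    """Screenshot every host using a shared queue; returns (done, failed)."""
    jobs = queue.Queue()
    for host in hosts:
        host = host.strip()
        if host:  # skip blank lines
            jobs.put(host)
    done, failed = [], []
    threads = [
        threading.Thread(target=worker, args=(jobs, capture, done, failed))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done, failed
```

To run it against the site list from the shell version, something like `with open("domain.txt") as f: run_all(f)` should work, since iterating over the file yields one host per line. Passing a different `capture` callable makes it possible to try the scheduling logic without cutycapt installed.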

Programmatically get a screenshot of a page

徘徊边缘, submitted on 2019-11-26 17:05:18
I'm writing a specialized crawler and parser for internal use, and I require the ability to take a screenshot of a web page in order to check what colours are being used throughout. The program will take in around ten web addresses and will save them as a bitmap image. From there I plan to use LockBits in order to create a list of the five most used colours within the image. To my knowledge, it's the easiest way to get the colours used within a web page, but if there is an easier way to do it please chime in with your suggestions. Anyway, I was going to use ACA WebThumb ActiveX Control until I

Programmatically get a screenshot of a page

别等时光非礼了梦想., submitted on 2019-11-26 05:00:38
Question: I'm writing a specialized crawler and parser for internal use, and I require the ability to take a screenshot of a web page in order to check what colours are being used throughout. The program will take in around ten web addresses and will save them as a bitmap image. From there I plan to use LockBits in order to create a list of the five most used colours within the image. To my knowledge, it's the easiest way to get the colours used within a web page, but if there is an easier way to do