scrapy

Spider closed (finished) but the crawl did not complete

眉间皱痕 submitted on 2020-12-30 16:53:58
2020-12-29 01:45:47 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 309977, 'downloader/request_count': 609, 'downloader/request_method_count/GET': 609, 'downloader/response_bytes': 1549878, 'downloader/response_count': 609, 'downloader/response_status_count/200': 333, 'downloader/response_status_count/302': 276, 'elapsed_time_seconds': 303.197538, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2020, 12, 28, 17, 45, 47, 574530), 'item_scraped_count': 275, 'log_count/INFO': 13, 'request_depth_max': 58, 'response_received_count': 609, 'scheduler
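The dumped stats are just a flat dict, so a few derived numbers make it easy to sanity-check a run. The following sketch (not from the original post) copies a subset of the stats above and checks whether the crawl finished and how redirects and items relate to the request counts:

```python
# A trimmed copy of the stats dict dumped in the log above.
stats = {
    "downloader/request_count": 609,
    "downloader/response_status_count/200": 333,
    "downloader/response_status_count/302": 276,
    "item_scraped_count": 275,
    "finish_reason": "finished",
}

def summarize(stats):
    """Derive a few quick health indicators from Scrapy's stats dict."""
    ok = stats["downloader/response_status_count/200"]
    redirects = stats["downloader/response_status_count/302"]
    return {
        "finished": stats["finish_reason"] == "finished",
        # Nearly half the requests were redirects, which often explains
        # why item_scraped_count is far below request_count.
        "redirect_ratio": round(redirects / stats["downloader/request_count"], 2),
        "items_per_200": round(stats["item_scraped_count"] / ok, 2),
    }
```

Note that finish_reason 'finished' only means Scrapy ran out of scheduled requests; it does not guarantee every target page yielded an item, which is likely what the question title is getting at.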

Work-horse process was terminated unexpectedly RQ and Scrapy

对着背影说爱祢 submitted on 2020-12-30 03:14:43
Question: I am trying to retrieve a function from redis (rq) which generates a CrawlerProcess, but I'm getting Work-horse process was terminated unexpectedly (waitpid returned 11). Console log: Moving job to 'failed' queue (work-horse terminated unexpectedly; waitpid returned 11) on the line I marked with the comment THIS LINE KILLS THE PROGRAM. What am I doing wrong? How can I fix it? This is the function I retrieve from RQ: def custom_executor(url): process = CrawlerProcess({ 'USER_AGENT': "Mozilla/5.0 (X11;
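waitpid returned 11 means the work-horse died from SIGSEGV, which commonly happens when a Twisted reactor is started (or restarted) inside RQ's forked work-horse. One frequently suggested workaround is to isolate each crawl in a fresh child process. The sketch below shows only that pattern; run_spider is a hypothetical stand-in for the real CrawlerProcess code from the question:

```python
import multiprocessing

def run_spider(url, results):
    # In the real job, build the CrawlerProcess here, inside the child:
    #   process = CrawlerProcess({...}); process.crawl(MySpider, url=url); process.start()
    # The placeholder below just proves the child ran and reported back.
    results.put(f"crawled {url}")

def custom_executor(url):
    results = multiprocessing.Queue()
    p = multiprocessing.Process(target=run_spider, args=(url, results))
    p.start()
    result = results.get()  # read before join() to avoid a pipe-buffer deadlock
    p.join()
    return result
```

Because the reactor lives and dies with the child process, the RQ work-horse never touches Twisted state and can safely run the next job.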

Scrapy: extract JSON from within HTML script

て烟熏妆下的殇ゞ submitted on 2020-12-29 08:19:49
Question: I'm trying to extract what appears to be JSON data from within an HTML script. The HTML script looks like this on the site: <script> $(document).ready(function(){ var terms = new Verba.Compare.Collections.Terms([{"id":"6436","name":"SUMMER 16","inquiry":true,"ordering":true},{"id":"6517","name":"FALL 16","inquiry":true,"ordering":true}]); var view = new Verba.Compare.Views.CourseSelector({el: "body", terms: terms}); }); </script> I'd like to pull out the following: [{"id":"6436","name":
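One common approach (a sketch, not the thread's accepted answer) is to capture the array literal with a regular expression and hand it to json.loads; the snippet below inlines the <script> text from the question:

```python
import json
import re

html = '''<script> $(document).ready(function(){
var terms = new Verba.Compare.Collections.Terms([{"id":"6436","name":"SUMMER 16","inquiry":true,"ordering":true},{"id":"6517","name":"FALL 16","inquiry":true,"ordering":true}]);
var view = new Verba.Compare.Views.CourseSelector({el: "body", terms: terms});
}); </script>'''

# Capture the [...] literal passed to Terms(...) and decode it as JSON.
match = re.search(r'Terms\((\[.*?\])\)', html, re.DOTALL)
terms = json.loads(match.group(1))
```

In a spider you would first pull the script text with something like response.xpath('//script/text()').get() and then apply the same regex to the resulting string.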

Getting started with Scrapy crawlers

霸气de小男生 submitted on 2020-12-29 07:54:09
1. Install Scrapy
Open the Anaconda Prompt and run: pip install Scrapy
Note: if the installation fails with
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools
(or a similar message), you need to install Twisted first. Pick the wheel that matches your Python version: cp36 matches Python 3.6.x, and amd64 means a 64-bit system.
Download site: http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted
Once that wheel is installed, run pip install Scrapy again and the installation will complete.
2. Verify scrapy
Type: scrapy. If the command prints its usage information, the installation succeeded.
3. View the commands
Type: help
4. Create a Scrapy project
Run: scrapy startproject bky
This creates the project. Run cd bky and then dir to inspect what was generated.
5. Create a spider
Look inside the spiders directory and create a new spider

Scrapy (4): a spider to help you find the prettiest pictures

谁说我不能喝 submitted on 2020-12-27 10:09:32
We all want to download beautiful pictures to decorate our desktops, but the sites we find usually charge for them, which is really annoying. So today I'll walk you through building a tool that can crawl the nice pictures from a particular site. Excited? Of course you are. A quick note: from now on, 《今日金融词汇》 and 《每日一道 python 面试题》 will both be updated daily, so stay tuned. Thanks for following; likes, follows, and favorites are all welcome. Just kidding, let's get to the point.
Here is the link: https://image.so.com/
Before creating the project, we need to analyze the site's data. Open the homepage and click the "Beauty" (美女) category; you land on a page where you can see that the data is rendered via Ajax in JSONP form, and the callback function name changes randomly on every refresh, which means the code we write may only work for a while.
Click any image to open its detail page, press F12, and you can see the data carries detailed information for each image. Click this link and open the preview: https://image.so.com/zjl?ch=beauty&direction=next&sn=0&pn=30&prevsn=-1
Now we can see each image's details: id, title, imgurl. Next, look at the request headers to see which parameters are needed. From the capture we need ch, sn, and pn, so we can piece together a link like this
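The article cuts off before showing the assembled link, but from the parameters identified above (ch, sn, pn), one could be pieced together as in the sketch below; the parameter names come from the captured request, while the build_page_url helper and the default page size of 30 are assumptions for illustration:

```python
from urllib.parse import urlencode

BASE = "https://image.so.com/zjl"  # JSON endpoint captured in DevTools above

def build_page_url(sn, ch="beauty", pn=30):
    # ch selects the category, sn is the offset of the first image on the
    # page, and pn is the number of images per page.
    return f"{BASE}?{urlencode({'ch': ch, 'sn': sn, 'pn': pn})}"
```

Stepping sn by pn (0, 30, 60, ...) walks through the result pages, mirroring the direction=next behaviour seen in the captured URL.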

fatal error C1083: Cannot open include file: 'basetsd.h'

你。 submitted on 2020-12-26 09:25:17
Question: I have been trying to install Scrapy for Python for the last couple of days, trying everything I could think of and reading everything I have come across about similar problems, but I haven't been able to find a solution. Here is the output. Thank you. building 'twisted.test.raiser' extension creating build\temp.win32-3.6 creating build\temp.win32-3.6\Release creating build\temp.win32-3.6\Release\src creating build\temp.win32-3.6\Release\src\twisted creating build\temp.win32-3.6\Release\src