screen-scraping | 易学教程

Memory Leak using GetPixel/GetDC in Visual Basic

Question: I have a timer that, among other things, checks 5 spots on the screen for a color change. My program monitors a phone system app and checks whether there is a new incoming phone call on any of 5 buttons. I'm using the following code, based on another question I had posted ("Monitor an area of the screen for a certain color in Visual Basic"):

    Private Function CheckforCall()
        Try
            Dim queue1 As Integer = GetPixel(GetDC(0), 40, 573)
            Dim queue2 As Integer = GetPixel(GetDC(0), 140, 573)
            Dim queue3 As
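One likely source of a leak in code shaped like this is that each GetDC call hands out a device context that has to be given back with ReleaseDC, and calling GetDC(0) inline on every timer tick never does so. The sketch below illustrates the acquire-once, always-release pattern in Python via ctypes rather than in Visual Basic. The function name `read_pixels` and its `user32`/`gdi32` parameters are hypothetical, added only so the pattern can be exercised without Windows; the real DLLs are loaded only when they are not supplied.

```python
import ctypes


def read_pixels(points, user32=None, gdi32=None):
    """Return the COLORREF value at each (x, y) screen coordinate.

    user32 and gdi32 are hypothetical injection points (not part of any
    library API); when omitted, the real Windows DLLs are loaded.
    """
    if user32 is None or gdi32 is None:
        # Windows-only setup: declare signatures so handles are not truncated.
        from ctypes import wintypes
        user32 = ctypes.windll.user32
        gdi32 = ctypes.windll.gdi32
        user32.GetDC.argtypes = [wintypes.HWND]
        user32.GetDC.restype = wintypes.HDC
        user32.ReleaseDC.argtypes = [wintypes.HWND, wintypes.HDC]
        user32.ReleaseDC.restype = ctypes.c_int
        gdi32.GetPixel.argtypes = [wintypes.HDC, ctypes.c_int, ctypes.c_int]
        gdi32.GetPixel.restype = wintypes.COLORREF

    # A NULL window handle requests the device context of the whole screen.
    hdc = user32.GetDC(None)
    if not hdc:
        raise OSError("GetDC failed")
    try:
        # Reuse the single device context for every pixel lookup.
        return [gdi32.GetPixel(hdc, x, y) for x, y in points]
    finally:
        # Each successful GetDC is paired with exactly one ReleaseDC.
        user32.ReleaseDC(None, hdc)
```

On Windows, `read_pixels([(40, 573), (140, 573)])` would read the first two coordinates from the question with a single GetDC/ReleaseDC pair per timer tick.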

Extracting text from Microsoft Word files in Python with Scrapy

Question: Here is my sample Scrapy code in Python for extracting text from .doc and .docx files downloaded from a website.

    import StringIO
    from functools import partial
    from scrapy.http import Request
    from scrapy.spider import BaseSpider
    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from pyPdf import PdfFileReader
    from scrapy.selector import HtmlXPathSelector
    from scrapy.spider import Spider
    from scrapy.selector import Selector
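For the .docx half of this problem, one possible approach needs only the standard library: a .docx file is a ZIP archive whose main text lives in `word/document.xml`. The sketch below is not the asker's code; `docx_bytes_to_text` is a hypothetical helper name, and it does not handle the older binary .doc format. Paragraphs nested inside other paragraphs (for example in text boxes) may be repeated.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_bytes_to_text(data):
    """Return the text of a .docx file given as bytes, one paragraph per line."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

In a Scrapy callback, the bytes to pass in would be `response.body` of the downloaded file.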

Injecting javascript code into an on click event with javascript and casper.js

Question: I've just started using CasperJS after trying to use Python (Selenium, requests and mechanize) to scrape a page only after some JavaScript had loaded dynamic content on it. Since this was very hard or very slow with Selenium, it was suggested that I turn to CasperJS (which requires PhantomJS). One thing I am wondering about (I am quite new to JavaScript) relates to a JavaScript onclick event. The page I want to scrape shows ten names per page by default, and at the bottom has options

Clicking image with mechanize

Question: Clicking a text link with Mechanize is a piece of cake: agent.click(page.link_with(:text => 'some_text')). How do I click an image with Mechanize?

Answer 1: It is rather similar. You just need to grab one of the attributes of your image, as below: agent.click(page.image_with(:alt => 'your image'))

Answer 2: Clicking on a pure HTML image will typically have no effect. If the image has an onclick handler, you will not be able to click on it with Mechanize, as it does not support JavaScript. You may want to

Grabbed data from a given URL and put it into a file using scrapy

Question: I am trying to crawl a given website deeply and grab the text from all of its pages. I am using Scrapy to scrape the site. Here is how I am running the spider:

    scrapy crawl stack_crawler -o items.json

The items.json file comes out empty. Here is the spider code:

    # -*- coding: utf-8 -*-
    import scrapy
    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule
    #from tutorial.items import TutorialItem
    from tutorial.items import DmozItem

    class StackCrawlerSpider(CrawlSpider):
        name

Scraping data from website that uses javascript

Question: I'm currently working on a program that checks university class availability, but the website uses JavaScript to display classes and their times. Using Java, I'm working on scraping this data and using it to tell users when classes are open. I've tried using Selenium, but I don't really know how to use it well. Is there an easier way to do this?

Answer 1: Without specifics it is hard to know. But I assume that if the data is not in the page at load time, they may be using AJAX to load it. As I said

How to scrape ID using Python BeautifulSoup

Question: I would like to scrape the div with class "size", along with its 'ID' value, using BeautifulSoup in Python.

    <div class="size ">
      <a class="selectVar" id="23333" data="40593232" data-price="13000,00 €" data-tprice="" data-sh="107-42" data-size-original="92" data-eu="92" data-size-uk="5" data-size-us="5.5" data-size-cm="26.5" data-branch-2="1" data-branch-3="1" data-branch-4="1" data-branch-5="1" data-branch-6="1" data-branch-on="1"> 92 </a>
    </div>

I have tried the following with no success: product = soup
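As a rough illustration of what has to be pulled out of this markup, the sketch below collects the `id` attribute and link text of each `<a>` inside a `<div>` whose class list contains `size`. It deliberately uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without third-party packages; `SizeLinkParser` and `size_links` are hypothetical names, and nested `<div>` elements are not handled.

```python
from html.parser import HTMLParser


class SizeLinkParser(HTMLParser):
    """Collect (id, text) for each <a> found inside a <div class="size">."""

    def __init__(self):
        super().__init__()
        self.in_size_div = False
        self.current_id = None
        self.current_text = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # split() copes with the trailing space in class="size ".
        if tag == "div" and "size" in (attrs.get("class") or "").split():
            self.in_size_div = True
        elif tag == "a" and self.in_size_div:
            self.current_id = attrs.get("id")
            self.current_text = []

    def handle_data(self, data):
        # Only keep text while inside a link that carries an id.
        if self.current_id is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_id is not None:
            text = "".join(self.current_text).strip()
            self.results.append((self.current_id, text))
            self.current_id = None
        elif tag == "div":
            self.in_size_div = False


def size_links(html):
    """Return a list of (id, text) pairs for links inside size divs."""
    parser = SizeLinkParser()
    parser.feed(html)
    return parser.results
```

With BeautifulSoup itself, the same selection is typically written as a CSS selector such as `div.size a.selectVar`, reading the `id` attribute from each match.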

Python parsing: lxml to get just part of a tag's text

Question: I'm working in Python with HTML that looks like this. I'm parsing with lxml, but could equally happily use pyquery: NameDave Davies Address123 Greyfriars Road, London Pulling out 'Name' and 'Address' is dead easy, whatever library I use, but how do I get the remainder of the text, i.e. 'Dave Davies'?

Answer 1: Each Element can have a text and a tail attribute (in the link, search for the word "tail"): import lxml.etree content=''
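To make the text/tail distinction concrete, here is a small sketch using the standard library's `xml.etree.ElementTree`, whose elements expose the same `text` and `tail` attributes as lxml's. The markup in `SNIPPET` is an assumption, since the original tags did not survive in the excerpt, and `label_value_pairs` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the question's real tags were lost, so <strong> and <br/>
# are stand-ins for whatever wrapped the labels.
SNIPPET = (
    "<div>"
    "<strong>Name</strong>Dave Davies<br/>"
    "<strong>Address</strong>123 Greyfriars Road, London"
    "</div>"
)


def label_value_pairs(markup):
    """Map each <strong> label to the loose text that follows its closing tag."""
    root = ET.fromstring(markup)
    pairs = {}
    for label in root.iter("strong"):
        # .text is the content inside the element; .tail is the text after
        # the element's closing tag, up to the next sibling element.
        pairs[(label.text or "").strip()] = (label.tail or "").strip()
    return pairs
```

Here `label_value_pairs(SNIPPET)` pairs "Name" with "Dave Davies" and "Address" with "123 Greyfriars Road, London".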

Monitor an area of the screen for a certain color in Visual Basic

Question: I'm designing a player application to accompany our phone system. As our call-takers take calls, the system records each call. They can go to a list module, find a recording and double-click it, which opens my player. The issue I have is that if the call-taker gets another call, my player doesn't know about it and will continue playing. I'm looking for a way to monitor a particular area of the screen so that when it shows yellow or red instead of blue, my player pauses. The phone system does not

python scraping date from html page (June 10, 2017)

Question: How can I extract the date "June 03, 2017" from an HTML page containing the table data below? The date will change with the order number. I am not sure if I am using it correctly. Please advise.

    <tr>
      <td style="font:bold 24px Arial;">Order #12345</td>
      <td style="font:13px Arial;">Order Date: June 03, 2017</td>
    </tr>

Below is the sample code I have written:

    import requests
    from bs4 import BeautifulSoup
    #'url' is the actual link of html page
    data = requests.get('url').content
    soup =
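One way to approach this, sketched under the assumption that the label "Order Date:" always precedes the date, is to match the date with a regular expression and parse it with `datetime`. The example searches a plain string so it runs without requests or BeautifulSoup; the same pattern could be applied to the text of the matching `<td>` once it has been located. `ORDER_DATE` and `extract_order_date` are hypothetical names.

```python
import re
from datetime import datetime

# Month name, day and year, tolerating a missing space after the comma.
ORDER_DATE = re.compile(r"Order Date:\s*([A-Za-z]+)\s+(\d{1,2}),\s*(\d{4})")


def extract_order_date(html):
    """Return the order date as a datetime.date, or None if no date is found."""
    match = ORDER_DATE.search(html)
    if match is None:
        return None
    month, day, year = match.groups()
    # %B expects a full English month name under the default locale.
    return datetime.strptime(f"{month} {day} {year}", "%B %d %Y").date()
```

Because the month, day and year are captured separately, both "June 03, 2017" and "June 03,2017" parse to the same date.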