screen-scraping | 易学教程

VBA to login in zerodha account and then download and again upload live data for buy and sell signal

Question: I want my VBA macro to log in to a Zerodha Kite account, integrate the API, download the live data from there and, after analysing the data, upload buy or sell signals. I tried to log in to the Zerodha account, but it just refuses my request, so I can't do anything.

    Sub Test()
        Set ie = CreateObject("InternetExplorer.application")
        ie.Visible = True
        ie.Navigate ("https://https://kite.zerodha.com/" & ActiveCell)
        Do
            If ie.ReadyState = 4 Then
                ie.Visible = False
                Exit Do
            Else
                DoEvents
    …

python scraping reuters site…bad xpath?

Question: I am trying to do something that appeared to be simple: scrape the company names in the Reuters list at this link: http://www.reuters.com/finance/markets/index?symbol=us!spx&sortBy=&sortDir=&pn= However, I just can't access the company names. Even after playing around with a lot of XPath queries, I have problems accessing the table. I am trying to grab names such as "3M company" and "Abbott Laboratories". Here are snippets of the code I have used:

    scrape = []
    companies = []
    import …

Retrieve data from the first td in every tr

Question: I'm scraping a page that contains a table with several tr's. Inside every tr there are four td's, and I want to get the data from the first of these td's. Below is the code I've tried so far, but it grabs all the td's. How can I accomplish what I want?

    ...
    $html = new simple_html_dom();
    $html = file_get_html($url);
    foreach($html->find('table tr') as $row) {
        foreach($row->find('td', 0) as $cell) {
            echo $cell;
        }
    }

Answer 1: Think about why you're using the second foreach when you actually only …
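The hint about the second foreach suggests taking the first cell of each row directly instead of looping over the cells. A minimal sketch of that idea follows, written in Python rather than the question's PHP and using only the standard library. The class name `FirstCellParser`, the helper `first_cells`, and the sample HTML are made up for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class FirstCellParser(HTMLParser):
    """Collect the text of the first <td> in every <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.first_cells = []     # one entry per row that contains a <td>
        self._cells_seen = 0      # number of <td> tags opened in the current row
        self._capturing = False   # True while inside the first <td> of a row
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells_seen = 0  # a new row starts, so reset the cell counter
        elif tag == "td":
            self._cells_seen += 1
            if self._cells_seen == 1:
                self._capturing = True
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._capturing:
            self.first_cells.append("".join(self._buffer).strip())
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)


def first_cells(html):
    """Return the text of the first cell of each table row in `html`."""
    parser = FirstCellParser()
    parser.feed(html)
    parser.close()
    return parser.first_cells


# Sample markup invented for this sketch: two rows of four cells each.
SAMPLE_HTML = """
<table>
  <tr><td>a1</td><td>a2</td><td>a3</td><td>a4</td></tr>
  <tr><td>b1</td><td>b2</td><td>b3</td><td>b4</td></tr>
</table>
"""

print(first_cells(SAMPLE_HTML))
```

The counter is reset on every `<tr>`, so only one cell per row is ever captured and no inner loop over cells is needed.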

How to fix Newspaper3k 403 Client Error for certain URL's?

Question: I am trying to get a list of articles using a combination of the googlesearch and newspaper3k Python packages. When using article.parse, I end up getting an error: newspaper.article.ArticleException: Article download() failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/donald-trump-hillary-clinton-2020-rally-orlando-1444697 on URL https://www.newsweek.com/donald-trump-hillary-clinton-2020-rally-orlando-1444697 I have tried running as admin when executing the script, and the link …

Screen Scraping in Python

Question: I'm new to the whole concept of screen scraping in Python, although I've done a bit of screen scraping in R. I'm trying to scrape the Yelp website, specifically the names of each insurance agency that the Yelp search returns. With most scraping tasks I'm able to perform the following step, but I always have a hard time going forward with parsing the XML.

    import urllib2
    from BeautifulSoup import BeautifulSoup
    soup = BeautifulSoup(urllib2.urlopen('http://www.yelp.com/search?find_desc …

Setting a C# Form to a Negative Location

Question: I am working on a tool for Windows that will interface with CloudApp using its API. I found some articles on here about how to achieve the region capture, which I modified to fit my exact needs. Everything is going very well, but I am having trouble with a multi-monitor setup. The reason for the trouble is that I run one monitor at 1920x1080 and the second at 1080x1920. The overall flow is that I create an image of the entire screen (3000, 1920), then I show it as the background in a …

Scrapy + Selenium + Datepicker

Question: I need to scrape a page like this one, for example, and I am using Scrapy + Selenium to interact with the datepicker calendar, but I am running into an ElementNotVisibleException: Message: Element is not currently visible and so may not be interacted with. So far I have:

    def parse(self, response):
        self.driver.get("https://www.airbnb.pt/rooms/9315238")
        try:
            element = WebDriverWait(self.driver, 10).until(
                EC.presence_of_element_located((By.XPATH, "//input[@name='checkin']"))
            )
        finally:
            x = self …

Python Urllib UrlOpen Read

Question: Say I am retrieving a list of URLs from a server using Python's urllib2 library. I noticed that it took about 5 seconds to get one page, so it would take a long time to finish all the pages I want to collect. I suspect that most of those 5 seconds is spent on the server side, and I am wondering whether I could just start using the threading library. With, say, 5 threads, the average time per page could be dramatically reduced, maybe to 1 or 2 seconds. (might make …
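The threading idea in the question can be sketched as follows, assuming Python 3, where `urllib.request` replaces `urllib2` and `concurrent.futures` provides a ready-made thread pool. The function names `fetch`, `fetch_all`, and `slow_fake_fetch` and the example URLs are illustrative only; the fake fetcher stands in for a slow server so the sketch runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch_one=fetch, max_workers=5):
    """Fetch every URL with a pool of worker threads.

    Results are returned in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, urls))


def slow_fake_fetch(url):
    """Stand-in for a slow server (hypothetical): wait, then return fake bytes."""
    time.sleep(0.2)
    return ("body of " + url).encode()


# Hypothetical URLs; they are never contacted because the fake fetcher is used.
urls = ["http://example.com/page%d" % i for i in range(5)]
start = time.perf_counter()
pages = fetch_all(urls, fetch_one=slow_fake_fetch)
print(len(pages), "pages in %.2f s" % (time.perf_counter() - start))
```

Because the time is spent waiting on the server rather than computing, threads can overlap the waits; passing `fetch_one=fetch` (the default) would issue real requests instead.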

Arranging coordinates into clockwise order

Question: I have 9 screen coordinates, each representing one of 9 positions. I want the top-right coordinate to be the 1st position, and the coordinates that follow clockwise to be the 2nd, 3rd, 4th and so on, up to the 9th, which would be the top-left coordinate. Would anybody here be able to come up with some sort of mathematical means of determining which of the 9 coordinates is in which position? They're all relative to each other, and will always be THAT relative to each …
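One possible approach, offered only as a sketch since the question is cut off: sort the points by their clockwise angle around the centroid, measured from the "up" direction. This assumes the points lie roughly on a ring around their centroid, that no point sits directly above the centroid between the top-left and top-right ones, and that screen coordinates have y growing downward. The function name and the sample coordinates are invented for illustration.

```python
import math


def clockwise_from_top_right(points):
    """Order screen points clockwise around their centroid, starting just
    to the right of straight up (assumes y grows downward)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)

    def clockwise_angle(point):
        x, y = point
        # atan2(dx, -dy) is 0 for "up", pi/2 for "right", pi for "down",
        # and 3*pi/2 for "left" once wrapped into [0, 2*pi).
        return math.atan2(x - cx, cy - y) % (2 * math.pi)

    return sorted(points, key=clockwise_angle)


# Invented sample: 9 points on a ring with a gap at the top, given out of order.
SAMPLE_POINTS = [
    (200, 450), (100, 100), (350, 200), (50, 300), (300, 100),
    (300, 400), (50, 200), (350, 300), (100, 400),
]

for position, point in enumerate(clockwise_from_top_right(SAMPLE_POINTS), start=1):
    print(position, point)
```

If the real layout differs, for example a 3x3 grid with a centre point, the angle of the centre point is meaningless and a different ordering rule would be needed.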

Download an Entire Website in C#

Question: Forgive my ignorance on the subject. I am using

    string p = "http://" + Textbox2.Text;
    string r = textBox3.Text;
    System.Net.WebClient webclient = new System.Net.WebClient();
    webclient.DownloadFile(p, r);

to download a webpage. Can you please help me enhance the code so that it downloads the entire website? I tried HTML screen scraping, but it returns only the href links of the index.html files. How do I proceed? Thanks.

Answer 1: Scraping a website is actually a lot of work, with a lot of …
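To illustrate what moving from one page to a whole site involves, here is a rough sketch in Python (not the question's C#) of a breadth-first crawl that follows `href` links on the same host. It is not the answer's method, which is truncated above. All names (`LinkParser`, `extract_links`, `fetch_text`, `crawl`, `FAKE_SITE`) are hypothetical, the fake site exists only so the sketch runs offline, and a real crawler would also need to handle images, stylesheets, robots.txt, errors, and rate limiting.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute, fragment-free URLs for all links in `html`."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def fetch_text(url):
    """Download one page and decode it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_text, max_pages=50):
    """Breadth-first crawl limited to the host of `start_url`.

    Returns a dict mapping each visited URL to its HTML text.
    """
    host = urlparse(start_url).netloc
    pages = {}
    queue = deque([start_url])
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in pages:
            continue  # already downloaded
        pages[url] = fetch(url)
        for link in extract_links(pages[url], url):
            if urlparse(link).netloc == host and link not in pages:
                queue.append(link)
    return pages


# In-memory stand-in for a website (invented), so no network is needed.
FAKE_SITE = {
    "http://site.test/": '<a href="/a.html">A</a> <a href="http://other.test/">x</a>',
    "http://site.test/a.html": '<a href="/">home</a> <a href="b.html#top">B</a>',
    "http://site.test/b.html": "no links here",
}

for url in crawl("http://site.test/", fetch=FAKE_SITE.__getitem__):
    print(url)
```

Calling `crawl(start_url)` with the default `fetch_text` would issue real requests; the `max_pages` cap is there so a sketch like this cannot run away on a large site.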