scrape

Scrape with xmlhttp

Submitted by ╄→尐↘猪︶ㄣ on 2021-02-20 03:50:41
Question: I would like to get data from https://www.goaloong.net/football/6in1. This page contains a table. I tried with:

    Sub REQUESTXML()
        Dim XMLHttpRequest As xmlHttp
        Dim HTMLDoc As New HTMLDocument
        Dim elem As Object
        Dim x As Long
        Set XMLHttpRequest = New MSXML2.xmlHttp
        XMLHttpRequest.Open "GET", "https://www.goaloong.net/football/6in1", False
        XMLHttpRequest.send
        While XMLHttpRequest.readyState = 200
            DoEvents
        Wend
        Debug.Print XMLHttpRequest.responseText
        HTMLDoc.Body.innerHTML = XMLHttpRequest …
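(Aside from the excerpt, note the loop condition compares readyState to 200, which is an HTTP status code, not a ready state.) The same fetch-then-parse idea is often sketched in Python instead. The snippet below shows only the parsing step, with an invented sample table standing in for the response body, since the live page's markup is not shown in the excerpt:

```python
from bs4 import BeautifulSoup

# Invented sample HTML standing in for the response text of the request
html = """
<table>
  <tr><th>Home</th><th>Away</th></tr>
  <tr><td>Team A</td><td>Team B</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
# One list per <tr>, with the text of each <th>/<td> cell
rows = [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in soup.find_all("tr")]
print(rows)
```

If the table is rendered by JavaScript rather than present in the raw HTML, no amount of parsing the response text will find it, and a browser-driven tool is needed instead.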

Web scraping an “onclick” object table on a website with python

Submitted by 随声附和 on 2021-02-15 07:44:51
Question: I am trying to scrape the data from this link: page. If you click the up arrow, you will notice the highlighted days in the month sections. Clicking on a highlighted day, a table with the tenders initiated that day will appear. All I need to do is get the data in each table for each highlighted day in the calendar. There might be one or more tenders (up to a maximum of 7) per day. The table appears on click. I have done some web scraping with bs4, but I think this is a job for selenium (please, …
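Selenium is indeed the usual tool when content only appears after a click. Once the click has happened, the rendered page can be handed straight to bs4. A minimal sketch: the driver calls are shown as comments because they need a live browser, and the sample HTML (including the `tenders` id) is invented for illustration:

```python
from bs4 import BeautifulSoup

# With a real browser this part would be:
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get(url)
#   driver.find_element(...).click()   # click one highlighted day
#   html = driver.page_source
html = ("<table id='tenders'>"
        "<tr><td>Tender 1</td></tr>"
        "<tr><td>Tender 2</td></tr>"
        "</table>")

soup = BeautifulSoup(html, "html.parser")
# One entry per table row; at most 7 tenders are expected per day
tenders = [tr.get_text(strip=True) for tr in soup.select("#tenders tr")]
print(tenders)
```

Repeating the click-then-parse step for each highlighted day, and collecting the rows into one list, covers the whole calendar.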

Python - previous list elements being overwritten by new elements during while loop

Submitted by 我只是一个虾纸丫 on 2021-02-05 12:31:07
Question: Hello, I am new to Python and am trying to figure out why my list overwrites the previous elements every time a new page is loaded and scraped during the while loop. Thank you in advance.

    def scrapeurls():
        domain = "https://domain234dd.com"
        count = 0
        while count < 10:
            page = requests.get("{}{}".format(domain, count))
            soup = BeautifulSoup(page.content, 'html.parser')
            data = soup.findAll('div', attrs={'class': 'video'})
            urls = []
            for div in data:
                links = div.findAll('a')
                for a in links:
                    urls …
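The likely cause is visible in the excerpt: `urls = []` sits inside the `while` loop, so the list is recreated on every page and only the last page's links survive. Moving the initialization before the loop fixes it. A minimal demonstration, with the network call replaced by fake per-page data (the real domain in the question looks like a placeholder):

```python
def scrape_urls(pages):
    urls = []  # created ONCE, before the loop, so results accumulate
    for page_links in pages:       # stands in for: while count < 10 ... requests.get(...)
        for link in page_links:    # stands in for the inner findAll('a') loop
            urls.append(link)
    return urls

# Fake data standing in for links scraped from successive pages
pages = [["/a", "/b"], ["/c"]]
print(scrape_urls(pages))  # ['/a', '/b', '/c']
```

With `urls = []` inside the loop, the same call would return only `['/c']`.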

Python BeautifulSoup scraping; how to combine two different fields, or pair them based on location in site?

Submitted by 怎甘沉沦 on 2021-01-28 09:07:52
Question: OK guys, so I'm very much a beginner here. The purpose of what I'm trying to do is to scrape a website for company names and corresponding phone numbers. The end goal is to write these to a CSV that can be opened with Excel. Currently I'm able to retrieve the company names and the phone numbers separately. I'm thinking I could merge the two lists somehow, but I'm concerned that a single outlier datum would offset the whole merge and mismatch the numbers to names. What is the …
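Given two equal-length lists, `zip` plus the `csv` module does the merge; the mismatch worry is real, though, so the safer pattern is to locate each listing's container element and pull the name and phone out of it together, so a missing phone never shifts the pairing. A sketch of the zip-and-write step, with invented sample data:

```python
import csv
import io

# Invented sample data standing in for the two scraped lists
names = ["Acme Ltd", "Globex"]
phones = ["555-0100", "555-0199"]

buf = io.StringIO()  # with a real file: open("out.csv", "w", newline="")
writer = csv.writer(buf)
writer.writerow(["company", "phone"])
writer.writerows(zip(names, phones))  # pairs strictly by position
print(buf.getvalue())
```

If the lists can differ in length, `zip` silently truncates to the shorter one, which is exactly the offset problem the question anticipates; extracting name and phone per-listing avoids it entirely.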

How to scrape links from Wikipedia with Python

Submitted by 假如想象 on 2021-01-28 07:23:39
Question: I am trying to scrape all the links to battles from the "List of naval battles" on Wikipedia using Python. The trouble is that I cannot figure out how to export all of the links containing the words "/wiki/Battle" to my CSV file. I am used to C++, so Python is kind of foreign to me. Any ideas? Here is what I have so far...

    from bs4 import BeautifulSoup
    import urllib2

    rootUrl = "https://en.wikipedia.org/wiki/List_of_naval_battles"

    def get_soup(url, header):
        return BeautifulSoup(urllib2.urlopen …
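(Note that `urllib2` is Python 2 only; on Python 3 the equivalent is `urllib.request`.) The filtering step itself is a list comprehension over all anchors, keeping hrefs that contain `/wiki/Battle`. The sample anchors below are invented stand-ins for the fetched page:

```python
from bs4 import BeautifulSoup

# Invented sample anchors standing in for the downloaded Wikipedia page
html = ('<a href="/wiki/Battle_of_Salamis">Salamis</a>'
        '<a href="/wiki/Battle_of_Trafalgar">Trafalgar</a>'
        '<a href="/wiki/Greece">Greece</a>')
soup = BeautifulSoup(html, "html.parser")

# Keep only anchors that have an href containing "/wiki/Battle"
battles = [a["href"] for a in soup.find_all("a", href=True)
           if "/wiki/Battle" in a["href"]]
print(battles)
```

Writing `battles` out is then one row per link via the `csv` module, as in the CSV example above a plain loop with `writer.writerow([href])` would do.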

scraping table with python based on dates

Submitted by 旧城冷巷雨未停 on 2020-12-27 05:58:46
Question: For a week now I have been trying to scrape a table from this site, https://www.bi.go.id/id/moneter/informasi-kurs/transaksi-bi/Default.aspx, but I have no idea what to write and I am very confused. I am trying to scrape the table of kurs (exchange-rate) transactions from 2015-2020 (20 Nov 2015 to 20 Nov 2020), but the problem is that the link stays the same whether I use the default date or the date I chose. Please help me in any way. Thank you in advance!

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd …
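The unchanging URL is the clue: ASP.NET pages like Default.aspx typically receive the chosen date via a POST of the page's form (including its `__VIEWSTATE` token), not via the URL, so the request would be `requests.post` with that form data rather than a plain GET. The actual form-field names have to be read from the page's HTML and are not shown in the excerpt, so the sketch below only builds the per-day date strings to iterate over; the `dd/mm/yyyy` format is an assumption:

```python
from datetime import date, timedelta

start, end = date(2015, 11, 20), date(2020, 11, 20)

# One date string per day in the range, inclusive; each would be sent
# in the POSTed form data alongside the page's __VIEWSTATE token.
days = [(start + timedelta(days=i)).strftime("%d/%m/%Y")
        for i in range((end - start).days + 1)]
print(days[0], days[-1], len(days))
```

Inspecting one date change in the browser's network tab reveals the exact field names and format the site expects, which is the part that cannot be guessed here.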

Facebook Object Debugger - Could not resolve the hostname into a valid IP address

Submitted by 江枫思渺然 on 2020-08-01 05:06:26
Question: There is a problem with how Facebook scrapes my page for metadata. When I use the Facebook Object Debugger I get the following error: I am quite sure this has something to do with how my DNS records are defined. It seems the scraper can't even reach my site; as the error states, it can't turn the hostname into a valid IP. When I click the link further down the page, "See exactly what our scra...", I get "Document returned no data". I have been trying to figure this out for about a month now and am getting VERY VERY …
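A quick first check is to resolve the hostname the same way any scraper would, i.e. an ordinary DNS lookup. The sketch below runs against `localhost` only because the actual domain is not given in the excerpt:

```python
import socket

def resolves(host):
    """Return the resolved IPv4 address, or None if the DNS lookup fails."""
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        return None

print(resolves("localhost"))  # typically 127.0.0.1
```

If the name resolves locally but Facebook's debugger still fails, the usual suspects are an A record that only recently changed (crawler-side DNS caching), a record visible only on some nameservers, or a record pointing at a private address that Facebook's crawler cannot reach from the public Internet.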
