web-scraping

Not able to do web scraping using BeautifulSoup and requests

夙愿已清 submitted on 2021-01-29 19:02:31
Question: I am trying to scrape the values of the first two sections, i.e. the 1*2 and DOUBLECHANCE sections, using bs4 and requests from this website: https://web.bet9ja.com/Sport/SubEventDetail?SubEventID=76512106 The code I have written is:
    import bs4 as bs
    import urllib.request
    source = urllib.request.urlopen('https://web.bet9ja.com/Sport/SubEventDetail?SubEventID=76512106')
    soup = bs.BeautifulSoup(source, 'lxml')
    for div in soup.find_all('div', class_='SEItem ng-scope'):
        print(div.text)
When I run it, I am

Webscrap VBA - List

余生长醉 submitted on 2021-01-29 18:31:17
Question: I am trying to set up VBA web-scraping code to import data into Excel from this website: https://www.thewindpower.net/windfarms_list_en.php I want to open this webpage, select a country, and then scrape the data from the table below (including the URL from the name column). However, I am stuck on several points: How can I select the country I want in VBA code? How can I select the table, given that there is no id or class on the tag? How can I import the URL included in the name column? Here is the

How to scrape all Steam id, review content, profile_url from reviews of a game in steam into excel file using python?

≡放荡痞女 submitted on 2021-01-29 18:04:32
Question: The problem is that the code either prints only the first 11 reviews (when while n<500 is used) or prints nothing at all (when while True: is used). The requirement is to save every Steam id, review content, and profile_url from the game's reviews to an Excel file.
    from msedge.selenium_tools import Edge, EdgeOptions
    from selenium.webdriver.common.keys import Keys
    import re
    from time import sleep
    from datetime import datetime
    from openpyxl import Workbook
    game_id = 1097150
    url = 'https://steamcommunity.com/app/1097150

How to write code to read output file to figure out how far it got in scraping website and then starting from where it left off

纵然是瞬间 submitted on 2021-01-29 17:39:43
Question: I'm writing a program to scrape the article title, date, and body text from each article in this website's archive and export them to a CSV file. The website seems to block me at some point, and I get this error: HTTPError: Service Unavailable. I believe this is because I am accessing the site too many times in a short period. I want my code to be able to detect where the error happened and pick up where it left off. I've tried adding a 2-second delay after going through 10
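One common way to resume, sketched here under assumptions, is to treat the output CSV itself as the checkpoint: read back the URLs already written, skip them, and append new rows as they arrive. The column names, the `fetch_article` callable, and the retry parameters below are hypothetical, since the asker's code is not shown in the excerpt.

```python
import csv
import os
import time
from urllib.error import HTTPError

FIELDS = ["url", "title", "date", "body"]  # hypothetical column layout


def load_done_urls(csv_path):
    """Return the set of article URLs already saved in the output CSV."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["url"] for row in csv.DictReader(f)}


def scrape_all(urls, csv_path, fetch_article, delay=2.0, max_retries=3):
    """Scrape every URL not yet in csv_path, appending one row per article.

    fetch_article(url) is a hypothetical callable returning a dict with
    the keys "title", "date" and "body".
    """
    done = load_done_urls(csv_path)
    needs_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if needs_header:
            writer.writeheader()
        for url in urls:
            if url in done:
                continue  # already saved by an earlier run
            for attempt in range(max_retries):
                try:
                    row = fetch_article(url)
                    break
                except HTTPError as err:
                    # Retry only on 503, and give up after the last attempt.
                    if err.code != 503 or attempt == max_retries - 1:
                        raise
                    time.sleep(delay * 2 ** attempt)  # exponential backoff
            writer.writerow({"url": url, **row})
            f.flush()  # keep the checkpoint current if a later request fails
            time.sleep(delay)
```

Because each row is flushed as soon as it is written, rerunning `scrape_all` with the same URL list after a crash continues from the first article that was not saved.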

Web scraping without specified name, id, or class attached to the data

一笑奈何 submitted on 2021-01-29 17:27:24
Question: I am trying to track the status of a shipping delivery and display it on an Excel tab. This website, https://webcsw.ocs.co.jp/csw/ECSWG0201R00003P.do, displays data when the "Air wayBill No." is entered. I managed to open Internet Explorer, enter the Air WayBill number, and click the search button.
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Navigate "https://webcsw.ocs.co.jp/csw/ECSWG0201R00000P.do"
    IE.Visible = True
    While IE.busy
        DoEvents
    Wend
    Set document = IE

Pyppeteer how to login on page with type

百般思念 submitted on 2021-01-29 17:17:57
Question: I was using Selenium with ChromeDriver for my Python Telegram bot, deployed on a Linux server with Docker. Everything works, but it does not support async, so my app can't do anything else while scraping. I heard about Pyppeteer, but I am having trouble getting the page I need to scrape. The webpage requires me to log in. Here are the steps: Open the page. Click the auth button: <button class="btn btn-outline-warning kt-font-dark mr-2" type="button" id="btn_auth"> <i class="fa fa-key"></i> Enter </button

ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host - getting this error

馋奶兔 submitted on 2021-01-29 16:35:54
Question: ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host. I am getting this error while reading a webpage with the following code:
    from urllib.request import urlopen as uReq
    from bs4 import BeautifulSoup as soup
    myurl = 'https://www.amazon.in/s?k=graphics+card&ref=nb_sb_noss_2'
    uClient = uReq(myurl)
Answer 1: Passing a User-Agent header seems to solve the issue. Try something like this:
    from urllib.request import urlopen as uReq, Request
    from bs4 import
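The answer's code is cut off in this excerpt, so the following is only a minimal sketch of the idea it describes, not the answer's actual code: wrap the URL in a `urllib.request.Request` that carries a User-Agent header. The helper names and the User-Agent string are placeholders chosen for illustration, and a site may still refuse the connection for other reasons.

```python
from urllib.request import Request, urlopen

# Placeholder browser-like User-Agent; any realistic value can be substituted.
DEFAULT_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url, user_agent=DEFAULT_UA):
    """Return a Request for url that sends an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download the page body as bytes using the request built above."""
    with urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

The bytes returned by `fetch_html(myurl)` can then be passed to BeautifulSoup in place of the original `uReq(myurl)` response.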

How to use excel vba to click on interactive dialog pop-up?

拥有回忆 submitted on 2021-01-29 16:26:11
Question: I am trying to use Excel VBA to navigate and export data from this website. I can click the 2018 and 2019 buttons and the setting option, but I cannot click the 'export data' option with VBA. My code is below for reference.
    Option Explicit
    Sub GetURLOfFrame()
        Dim IE As New SHDocVw.InternetExplorer
        Dim webpage As MSHTML.HTMLDocument
        IE.Visible = True
        IE.navigate "https://www.epa.gov/fuels-registration-reporting-and-compliance-help/rin-trades-and-price-information"
        Dim t As Date,

Excel VBA - Access Website, generate report & press save on IE dialog bar

放肆的年华 submitted on 2021-01-29 16:13:09
Question: I have a question about a topic that has already been discussed in other threads and forums, but I have not managed to make it work for me, so I am asking here about my own code. Basically, I access an intranet site, and based on some input (via checkboxes) a report is created with data from SAP. My problem arises after the report is generated, when IE prompts me to press the "save" button in its dialog box. I have not managed to automate that part. Could you help me

Python Webscraping beautifulsoup avoid repetition in find_all()

坚强是说给别人听的谎言 submitted on 2021-01-29 15:51:48
Question: I am working on web scraping in Python using BeautifulSoup. I am trying to extract text that is bold, italic, or both. Consider the following HTML snippet: <div> <b> <i> HelloWorld </i> </b> </div> If I use the command sp.find_all(['i', 'b']), understandably, I get two results, one corresponding to bold and the other to italics, i.e. ['<b><i>HelloWorld</i></b>', '<i>HelloWorld</i>']. My question is: is there a way to extract the text only once and still get the tags? My desired output is something
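The desired output is cut off in this excerpt, so the sketch below assumes the goal is one result per outermost bold or italic tag, together with the names of the styling tags wrapped around the text. It keeps a match from `find_all` only when `find_parent` finds no `b` or `i` ancestor. The function name and the return shape are hypothetical.

```python
from bs4 import BeautifulSoup


def outermost_styled(soup, names=("b", "i")):
    """Return (text, tag_names) for each outermost tag in names.

    A tag nested inside another matching tag is not reported separately;
    its name is listed alongside the outer tag's name instead.
    """
    names = list(names)
    results = []
    for tag in soup.find_all(names):
        if tag.find_parent(names) is not None:
            continue  # already covered by an enclosing <b> or <i>
        nested = [inner.name for inner in tag.find_all(names)]
        results.append((tag.get_text(strip=True), [tag.name] + nested))
    return results


html = "<div> <b> <i> HelloWorld </i> </b> </div>"
sp = BeautifulSoup(html, "html.parser")
print(outermost_styled(sp))
```

Filtering on ancestors rather than deduplicating by text keeps two separate tags that happen to contain the same words as distinct results.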