web-scraping | 易学教程

Not able to do web scraping using BeautifulSoup and requests

Question: I am trying to scrape the values of the first two sections, i.e. the 1*2 and DOUBLECHANCE sections, using bs4 and requests from this website: https://web.bet9ja.com/Sport/SubEventDetail?SubEventID=76512106

The code I have written is:

    import bs4 as bs
    import urllib.request

    source = urllib.request.urlopen('https://web.bet9ja.com/Sport/SubEventDetail?SubEventID=76512106')
    soup = bs.BeautifulSoup(source, 'lxml')
    for div in soup.find_all('div', class_='SEItem ng-scope'):
        print(div.text)

When I run it, I am

Webscrap VBA - List

Question: I am trying to write VBA web-scraping code to import data into Excel from this website: https://www.thewindpower.net/windfarms_list_en.php

I want to open this page, select a country, and then scrape the data from the table below it (including the URL from the name column). I am stuck on several points:

- How can I select the country I want in VBA code?
- How can I select the table, given that its tag has no id or class?
- How can I import the URL included in the name column?

Here is the

How to scrape all Steam IDs, review content, and profile_url values from a game's Steam reviews into an Excel file using Python?

Question: The problem is that the code either prints only the first 11 reviews (when "while n<500" is used) or prints nothing at all (when "while True:" is used). The requirement is to save every Steam ID, review content, and profile_url from the game's reviews into an Excel file.

    from msedge.selenium_tools import Edge, EdgeOptions
    from selenium.webdriver.common.keys import Keys
    import re
    from time import sleep
    from datetime import datetime
    from openpyxl import Workbook

    game_id = 1097150
    url = 'https://steamcommunity.com/app/1097150

How to write code that reads the output file to figure out how far the scrape got and then resumes from where it left off

Question: I'm writing a program to scrape the title, date, and body text of each article in this website's archive and export them to a CSV file. The website seems to block me at some point, and I get this error: HTTPError: Service Unavailable. I believe this is because I am requesting pages from the site too many times in a short period. I want my code to be able to detect where the error happened and pick up where it left off. I've tried adding a 2-second delay after going through 10
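The resume-where-it-left-off idea described in this question can be sketched roughly as follows. This is a minimal sketch rather than the asker's code: it assumes the output CSV has a `url` column identifying each article, and `load_done_urls`, `scrape_all`, `fetch_article`, and the column names in `FIELDS` are all hypothetical names introduced for illustration. The caller supplies `fetch_article(url)`, which is expected to return a dict with `title`, `date`, and `body`.

```python
import csv
import os
import time
from urllib.error import HTTPError

# Hypothetical column layout for the output file.
FIELDS = ["url", "title", "date", "body"]


def load_done_urls(csv_path):
    """Return the set of article URLs already saved in the output CSV."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["url"] for row in csv.DictReader(f)}


def scrape_all(urls, csv_path, fetch_article, max_retries=3, base_delay=2.0):
    """Scrape every URL not yet in the CSV, appending one row per article."""
    done = load_done_urls(csv_path)
    # Write the header only when starting a new (or empty) file.
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        for url in urls:
            if url in done:
                continue  # saved by an earlier run
            for attempt in range(max_retries):
                try:
                    row = fetch_article(url)
                    break
                except HTTPError as err:
                    # Retry only on 503, waiting longer each time; otherwise give up.
                    if err.code != 503 or attempt == max_retries - 1:
                        raise
                    time.sleep(base_delay * 2 ** attempt)
            writer.writerow({"url": url, **row})
            f.flush()  # keep the file current in case a later request fails
```

Because each row is flushed as soon as it is written, rerunning `scrape_all` with the same URL list after a crash skips everything already saved and continues with the first missing article.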

Web scraping without specified name, id, or class attached to the data

Question: I am trying to track the status of a shipment and display it on an Excel tab. This website, https://webcsw.ocs.co.jp/csw/ECSWG0201R00003P.do, displays data when the "Air wayBill No." is entered. I managed to open Internet Explorer, enter the Air WayBill number, and click the search button.

    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Navigate "https://webcsw.ocs.co.jp/csw/ECSWG0201R00000P.do"
    IE.Visible = True
    While IE.busy
        DoEvents
    Wend
    Set document = IE

Pyppeteer how to login on page with type

Question: I was using Selenium with ChromeDriver for my Python Telegram bot, which is deployed on a Linux server with Docker. Everything works, but Selenium does not support async, so my app can't do anything else while scraping. I heard about Pyppeteer, but I am having trouble getting the page I need to scrape. The page requires me to log in. Here are the steps:

1. Open the page.
2. Click the auth button:

    <button class="btn btn-outline-warning kt-font-dark mr-2" type="button" id="btn_auth">
        <i class="fa fa-key"></i> Enter
    </button

ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host - getting this error

Question: ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

I get this error while reading a web page with the following code:

    from urllib.request import urlopen as uReq
    from bs4 import BeautifulSoup as soup

    myurl = 'https://www.amazon.in/s?k=graphics+card&ref=nb_sb_noss_2'
    uClient = uReq(myurl)

Answer 1: Passing a User-Agent header seems to solve the issue. Try something like this:

    from urllib.request import urlopen as uReq, Request
    from bs4 import
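The answer's code is cut off, so the following is only a sketch of what passing a User-Agent header can look like with `urllib.request`, not the answerer's actual code. The helper names `build_request` and `fetch` and the `"Mozilla/5.0"` value are assumptions for illustration; whether a given site accepts the request still depends on the site.

```python
from urllib.request import Request, urlopen


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a request that identifies itself with a browser-like User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Download the page body as bytes, sending the User-Agent header."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

The bytes returned by `fetch(myurl)` could then be handed to BeautifulSoup in the same way the question passes the `urlopen` result.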

How to use excel vba to click on interactive dialog pop-up?

Question: I am trying to use Excel VBA to navigate to this website and export data from it. I can click the 2018 and 2019 buttons and the setting option, but I cannot click the 'export data' option with VBA. My code is attached below for reference.

    Option Explicit

    Sub GetURLOfFrame()
        Dim IE As New SHDocVw.InternetExplorer
        Dim webpage As MSHTML.HTMLDocument
        IE.Visible = True
        IE.navigate "https://www.epa.gov/fuels-registration-reporting-and-compliance-help/rin-trades-and-price-information"
        Dim t As Date,

Excel VBA - Access Website, generate report & press save on IE dialog bar

Question: My question concerns a topic already discussed in other threads and forums, but I have not managed to make those solutions work for me, so I am asking here about my own code. Basically, I access an intranet site, and based on some input (via checkboxes) a report is created with data from SAP. My problem arises after the report is generated, when IE prompts me to press the "save" button on its dialog box. I have not managed to automate that part. Could you help me

Python Webscraping beautifulsoup avoid repetition in find_all()

Question: I am working on web scraping in Python using BeautifulSoup. I am trying to extract text that is in bold, italics, or both. Consider the following HTML snippet:

    <div> <b> <i> HelloWorld </i> </b> </div>

If I use the command sp.find_all(['i', 'b']), I understandably get two results, one corresponding to bold and the other to italics, i.e.

    ['<b><i>HelloWorld</i></b>', '<i>HelloWorld</i>']

My question is: is there a way to extract it only once and still get the tags? My desired output is something
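The desired output is cut off above, so the sketch below rests on an assumption: that each bold or italic run should be reported once, together with the list of tags wrapping it. It uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages; `EmphasisExtractor` and `extract_emphasis` are hypothetical names, and the idea is simply to emit a result only when the outermost `<b>` or `<i>` closes.

```python
from html.parser import HTMLParser


class EmphasisExtractor(HTMLParser):
    """Collect each outermost <b>/<i> run once, with the tags that wrap it."""

    TAGS = {"b", "i"}

    def __init__(self):
        super().__init__()
        self.open_tags = []   # <b>/<i> tags currently open
        self.seen_tags = []   # distinct <b>/<i> tags seen in the current run
        self.text_parts = []  # text collected inside the current run
        self.results = []     # list of (tags, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            self.open_tags.append(tag)
            if tag not in self.seen_tags:
                self.seen_tags.append(tag)

    def handle_data(self, data):
        if self.open_tags:
            self.text_parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.TAGS and self.open_tags:
            self.open_tags.pop()
            if not self.open_tags:  # the outermost tag has just closed
                text = "".join(self.text_parts).strip()
                self.results.append((self.seen_tags, text))
                self.seen_tags, self.text_parts = [], []


def extract_emphasis(html):
    """Return (tags, text) for every outermost bold/italic run in the HTML."""
    parser = EmphasisExtractor()
    parser.feed(html)
    return parser.results
```

For the snippet in the question, `extract_emphasis("<div> <b> <i> HelloWorld </i> </b> </div>")` yields a single entry pairing the tags `b` and `i` with the text `HelloWorld`, rather than one result per tag.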