web-scraping | 易学教程

Clicking on hyperlink with partial href on Internet Explorer using vba

Question: Hi, I am trying to create a script that clicks a link for which I can only provide part of the href. It would be great if someone could advise how to do this.

<a href="website/report/download.json?refId=3e49762e-8edc-47c2-a282-11ee3c64e85a&reportType=xlsx&fileName=GeneralExtract.xlsx&obo>GeneralExtract.xlsx</a>

Set i = CreateObject("InternetExplorer.Application")
Dim idoc As MSHTML.HTMLDocument
Set idoc = i.document
Set eles6 = idoc.getElementsByTagName("a")
For Each ele6 In eles6
If ele6.href

Iterate and extract tables from web saving as excel file in Python

Question: I want to iterate over and extract the tables from the link here, then save them as an Excel file. How can I do that? Thank you.

My code so far:

import pandas as pd
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate

url = 'http://zjj.sz.gov.cn/ztfw/gcjs/xmxx/jgysba/'
res = requests.get(url)
soup = BeautifulSoup(res.content,'lxml')
print(soup)

New update:

from requests import post
import json
import pandas as pd
import numpy as np

headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0;

Scraping specific data inside a table II

Question: I hate that I have to ask this question again, but the website I had been scraping data from was updated, and not just aesthetically: the underlying code has changed too. Before the update, the program would find the "Key Data" table and use a counter to find specific data. The problem is that I can no longer reach the values, and when I try to use a class name closer to the value, the program doesn't find it at all and exits. I've cut out some of the code below to share, would appreciate

How to handle lazy-loaded images in selenium?

Question: Before marking this as a duplicate, please consider that I have already looked through many related Stack Overflow posts, as well as websites and articles. I have not found a solution yet. This question is a follow-up to this question: Selenium Webdriver not finding XPATH despite seemingly identical strings. I determined that the problem did not in fact come from the XPath method by updating the code to work in a more elegant manner:

for item in feed:
    img_div = item.find_element_by_class_name(

WPF Can't retrieve WebP image from url?

Question: I'm unable to retrieve an image from a URL. Previously I was unable to connect to the site at all until I set HttpClient headers. I'm able to retrieve images from other sources, but not from this particular one.

Code for retrieving the image:

var img = new BitmapImage();
img.BeginInit();
img.UriSource = new Uri("https://i1.adis.ws/i/jpl/jd_083285_a?qlt=80&w=600&h=425&v=1&fmt=webp", UriKind.RelativeOrAbsolute);
img.EndInit();
Console.Out.WriteLine();
ImageShoe.Source = img;

If I try to retrieve a

Getting javascript variable value while scraping with python

Question: I know this has been asked before, but I am a newbie in scraping and Python. Please help me; it would be very helpful on my learning path. I am scraping a news site using Python with packages such as Beautiful Soup. I am having difficulty getting the value of a JavaScript variable that is declared in a script tag and is also updated there. Here is the part of the HTML page that I am scraping (containing only the script part): <!-- Eliminate render-blocking JavaScript and
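One common way to approach this kind of problem is to read the text of each script element and pull the assignment out with a regular expression. The following is a minimal sketch using only the standard library. The variable name `pageData`, the sample HTML, and the helper names are hypothetical, since the original page's script is not shown in full. It assumes the assigned value is valid JSON and only sees the initial assignment in the static source; values changed later at runtime would need a real browser.

```python
import json
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def extract_js_variable(html, name):
    """Return the JSON-compatible value first assigned to `name`, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    # Match "var name =", "let name =" or "const name =".
    pattern = re.compile(r"\b(?:var|let|const)\s+" + re.escape(name) + r"\s*=\s*")
    decoder = json.JSONDecoder()
    for script in collector.scripts:
        match = pattern.search(script)
        if not match:
            continue
        try:
            # raw_decode parses one JSON value and ignores what follows it.
            value, _ = decoder.raw_decode(script, match.end())
        except json.JSONDecodeError:
            continue  # the value is not plain JSON; try the next script
        return value
    return None


# Hypothetical page fragment standing in for the real news site.
SAMPLE_HTML = """
<html><head>
<script>
  var pageData = {"title": "Example headline", "views": 42};
</script>
</head><body></body></html>
"""

page_data = extract_js_variable(SAMPLE_HTML, "pageData")
print(page_data)
```

With Beautiful Soup, the script text could be collected with `soup.find_all("script")` instead of the custom parser; the regular expression and `raw_decode` step would stay the same.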

System.Windows.Forms.WebBrowser wait until page has been fully loaded

Question: I have been trying a lot of different solutions with wait and async. Nothing seems to work. I was not able to find a solution that actually waits until the page has fully loaded. All of them wait for some time, but not until the page has loaded, and I get an error in the next step. How can I make the code wait until, for example, the Document.GetElementById("quickFind_text_0") element has been found on the page? Here is my code: private void button7_Click(object sender, EventArgs e) {

<tbody> tag displays in chrome but not source

Question: In doing some scraping work I keep encountering the <tbody> tag in the Chrome DevTools inspector, but it doesn't appear in the source. For what I hope are obvious reasons, I find this super confusing. What's going on here? (I should also add that the HTML on this page is pretty malformed.)

For example, DevTools shows:

<table> <tbody> <tr valign="top"> <td>...</td>

Page source shows:

<table border="0"> <tr valign="top"> <td>

Answer 1: The start tag for <tbody> is optional. That is, you can leave it
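Because the <tbody> start tag is optional, a browser adds the element to the DOM it builds even when the raw source omits it, so a scraper that works on the raw source should not depend on <tbody> being there. Here is a minimal sketch, using only the standard library, that collects rows by looking at <tr> and cell tags alone. The class and function names and the two sample strings are illustrative, loosely modeled on the snippets in the question.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect cell text for each <tr>, whether or not <tbody> is present."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")
        # <tbody> is deliberately ignored: it may or may not appear.

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def table_rows(html):
    """Return a list of rows, each a list of stripped cell strings."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows]


# Raw page source (no <tbody>) versus the tree a browser would show.
RAW_SOURCE = '<table border="0"><tr valign="top"><td>a</td><td>b</td></tr></table>'
DEVTOOLS_DOM = '<table><tbody><tr valign="top"><td>a</td><td>b</td></tr></tbody></table>'

print(table_rows(RAW_SOURCE))
print(table_rows(DEVTOOLS_DOM))
```

The same idea applies to selectors copied from DevTools: a path such as `table > tbody > tr` may match nothing in the raw source, while `table tr` matches in both cases.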

How can I access this type of site using requests? [duplicate]

Question: This question already has answers here: Scraper in Python gives “Access Denied” (3 answers). Closed 8 months ago.

This is the first time I've encountered a site that wouldn't allow me access to the webpage. I'm not sure why, and I can't figure out how to scrape this website.

My attempt:

import requests
from bs4 import BeautifulSoup

def html(url):
    return BeautifulSoup(requests.get(url).content, "lxml")

url = "https://www.g2a.com/"
soup = html(url)
print(soup.prettify())

Output:
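A frequent first thing to try for an "Access Denied" response is to send browser-like request headers, since some sites reject the default client User-Agent. The sketch below uses the standard library's `urllib.request` rather than `requests`, so that it is self-contained. The header values and helper names are assumptions for illustration, and a site with stronger bot protection may still refuse the request.

```python
import urllib.request

# Illustrative browser-like headers; the exact values are an assumption.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Create a Request object that carries the given headers."""
    return urllib.request.Request(url, headers=headers)


def fetch(url, timeout=10):
    """Download the page body as bytes (performs a real network call)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


# Building the request does not touch the network.
request = build_request("https://www.g2a.com/")
print(request.get_header("User-agent"))
```

With `requests`, the equivalent is to pass the same dictionary as `requests.get(url, headers=BROWSER_HEADERS)` inside the `html` helper from the question.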