web-scraping

Clicking a hyperlink by partial href in Internet Explorer using VBA

喜你入骨 submitted on 2021-02-05 11:42:08
Question: Hi, I am trying to create a script that clicks a link for which I can only provide part of the href. I would be grateful if someone could advise how to do this.
    <a href="website/report/download.json?refId=3e49762e-8edc-47c2-a282-11ee3c64e85a&reportType=xlsx&fileName=GeneralExtract.xlsx&obo>GeneralExtract.xlsx</a>
    Set i = CreateObject("InternetExplorer.Application")
    Dim idoc As MSHTML.HTMLDocument
    Set idoc = i.document
    Set eles6 = idoc.getElementsByTagName("a")
    For Each ele6 In eles6
        If ele6.href
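The fix is to test each anchor's href for a substring instead of an exact match. The following minimal sketch is in Python rather than VBA, and it runs against static HTML rather than a live Internet Explorer session. It collects every anchor's href and keeps the ones that contain a stable fragment such as `fileName=GeneralExtract.xlsx`, ignoring the `refId` that changes between downloads. The sample markup and helper names are hypothetical. In the VBA loop, the same test would presumably be `If InStr(ele6.href, "GeneralExtract.xlsx") > 0 Then ele6.Click`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def find_links_containing(html, fragment):
    """Return every href that contains `fragment` anywhere in it."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [href for href in parser.hrefs if fragment in href]


# Hypothetical sample page: the refId part changes, the fileName part does not.
sample = (
    '<a href="website/report/download.json?refId=abc'
    '&reportType=xlsx&fileName=GeneralExtract.xlsx">GeneralExtract.xlsx</a>'
    '<a href="website/other">Other</a>'
)
print(find_links_containing(sample, "fileName=GeneralExtract.xlsx"))
```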

Iterating over and extracting tables from a web page and saving them as an Excel file in Python

╄→尐↘猪︶ㄣ submitted on 2021-02-05 11:30:12
Question: I want to iterate over and extract the tables from the page linked here, then save them as an Excel file. How can I do that? Thank you. My code so far:
    import pandas as pd
    import requests
    from bs4 import BeautifulSoup
    from tabulate import tabulate

    url = 'http://zjj.sz.gov.cn/ztfw/gcjs/xmxx/jgysba/'
    res = requests.get(url)
    soup = BeautifulSoup(res.content, 'lxml')
    print(soup)
New update:
    from requests import post
    import json
    import pandas as pd
    import numpy as np

    headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0;
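Here is one way to approach the extraction step using only the standard library. It walks the HTML, turns every `<table>` into a list of rows of cell text, and writes each table to a CSV file that Excel can open. This assumes the rows are present in the HTML that `requests` receives. If the page fills the table with JavaScript (the switch to `post` in the update hints at a JSON endpoint), the rows would have to come from that endpoint instead. With pandas installed, `pandas.read_html` followed by `DataFrame.to_excel` is the shorter route, though it needs extra packages such as lxml and openpyxl. The class and function names here are illustrative.

```python
import csv
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects every <table> as a list of rows, each row a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the current cell

    def _finish_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._finish_row()  # tolerate a missing </tr>
            self._row = [] if self.tables else None
        elif tag in ("td", "th"):
            self._finish_cell()  # tolerate a missing </td>
            if self._row is not None:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()


def extract_tables(html):
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


def save_tables(tables, prefix="table"):
    """Write each table to <prefix>_<n>.csv; the UTF-8 BOM helps Excel detect the encoding."""
    for n, table in enumerate(tables):
        with open(f"{prefix}_{n}.csv", "w", newline="", encoding="utf-8-sig") as f:
            csv.writer(f).writerows(table)


sample = "<table><tr><th>Project</th><th>Date</th></tr><tr><td>A</td><td>2021-01-01</td></tr></table>"
print(extract_tables(sample))
```

Nested tables are not handled and will produce mixed-up rows.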

Scraping specific data inside a table II

十年热恋 submitted on 2021-02-05 11:12:12
Question: I hate having to ask this question again, but the website I had been scraping data from has been updated, and not just aesthetically: the underlying code has changed too. Before the update, the program would find the "Key Data" table and use a counter to locate specific data. The problem is that I can no longer reach the values, and when I try to use a class name closer to the value, it isn't found at all and the program exits. I've cut out some of the code below to share; I would appreciate
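Counting positions breaks whenever the layout changes. One alternative is to look each value up by the label next to it. The following sketch flattens the page into its visible text nodes and returns the text that immediately follows a given label. It therefore works whether the site uses table cells or styled `<div>`s. The labels and sample markup are made up. The sketch also assumes each value is the first text node after its label, which should be checked against the real page.

```python
from html.parser import HTMLParser


class TextNodes(HTMLParser):
    """Flattens a document into its non-empty text nodes, in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._skip = 0  # depth inside <script>/<style>, whose text is not visible

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and not self._skip:
            self.texts.append(text)


def value_after_label(html, label):
    """Return the first text node that follows a node equal to `label`, or None."""
    parser = TextNodes()
    parser.feed(html)
    parser.close()
    texts = parser.texts
    for i, text in enumerate(texts[:-1]):
        if text == label:
            return texts[i + 1]
    return None


# Hypothetical markup standing in for the updated "Key Data" section.
sample = (
    '<div class="key-data"><span class="label">Market Cap</span><span class="value">1.2B</span>'
    '<span class="label">P/E</span><span class="value">15.3</span></div>'
)
print(value_after_label(sample, "P/E"))
```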

How to handle lazy-loaded images in Selenium?

爱⌒轻易说出口 submitted on 2021-02-05 10:46:11
Question: Before marking this as a duplicate, please consider that I have already looked through many related Stack Overflow posts, websites and articles, and have not found a solution yet. This question is a follow-up to an earlier one, "Selenium Webdriver not finding XPATH despite seemingly identical strings". By rewriting the code more cleanly, I determined that the problem did not in fact come from the XPath method:
    for item in feed:
        img_div = item.find_element_by_class_name(
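A common cause is that lazy-loading scripts keep the real image URL in a `data-*` attribute and put a placeholder in `src` until the image scrolls into view. The following standard-library sketch assumes that convention. The attribute names are typical examples, not confirmed for this site. Under that assumption, the URL can often be read without scrolling at all. In Selenium, the equivalent would be `element.get_attribute("data-src")`. Another option is to scroll the element into view with `driver.execute_script("arguments[0].scrollIntoView();", element)` and then re-read `src`.

```python
from html.parser import HTMLParser

# Assumption: typical lazy-loading attribute names, not confirmed for any specific site.
LAZY_ATTRS = ("data-src", "data-lazy-src", "data-original")


def is_placeholder(src):
    """Treat a missing or empty src, or an inline data: URI, as a placeholder."""
    return not src or src.startswith("data:")


class ImageSources(HTMLParser):
    """Collects the best available URL for each <img>."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src")
        if is_placeholder(src):
            # Fall back to the first lazy-loading attribute that has a value.
            src = next((attrs[a] for a in LAZY_ATTRS if attrs.get(a)), None)
        if src:
            self.urls.append(src)


def image_urls(html):
    parser = ImageSources()
    parser.feed(html)
    parser.close()
    return parser.urls


print(image_urls('<img src="data:image/gif;base64,R0lGOD" data-src="https://example.com/a.jpg">'))
```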

WPF: can't retrieve a WebP image from a URL?

谁说胖子不能爱 submitted on 2021-02-05 09:31:13
Question: I'm unable to retrieve an image from a URL. Previously I could not connect to the site at all until I set HttpClient headers. I can retrieve images from other sources, but not from this particular one. Code for retrieving the image:
    var img = new BitmapImage();
    img.BeginInit();
    img.UriSource = new Uri("https://i1.adis.ws/i/jpl/jd_083285_a?qlt=80&w=600&h=425&v=1&fmt=webp", UriKind.RelativeOrAbsolute);
    img.EndInit();
    Console.Out.WriteLine();
    ImageShoe.Source = img;
If I try to retrieve a
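Two things may be worth checking, sketched here in Python under stated assumptions. First, `BitmapImage.UriSource` performs its own download, so headers set on an `HttpClient` are not sent with that request. Downloading the bytes with the configured `HttpClient` and passing them in through `StreamSource` avoids that gap. Second, the URL explicitly asks for `fmt=webp`. WPF decodes images through the Windows Imaging Component, which may not have a WebP codec installed. The `sniff_image_format` helper below identifies a format from its magic bytes, and `with_query_param` rewrites the `fmt` parameter. Whether this server also serves `fmt=jpg` is an assumption to test; the question does not confirm it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sniff_image_format(data):
    """Guess an image format from its leading magic bytes; None if unrecognised."""
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    if data[:3] == b"\xff\xd8\xff":
        return "jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return None


def with_query_param(url, key, value):
    """Return `url` with query parameter `key` set to `value`, keeping other parameters in order."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if any(k == key for k, _ in pairs):
        pairs = [(k, value if k == key else v) for k, v in pairs]
    else:
        pairs.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


url = "https://i1.adis.ws/i/jpl/jd_083285_a?qlt=80&w=600&h=425&v=1&fmt=webp"
# Hypothetical: assumes the image service also accepts fmt=jpg.
print(with_query_param(url, "fmt", "jpg"))
```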

Getting a JavaScript variable's value while scraping with Python

。_饼干妹妹 submitted on 2021-02-05 08:19:07
Question: I know this has been asked before, but I am new to scraping and Python, and any help would be very useful as I learn. I am scraping a news site in Python using packages such as Beautiful Soup. I am having difficulty getting the value of a JavaScript variable that is declared in a script tag and is also updated there. Here is the part of the HTML page I am scraping (script part only):
    <!-- Eliminate render-blocking JavaScript and
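Beautiful Soup does not run JavaScript, so the value has to be read out of the script text itself. The following heuristic sketch finds every `name = value;` assignment with a regular expression. It takes the last one, because the question says the variable is updated later in the same script, and then tries to decode the value as JSON. The script text could come from something like `soup.find_all("script")` and each tag's `.string`. The variable names in the example are hypothetical. This is not a JavaScript parser: values containing a `;` (inside a string, say) or built from expressions will not be handled.

```python
import json
import re


def js_variable(script_text, name):
    """Return the value from the last `name = <literal>;` assignment, or None.

    Heuristic: the value runs up to the first following semicolon. It is decoded
    as JSON when possible; otherwise the raw source text is returned.
    """
    # (?!=) stops comparisons such as `name == 5` from matching.
    pattern = re.compile(r"\b" + re.escape(name) + r"\s*=(?!=)\s*(.+?)\s*;", re.DOTALL)
    matches = pattern.findall(script_text)
    if not matches:
        return None
    raw = matches[-1]  # the most recent assignment wins
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw  # e.g. a single-quoted string or an expression


script = 'var pageViews = 10;\nvar title = "Breaking";\npageViews = 25;\nvar meta = {"id": 7, "tags": ["a", "b"]};'
print(js_variable(script, "pageViews"))
```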

System.Windows.Forms.WebBrowser: wait until the page has fully loaded

空扰寡人 submitted on 2021-02-05 08:13:28
Question: I have tried many different solutions using wait and async, and nothing seems to work. I could not find a solution that actually waits until the page has fully loaded. Every version waits for some time, but not until the page has loaded, and I get an error in the next step. How can I make the code wait until, for example, the element Document.GetElementById("quickFind_text_0") has been found on the page? Here is my code:
    private void button7_Click(object sender, EventArgs e) {
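The general pattern is to poll for the element itself, with a timeout, instead of waiting a fixed amount of time. Here is a language-neutral sketch in Python; the function name and defaults are illustrative. In WinForms, assuming the control is named `webBrowser1` (hypothetical), the condition would be something like `webBrowser1.Document?.GetElementById("quickFind_text_0") != null`. However, a blocking loop like this one would freeze the `WebBrowser` control, which needs the UI thread's message loop to keep loading. In C#, the check therefore belongs in a `Timer` tick or an `async` loop that awaits `Task.Delay` between attempts.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Call `condition()` until it returns a truthy value or `timeout` seconds pass.

    Returns that truthy value, or raises TimeoutError. Checking for the element
    itself replaces sleeping for a fixed time and hoping the page is ready.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```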

<tbody> tag displays in Chrome but not in the page source

别等时光非礼了梦想. submitted on 2021-02-05 08:09:44
Question: While doing some scraping work, I keep encountering the <tbody> tag in the Chrome DevTools inspector, but it doesn't appear in the page source. For what I hope are obvious reasons, I find this very confusing. What's going on here? (I should add that the HTML on this page is quite malformed.) For example, DevTools shows:
    <table>
      <tbody>
        <tr valign="top">
          <td>...</td>
Page source shows:
    <table border="0">
      <tr valign="top">
        <td>
Answer 1: The start tag for <tbody> is optional. That is, you can leave it
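The answer's point can be checked directly. A plain tokenizer such as Python's `html.parser` reports tags exactly as written, so it never sees a `<tbody>`. Browsers, by contrast, build the DOM with the HTML5 parsing algorithm, which inserts the implied `<tbody>` that DevTools displays. Here is a small sketch using the markup from the question:

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Records start tags in the order they appear in the raw source."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def source_tags(html):
    parser = TagRecorder()
    parser.feed(html)
    parser.close()
    return parser.tags


# Markup from the question's page source (cell text and closing tags added).
raw = '<table border="0"><tr valign="top"><td>cell</td></tr></table>'
print(source_tags(raw))  # ['table', 'tr', 'td']: no tbody, unlike the DOM in DevTools
```

In practice, a path copied from DevTools that contains `/table/tbody/tr` can fail against the raw source, while a path like `//table//tr` matches either way. Beautiful Soup's `html5lib` parser follows the browser algorithm and would add the `<tbody>`, whereas `html.parser` keeps the source structure as written.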

How can I access this type of site using requests? [duplicate]

落花浮王杯 submitted on 2021-02-05 08:09:38
Question: This question already has answers here: Scraper in Python gives “Access Denied” (3 answers). Closed 8 months ago. This is the first time I've encountered a site that won't 'allow me access' to the webpage. I'm not sure why, and I can't figure out how to scrape this website. My attempt:
    import requests
    from bs4 import BeautifulSoup

    def html(url):
        return BeautifulSoup(requests.get(url).content, "lxml")

    url = "https://www.g2a.com/"
    soup = html(url)
    print(soup.prettify())
Output:
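A usual first step is to stop identifying the client as a script. By default, `requests` and `urllib` send their own User-Agent, which some sites reject. The following standard-library sketch sends browser-like headers instead. The header values are examples only; with `requests`, the same dictionary would go in `requests.get(url, headers=HEADERS)`. Sites protected by JavaScript challenges or other bot detection may still deny access, so headers are worth trying but may not be enough.

```python
from urllib.request import Request, urlopen

# Example browser-like headers; the exact values are illustrative.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url):
    """Create a GET request that replaces the default Python-urllib User-Agent."""
    return Request(url, headers=HEADERS)


def fetch(url, timeout=10):
    """Download `url` and decode it using the charset the server declares (UTF-8 otherwise)."""
    with urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```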