web-scraping

Scraping through pages of an aspx website

好久不见. Submitted on 2021-01-29 10:30:34
Question: For the last month or so, I've been trying to read a few pages from an aspx site. I have no problem finding all the required items on the site, but my attempted solution is still not working properly. I read somewhere that all the header details must be present, so I added them. I also read that __EVENTTARGET must be set to tell aspx which button was pressed, so I tried a few different things (see below). I also read that a session should be established to deal…
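A minimal sketch of the usual WebForms approach, assuming the page carries the standard hidden fields (__VIEWSTATE, __EVENTVALIDATION); the control name passed as event_target is hypothetical and must be read from the actual page:

```python
import re

# Sketch: gather ASP.NET WebForms hidden fields from the fetched page and set
# __EVENTTARGET to the name of the control being "pressed". The regex assumes
# the name=... value=... attribute order that WebForms normally emits.
def build_postback(html, event_target, extra=None):
    fields = dict(re.findall(
        r'<input[^>]*name="(__[A-Z]+)"[^>]*value="([^"]*)"', html))
    fields["__EVENTTARGET"] = event_target
    fields.setdefault("__EVENTARGUMENT", "")
    if extra:
        fields.update(extra)
    return fields

# Usage sketch (url and control name are placeholders):
#   s = requests.Session()               # keeps the ASP.NET session cookie
#   page = s.get(url).text
#   resp = s.post(url, data=build_postback(page, "ctl00$MainContent$btnNext"))
```

Using a `requests.Session` for both the GET and the POST keeps the session cookie the site sets, which is the "session should be established" part.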

Get “href=link” from html page and navigate to that link using vba

冷眼眸甩不掉的悲伤. Submitted on 2021-01-29 10:28:21
Question: I am writing code in Excel VBA to get the href value of a class and navigate to that link. Here is the href value I want to pull into my Excel sheet, and I want to navigate to that link automatically through my VBA code: <a href="/questions/51509457/how-to-make-the-word-invisible-when-its-checked-without-js" class="question-hyperlink">How to make the word invisible when it's checked without js</a> The result I'm getting is that I'm able to get that containing tag's class…
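For comparison, the same extraction in Python (a sketch with BeautifulSoup, not the asker's VBA; the anchor is the one quoted above, hard-coded for illustration):

```python
from bs4 import BeautifulSoup

# The anchor quoted in the question.
html = ('<a href="/questions/51509457/how-to-make-the-word-invisible-'
        'when-its-checked-without-js" class="question-hyperlink">'
        "How to make the word invisible when it's checked without js</a>")

soup = BeautifulSoup(html, "html.parser")
link = soup.find("a", class_="question-hyperlink")
href = link["href"]  # relative URL; prefix the site's domain before navigating
```

In VBA itself, the corresponding step should be reading `getAttribute("href")` from the element found by class, then assigning the full URL to the browser object's navigation call.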

Is there a way to parse data from multiple pages from a parent webpage?

爷，独闯天下 Submitted on 2021-01-29 10:05:31
Question: I have been going to a website, https://ndclist.com/?s=Solifenacin, to get NDC codes, and I need the 10-digit NDC codes, but the current webpage only shows 8-digit NDC codes, like the picture below. So I click on the underlined NDC code and get a second webpage. I copy and paste its 2 NDC codes into an Excel sheet, and repeat the process for the rest of the codes on the first webpage. But this process takes a good bit of time, and I was wondering if there was a library…
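One way to cut the manual copying (a sketch; the link selector and URL pattern are guesses, since ndclist.com's real markup isn't shown in the question): parse the search page for detail-page links, fetch each one, and pull the 10-digit codes, which come in 4-4-2, 5-3-2, or 5-4-1 groupings:

```python
import re

from bs4 import BeautifulSoup

def detail_links(search_html):
    """Hypothetical selector: anchors whose href contains '/ndc/'."""
    soup = BeautifulSoup(search_html, "html.parser")
    return [a["href"] for a in soup.select("a[href*='/ndc/']")]

def ten_digit_codes(detail_html):
    """Match the 4-4-2, 5-3-2, and 5-4-1 NDC groupings."""
    return re.findall(r"\b\d{4,5}-\d{3,4}-\d{1,2}\b", detail_html)
```

Looping `requests.get` over `detail_links(...)` and feeding each response body to `ten_digit_codes` replaces the click, copy, paste cycle.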

Scraping JSON from AJAX calls

你。 Submitted on 2021-01-29 10:00:49
Question: Background: consider this url: base_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html" I want to make the AJAX call for the telephone number: ajax_url = "https://www.olx.bg/ajax/misc/contact/phone/7XarI/?pt=e3375d9a134f05bbef9e4ad4f2f6d2f3ad704a55f7955c8e3193a1acde6ca02197caf76ffb56977ce61976790a940332147d11808f5f8d9271015c318a9ae729" Wanted result: if I press the button on the site in my Chrome browser, I get the wanted result in the console: {…
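A sketch of reproducing the call outside the browser. The X-Requested-With header is a common gate for such endpoints; whether olx.bg checks it, or also wants cookies from first loading base_url, is an assumption to test:

```python
import json
import urllib.request

def build_ajax_request(ajax_url):
    return urllib.request.Request(ajax_url, headers={
        "X-Requested-With": "XMLHttpRequest",  # marks the request as AJAX
        "User-Agent": "Mozilla/5.0",           # avoid the default Python agent
    })

def fetch_phone(ajax_url):
    """Perform the call and decode the JSON body (its shape is cut off above)."""
    with urllib.request.urlopen(build_ajax_request(ajax_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The long `pt=` token in ajax_url is likely tied to the page session, so it may need to be re-scraped from base_url on each run rather than reused.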

How to scrape all the image urls from a Kickstarter webpage?

走远了吗. Submitted on 2021-01-29 09:47:57
Question: I want to scrape all the image URLs from this Kickstarter webpage, but the following code does not return all the images: url = 'https://www.kickstarter.com/projects/1878352656/sleep-yoga-go-travel-pillow?ref=category_newest' page = requests.get(url) soup = BeautifulSoup(page.text, 'html.parser') x = soup.select('img[src^="https://ksr-ugc.imgix.net/assets/"]') print(x) img_links = [] for img in x: img_links.append(img['src']) for l in img_links: print(l) Answer 1: import requests from bs4 import…
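A likely reason for the missing images is lazy loading: only the first few <img> tags carry the final URL in src, while the rest keep it in a data-* attribute until JavaScript swaps it in. A sketch that checks both (the data-src attribute name is an assumption for this page):

```python
from bs4 import BeautifulSoup

def image_urls(html):
    """Collect asset URLs from either src or a lazy-load data-src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img"):
        link = img.get("data-src") or img.get("src")
        if link and link.startswith("https://ksr-ugc.imgix.net/assets/"):
            urls.append(link)
    return urls
```

If the page source shows no data-* attribute at all, the remaining images are injected by JavaScript and a rendered source (e.g. from Selenium) is needed instead.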

How to extract <div data-v-xxxxxxxx> </div> from HTML using BeautifulSoup?

随声附和 Submitted on 2021-01-29 09:42:36
Question: The website I'm scraping has this HTML code: <div data-v-38788375 data-v-07b96579 class="rating score orange">9.3</div> How can I extract the 9.3 value using BeautifulSoup? Here is my code: from bs4 import BeautifulSoup import requests page = requests.get('https://www.hostelworld.com/search?search_keywords=Phuket,%20Thailand&country=Thailand&city=Phuket&date_from=2019-10-14&date_to=2019-10-17&number_of_guests=2') soup = BeautifulSoup(page.text,'lxml') rating = soup.find('div',…
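Parsing the quoted fragment itself is straightforward: the data-v-* attributes can be ignored, and the three class tokens used as a CSS selector. (Caveat: this search page is rendered by JavaScript, so the requests.get response may not contain the div at all; the sketch below runs on the HTML quoted in the question.)

```python
from bs4 import BeautifulSoup

html = '<div data-v-38788375 data-v-07b96579 class="rating score orange">9.3</div>'
soup = BeautifulSoup(html, "html.parser")
# Match on the class tokens; the data-v-* attributes need no special handling.
score = soup.select_one("div.rating.score.orange").get_text(strip=True)
rating = float(score)  # 9.3
```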

div class scraping

末鹿安然 Submitted on 2021-01-29 09:23:54
Question: I am trying to scrape a table from the following site using the code below: library(rvest) library(tidyverse) library(dplyr) base<-'******************' links<-read_html(base)%>%html_nodes(".v-data-table__wrapper") But no luck yet. Can anyone help me with this, please? Answer 1: There's no table in the page source originally; this page uses JS to generate the table. The idea is to run the JS code to get the data (you will need the V8 package): library(V8) library(rvest) js <- read_html("https://www…
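The same idea translated to Python, as a sketch: when a table is built client-side, the data often sits as a JSON literal inside a <script> tag, which can be extracted without executing the JS. The variable name tableData below is purely hypothetical and must be replaced with whatever the page actually uses:

```python
import json
import re

def embedded_json(page_source):
    """Pull a JSON array assigned to a (hypothetical) tableData variable."""
    m = re.search(r"tableData\s*=\s*(\[.*?\])\s*;", page_source, re.S)
    return json.loads(m.group(1)) if m else None
```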

How to act when not receiving the data when scraping with Python?

走远了吗. Submitted on 2021-01-29 09:16:40
Question: This site has data on stocks, and I'm trying to extract some of it: https://quickfs.net/company/AAPL:US, where AAPL is a stock symbol and can be changed. The page looks like a big table: the columns are years and the rows are calculated values such as Return on Assets and Gross Margin. For this I tried to follow a few tutorials: Introduction to Web Scraping (Python) - Lesson 02 (Scrape Tables), Intro to Web Scraping with Python and Beautiful Soup, Web Scraping HTML Tables with Python…
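Those tutorials assume the rows are present in the static HTML; quickfs.net fills its table with JavaScript, so requests.get returns a page without the figures, which is why the data "doesn't arrive". A sketch of the generic table parser, which only works on rendered HTML (e.g. Selenium's driver.page_source):

```python
from bs4 import BeautifulSoup

def parse_table(html):
    """Turn the first HTML table's rows into lists of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows
```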

Scrape YouTube Videos with Python and Selenium

牧云@^-^@ Submitted on 2021-01-29 09:13:05
Question: I want to scrape all videos from 'TVFilthyFrank' for a friend. I have the links to every one of his videos. I now want to measure the size in MB of each video and download it. I know I can just say driver.get(VIDEO_URL) and then get the src out of the player, but that would take very long and wouldn't look nice. Is there any way to get the video src (or at least some information about the video) out of the video link? Answer 1: You should try youtube-dl: https://github.com/ytdl-org/youtube-dl. It can…
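A sketch wrapping that suggestion from Python; --format and --output are real youtube-dl flags, and `youtube-dl -j <url>` prints metadata (including filesize) without downloading:

```python
import subprocess

def ytdl_command(video_url, out_dir="videos"):
    """Build a youtube-dl invocation; run it with subprocess to download."""
    return ["youtube-dl",
            "--format", "best",                        # best single file
            "--output", f"{out_dir}/%(title)s.%(ext)s",
            video_url]

# subprocess.run(ytdl_command(url), check=True) performs the actual download.
```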

How can I scrape the data from in between these span tags?

我怕爱的太早我们不能终老 Submitted on 2021-01-29 09:02:29
Question: I am attempting to scrape the figures shown on https://www.usdebtclock.org/world-debt-clock.html, but because the numbers are constantly changing I am unsure how to collect this data. This is an example of what I am attempting: import requests from bs4 import BeautifulSoup url ="https://www.usdebtclock.org/world-debt-clock.html" URL=requests.get(url) site=BeautifulSoup(URL.text,"html.parser") data=site.find_all("span",id="X4a79R9BW") print(data) The result is this: "[ ]" When I was…
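The empty result isn't caused by the numbers changing: the figures are written into the spans by JavaScript, so the raw HTML that requests fetches contains no populated span, hence "[ ]". A browser-driven fetch (e.g. Selenium) is needed first; parsing the rendered source then works. A sketch (the id comes from the question, but ids on this page may be generated and unstable):

```python
from bs4 import BeautifulSoup

def span_value(rendered_html, span_id):
    """Return the text of the span with the given id, or None if absent."""
    soup = BeautifulSoup(rendered_html, "html.parser")
    span = soup.find("span", id=span_id)
    return span.get_text(strip=True) if span else None
```

With Selenium this would be `driver.get(url)` followed by `span_value(driver.page_source, "X4a79R9BW")`; each call gives a snapshot of the moving figure.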