web-scraping

Scraping through pages of an aspx website

好久不见. Submitted on 2021-01-29 10:30:34
Question: For the last month or so, I've been trying to read a few pages from an aspx site. I have no problem finding all the required items on the site, but my attempted solution is still not working properly. I read somewhere that all the header details must be present, so I added them. I also read that __EVENTTARGET must be set to tell aspx which button was pressed, so I tried a few different things (see below). I also read that a session should be established to deal…
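A minimal sketch of the usual WebForms approach, assuming the page carries the standard hidden fields (__VIEWSTATE, __EVENTVALIDATION); the control name passed as event_target is hypothetical and must be read from the actual page:

```python
import re

# Sketch: gather ASP.NET WebForms hidden fields from the fetched page and set
# __EVENTTARGET to the name of the control being "pressed". The regex assumes
# the name=... value=... attribute order that WebForms normally emits.
def build_postback(html, event_target, extra=None):
    fields = dict(re.findall(
        r'<input[^>]*name="(__[A-Z]+)"[^>]*value="([^"]*)"', html))
    fields["__EVENTTARGET"] = event_target
    fields.setdefault("__EVENTARGUMENT", "")
    if extra:
        fields.update(extra)
    return fields

# Usage sketch (url and control name are placeholders):
#   s = requests.Session()               # keeps the ASP.NET session cookie
#   page = s.get(url).text
#   resp = s.post(url, data=build_postback(page, "ctl00$MainContent$btnNext"))
```

Using a `requests.Session` for both the GET and the POST keeps the session cookie the site sets, which is the "session should be established" part.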

Get “href=link” from html page and navigate to that link using vba

冷眼眸甩不掉的悲伤. Submitted on 2021-01-29 10:28:21
Question: I am writing code in Excel VBA to get the href value of a class and navigate to that link. Here is the href value I want to pull into my Excel sheet, and I want to navigate to that link automatically through my VBA code: <a href="/questions/51509457/how-to-make-the-word-invisible-when-its-checked-without-js" class="question-hyperlink">How to make the word invisible when it's checked without js</a> The result I'm getting is that I'm able to get that containing tag's class…
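For comparison, the same extraction in Python (a sketch with BeautifulSoup, not the asker's VBA; the anchor is the one quoted above, hard-coded for illustration):

```python
from bs4 import BeautifulSoup

# The anchor quoted in the question.
html = ('<a href="/questions/51509457/how-to-make-the-word-invisible-'
        'when-its-checked-without-js" class="question-hyperlink">'
        "How to make the word invisible when it's checked without js</a>")

soup = BeautifulSoup(html, "html.parser")
link = soup.find("a", class_="question-hyperlink")
href = link["href"]  # relative URL; prefix the site's domain before navigating
```

In VBA itself, the corresponding step should be reading `getAttribute("href")` from the element found by class, then assigning the full URL to the browser object's navigation call.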

Is there a way to parse data from multiple pages from a parent webpage?

爷，独闯天下 Submitted on 2021-01-29 10:05:31
Question: I have been going to a website, https://ndclist.com/?s=Solifenacin, to get NDC codes, and I need the 10-digit NDC codes, but the current webpage only shows 8-digit NDC codes, like the picture below. So I click on the underlined NDC code and get a second webpage. I copy and paste its 2 NDC codes into an Excel sheet, and repeat the process for the rest of the codes on the first webpage. But this process takes a good bit of time, and I was wondering if there was a library…
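One way to cut the manual copying (a sketch; the link selector and URL pattern are guesses, since ndclist.com's real markup isn't shown in the question): parse the search page for detail-page links, fetch each one, and pull the 10-digit codes, which come in 4-4-2, 5-3-2, or 5-4-1 groupings:

```python
import re

from bs4 import BeautifulSoup

def detail_links(search_html):
    """Hypothetical selector: anchors whose href contains '/ndc/'."""
    soup = BeautifulSoup(search_html, "html.parser")
    return [a["href"] for a in soup.select("a[href*='/ndc/']")]

def ten_digit_codes(detail_html):
    """Match the 4-4-2, 5-3-2, and 5-4-1 NDC groupings."""
    return re.findall(r"\b\d{4,5}-\d{3,4}-\d{1,2}\b", detail_html)
```

Looping `requests.get` over `detail_links(...)` and feeding each response body to `ten_digit_codes` replaces the click, copy, paste cycle.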

Scraping JSON from AJAX calls

你。 Submitted on 2021-01-29 10:00:49
Question: Background: consider this url: base_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html" I want to make the AJAX call for the telephone number: ajax_url = "https://www.olx.bg/ajax/misc/contact/phone/7XarI/?pt=e3375d9a134f05bbef9e4ad4f2f6d2f3ad704a55f7955c8e3193a1acde6ca02197caf76ffb56977ce61976790a940332147d11808f5f8d9271015c318a9ae729" Wanted result: if I press the button on the site in my Chrome browser, I get the wanted result in the console: {…
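A sketch of reproducing the call outside the browser. The X-Requested-With header is a common gate for such endpoints; whether olx.bg checks it, or also wants cookies from first loading base_url, is an assumption to test:

```python
import json
import urllib.request

def build_ajax_request(ajax_url):
    return urllib.request.Request(ajax_url, headers={
        "X-Requested-With": "XMLHttpRequest",  # marks the request as AJAX
        "User-Agent": "Mozilla/5.0",           # avoid the default Python agent
    })

def fetch_phone(ajax_url):
    """Perform the call and decode the JSON body (its shape is cut off above)."""
    with urllib.request.urlopen(build_ajax_request(ajax_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The long `pt=` token in ajax_url is likely tied to the page session, so it may need to be re-scraped from base_url on each run rather than reused.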

How to scrape all the image urls from a Kickstarter webpage?

走远了吗. Submitted on 2021-01-29 09:47:57
Question: I want to scrape all the image URLs from this Kickstarter webpage, but the following code does not return all the images: url = 'https://www.kickstarter.com/projects/1878352656/sleep-yoga-go-travel-pillow?ref=category_newest' page = requests.get(url) soup = BeautifulSoup(page.text, 'html.parser') x = soup.select('img[src^="https://ksr-ugc.imgix.net/assets/"]') print(x) img_links = [] for img in x: img_links.append(img['src']) for l in img_links: print(l) Answer 1: import requests from bs4 import…
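A likely reason for the missing images is lazy loading: only the first few <img> tags carry the final URL in src, while the rest keep it in a data-* attribute until JavaScript swaps it in. A sketch that checks both (the data-src attribute name is an assumption for this page):

```python
from bs4 import BeautifulSoup

def image_urls(html):
    """Collect asset URLs from either src or a lazy-load data-src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img"):
        link = img.get("data-src") or img.get("src")
        if link and link.startswith("https://ksr-ugc.imgix.net/assets/"):
            urls.append(link)
    return urls
```

If the page source shows no data-* attribute at all, the remaining images are injected by JavaScript and a rendered source (e.g. from Selenium) is needed instead.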

How to extract <div data-v-xxxxxxxx> </div> from HTML using BeautifulSoup?

随声附和 Submitted on 2021-01-29 09:42:36
Question: The website I'm scraping has this HTML code: <div data-v-38788375 data-v-07b96579 class="rating score orange">9.3</div> How can I extract the 9.3 value using BeautifulSoup? Here is my code: from bs4 import BeautifulSoup import requests page = requests.get('https://www.hostelworld.com/search?search_keywords=Phuket,%20Thailand&country=Thailand&city=Phuket&date_from=2019-10-14&date_to=2019-10-17&number_of_guests=2') soup = BeautifulSoup(page.text,'lxml') rating = soup.find('div',…
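Parsing the quoted fragment itself is straightforward: the data-v-* attributes can be ignored, and the three class tokens used as a CSS selector. (Caveat: this search page is rendered by JavaScript, so the requests.get response may not contain the div at all; the sketch below runs on the HTML quoted in the question.)

```python
from bs4 import BeautifulSoup

html = '<div data-v-38788375 data-v-07b96579 class="rating score orange">9.3</div>'
soup = BeautifulSoup(html, "html.parser")
# Match on the class tokens; the data-v-* attributes need no special handling.
score = soup.select_one("div.rating.score.orange").get_text(strip=True)
rating = float(score)  # 9.3
```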

div class scraping

末鹿安然 Submitted on 2021-01-29 09:23:54
Question: I am trying to scrape a table from the following site using the code below: library(rvest) library(tidyverse) library(dplyr) base<-'******************' links<-read_html(base)%>%html_nodes(".v-data-table__wrapper") But no luck yet. Can anyone help me with this, please? Answer 1: There's no table in the page source originally; this page uses JS to generate the table. The idea is to run the JS code to get the data (you will need the V8 package): library(V8) library(rvest) js <- read_html("https://www…
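The same idea translated to Python, as a sketch: when a table is built client-side, the data often sits as a JSON literal inside a <script> tag, which can be extracted without executing the JS. The variable name tableData below is purely hypothetical and must be replaced with whatever the page actually uses:

```python
import json
import re

def embedded_json(page_source):
    """Pull a JSON array assigned to a (hypothetical) tableData variable."""
    m = re.search(r"tableData\s*=\s*(\[.*?\])\s*;", page_source, re.S)
    return json.loads(m.group(1)) if m else None
```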

How to act when not receiving the data when scraping with Python?

走远了吗. Submitted on 2021-01-29 09:16:40
Question: This site has data on stocks, and I'm trying to extract some of it: https://quickfs.net/company/AAPL:US, where AAPL is a stock symbol and can be changed. The page looks like a big table: the columns are years and the rows are calculated values such as Return on Assets and Gross Margin. For this I tried to follow a few tutorials: Introduction to Web Scraping (Python) - Lesson 02 (Scrape Tables), Intro to Web Scraping with Python and Beautiful Soup, Web Scraping HTML Tables with Python…
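Those tutorials assume the rows are present in the static HTML; quickfs.net fills its table with JavaScript, so requests.get returns a page without the figures, which is why the data "doesn't arrive". A sketch of the generic table parser, which only works on rendered HTML (e.g. Selenium's driver.page_source):

```python
from bs4 import BeautifulSoup

def parse_table(html):
    """Turn the first HTML table's rows into lists of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows
```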

Scrape YouTube Videos with Python and Selenium

牧云@^-^@ Submitted on 2021-01-29 09:13:05
Question: I want to scrape all videos from 'TVFilthyFrank' for a friend. I have the links to every one of his videos. I now want to measure the size in MB of each video and download it. I know I can just say driver.get(VIDEO_URL) and then get the src out of the player, but that would take very long and wouldn't look nice. Is there any way to get the video src (or at least some information about the video) out of the video link? Answer 1: You should try youtube-dl: https://github.com/ytdl-org/youtube-dl. It can…
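A sketch wrapping that suggestion from Python; --format and --output are real youtube-dl flags, and `youtube-dl -j <url>` prints metadata (including filesize) without downloading:

```python
import subprocess

def ytdl_command(video_url, out_dir="videos"):
    """Build a youtube-dl invocation; run it with subprocess to download."""
    return ["youtube-dl",
            "--format", "best",                        # best single file
            "--output", f"{out_dir}/%(title)s.%(ext)s",
            video_url]

# subprocess.run(ytdl_command(url), check=True) performs the actual download.
```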

How can I scrape the data from in between these span tags?

我怕爱的太早我们不能终老 Submitted on 2021-01-29 09:02:29
Question: I am attempting to scrape the figures shown on https://www.usdebtclock.org/world-debt-clock.html, but because the numbers are constantly changing I am unsure how to collect this data. This is an example of what I am attempting: import requests from bs4 import BeautifulSoup url ="https://www.usdebtclock.org/world-debt-clock.html" URL=requests.get(url) site=BeautifulSoup(URL.text,"html.parser") data=site.find_all("span",id="X4a79R9BW") print(data) The result is this: "[ ]" When I was…
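The empty result isn't caused by the numbers changing: the figures are written into the spans by JavaScript, so the raw HTML that requests fetches contains no populated span, hence "[ ]". A browser-driven fetch (e.g. Selenium) is needed first; parsing the rendered source then works. A sketch (the id comes from the question, but ids on this page may be generated and unstable):

```python
from bs4 import BeautifulSoup

def span_value(rendered_html, span_id):
    """Return the text of the span with the given id, or None if absent."""
    soup = BeautifulSoup(rendered_html, "html.parser")
    span = soup.find("span", id=span_id)
    return span.get_text(strip=True) if span else None
```

With Selenium this would be `driver.get(url)` followed by `span_value(driver.page_source, "X4a79R9BW")`; each call gives a snapshot of the moving figure.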