beautifulsoup

Need help scraping images from a slideshow with bs4 & python

眉间皱痕 submitted on 2021-01-29 10:36:48
Question: I'm trying to scrape listing information from Craigslist, but I can't seem to get the images since they are in a slideshow.

import requests
from bs4 import BeautifulSoup as soup

url = "https://newyork.craigslist.org/search/sss"
r = requests.get(url)
souped = soup(r.content, 'lxml')

Since the images aren't even in the requested HTML, do I need to load the page dynamically somehow? If so, can I keep it in Python only? I don't want any other dependencies. Thanks in advance.
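
One possible approach, sketched below: Craigslist search results have historically carried the gallery image IDs in a data-ids attribute on each result's image anchor, from which thumbnail URLs can be built. The a.result-image selector, the data-ids format, and the images.craigslist.org URL pattern are all assumptions about Craigslist's markup and may have changed, but the idea keeps everything in requests + BeautifulSoup with no browser automation.

import requests
from bs4 import BeautifulSoup

url = "https://newyork.craigslist.org/search/sss"
r = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
page = BeautifulSoup(r.content, "lxml")

for anchor in page.select("a.result-image[data-ids]"):
    # data-ids is assumed to look like "1:abcd1234,1:efgh5678"; drop the "1:" prefix
    for image_id in anchor["data-ids"].split(","):
        image_id = image_id.split(":", 1)[-1]
        print(f"https://images.craigslist.org/{image_id}_300x300.jpg")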

Get td text with select

。_饼干妹妹 submitted on 2021-01-29 10:09:35
Question: I am trying to obtain the odds from the link and I get an error. Do you know what I am doing wrong? Thank you.

import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.oddsportal.com/soccer/spain/laliga'
r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
##print([a.text for a in soup.select('#tournamentTable tr[xeid] [href*=soccer]')])
print([b.text for b in soup.select('#tournamentTable td[xodd]')])

I am expecting to obtain 10 rows and 3 …
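
The odds table on oddsportal.com is built by JavaScript after the page loads, so the td[xodd] cells never appear in the HTML that requests downloads and the selector matches nothing. A minimal sketch that renders the page with Selenium first and then reuses the question's selector (the selector and table id are taken from the question and may no longer match the live site):

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless")   # no visible browser window
driver = webdriver.Chrome(options=options)
driver.get("https://www.oddsportal.com/soccer/spain/laliga")

# wait until the JavaScript has inserted the odds table
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "#tournamentTable")))

soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()

print([td.text for td in soup.select("#tournamentTable td[xodd]")])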

Is there a way to parse data from multiple pages from a parent webpage?

爷,独闯天下 submitted on 2021-01-29 10:05:31
Question: I have been going to a website, https://ndclist.com/?s=Solifenacin, to get NDC codes. I need the 10-digit NDC codes, but the search page only shows 8-digit codes. So I click on an underlined NDC code to get its detail page, copy and paste the two 10-digit codes from there into an Excel sheet, and repeat the process for the rest of the codes on the search page. This process takes a good bit of time, and I was wondering if there was a library …
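
This two-level crawl can be automated with requests and BeautifulSoup: collect the product links from the search results, open each detail page, and pull out the 10-digit package codes. The selectors and the "/ndc/" URL pattern below are guesses about ndclist.com's markup, so treat this as a sketch rather than a drop-in solution:

import requests
from bs4 import BeautifulSoup

BASE = "https://ndclist.com"
search = BeautifulSoup(requests.get(BASE, params={"s": "Solifenacin"}).content, "lxml")

# assumed: each search result links to a product detail page under /ndc/
detail_links = {a["href"] for a in search.select("a[href]") if "/ndc/" in a["href"]}

for link in sorted(detail_links):
    url = link if link.startswith("http") else BASE + link
    detail = BeautifulSoup(requests.get(url).content, "lxml")
    # assumed: the package table on the detail page lists 10-digit codes like 12345-678-90
    for cell in detail.select("table td"):
        text = cell.get_text(strip=True)
        digits = text.replace("-", "")
        if text.count("-") == 2 and len(digits) == 10 and digits.isdigit():
            print(url, text)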

How to scrape all the image urls from a Kickstarter webpage?

走远了吗. submitted on 2021-01-29 09:47:57
Question: I want to scrape all the image URLs from this Kickstarter webpage, but the following code does not return all of the images:

url = 'https://www.kickstarter.com/projects/1878352656/sleep-yoga-go-travel-pillow?ref=category_newest'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
x = soup.select('img[src^="https://ksr-ugc.imgix.net/assets/"]')
print(x)
img_links = []
for img in x:
    img_links.append(img['src'])
for l in img_links:
    print(l)

Answer 1: import requests from bs4 import …
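
One likely reason for missing images is lazy loading: many images in the project description keep their real URL in a data-src attribute rather than src, so a selector that only looks at src misses them. A sketch that checks both attributes; the attribute names are an assumption about Kickstarter's current markup:

import requests
from bs4 import BeautifulSoup

url = ('https://www.kickstarter.com/projects/1878352656/'
       'sleep-yoga-go-travel-pillow?ref=category_newest')
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(page.text, 'html.parser')

img_links = set()
for img in soup.find_all('img'):
    # lazy-loaded images often store the real URL in data-src instead of src
    for attr in ('src', 'data-src'):
        value = img.get(attr) or ''
        if value.startswith('https://ksr-ugc.imgix.net/assets/'):
            img_links.add(value)

for link in sorted(img_links):
    print(link)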

How to extract <div data-v-xxxxxxxx> </div> from HTML using BeautifulSoup?

随声附和 submitted on 2021-01-29 09:42:36
Question: The website I'm web scraping has this HTML code:

<div data-v-38788375 data-v-07b96579 class="rating score orange">9.3</div>

How could I extract the 9.3 value using BeautifulSoup? Here is my code:

from bs4 import BeautifulSoup
import requests

page = requests.get('https://www.hostelworld.com/search?search_keywords=Phuket,%20Thailand&country=Thailand&city=Phuket&date_from=2019-10-14&date_to=2019-10-17&number_of_guests=2')
soup = BeautifulSoup(page.text, 'lxml')
rating = soup.find('div', …
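
The data-v-* attributes are scoping attributes added by Vue.js and can simply be ignored; matching on the class list is enough to locate the element. Note, though, that Hostelworld's search results are rendered by JavaScript, so the div may not be present in the raw requests response at all, in which case a headless browser or the site's underlying API would be needed. A minimal sketch on the snippet itself:

from bs4 import BeautifulSoup

html = '<div data-v-38788375 data-v-07b96579 class="rating score orange">9.3</div>'
soup = BeautifulSoup(html, 'lxml')

# a class_ string with spaces matches the full class attribute value
rating = soup.find('div', class_='rating score orange')
print(rating.text)  # 9.3

# the same element via a CSS selector
print(soup.select_one('div.rating.score.orange').text)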

Is it possible to use bs4 soup object with lxml?

二次信任 submitted on 2021-01-29 09:35:44
Question: I am trying to use both BS4 and lxml. Instead of parsing the HTML page twice, is there any way to use the soup object in lxml, or vice versa?

self.soup = BeautifulSoup(open(path), "html.parser")

I tried using this object with lxml like this:

doc = html.fromstring(self.soup)

but it throws TypeError: expected string or bytes-like object. Is there any way to get this kind of usage?

Answer 1: I don't think there is a way without going through a string object. from bs4 import BeautifulSoup import lxml …
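
Following the answer's point, the straightforward bridge is to serialize the soup back to a string and hand that to lxml: str(soup) gives lxml.html.fromstring() the string input it expects, so the document is still only fetched once even though it is parsed twice. A minimal sketch:

from bs4 import BeautifulSoup
from lxml import html

raw = "<html><body><p class='x'>hello</p></body></html>"

soup = BeautifulSoup(raw, "html.parser")   # parse once with bs4
doc = html.fromstring(str(soup))           # re-serialize the soup, parse with lxml

print(soup.find("p", class_="x").text)         # bs4 / find API
print(doc.xpath("//p[@class='x']/text()")[0])  # lxml / XPath API on the same markup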

How to act when not receiving the data when scraping with Python?

走远了吗. submitted on 2021-01-29 09:16:40
Question: This site has stock data and I'm trying to extract some of it: https://quickfs.net/company/AAPL:US, where AAPL is a stock ticker and can be changed. The page looks like one big table: the columns are years and the rows are calculated values such as Return on Assets and Gross Margin. I tried to follow a few tutorials: Introduction to Web Scraping (Python) - Lesson 02 (Scrape Tables), Intro to Web Scraping with Python and Beautiful Soup, Web Scraping HTML Tables with Python …
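
The quickfs.net table is filled in by JavaScript after the page loads, which is why the requests + BeautifulSoup approach from those tutorials comes back with an empty table. One workaround, sketched below, is to render the page in a headless browser and let pandas.read_html pick up the table; the assumption that the metrics grid is the first <table> on the rendered page is a guess and may need adjusting:

from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

ticker = "AAPL:US"

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.get(f"https://quickfs.net/company/{ticker}")

# wait for the JavaScript-rendered table to show up before reading the page
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.TAG_NAME, "table")))

tables = pd.read_html(StringIO(driver.page_source))  # every <table> becomes a DataFrame
driver.quit()

print(tables[0])  # assumed: the first table is the years-by-metrics grid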

Handle o:p tag in BeautifulSoup

折月煮酒 submitted on 2021-01-29 09:14:46
Question: I was extracting some disease information from http://people.dbmi.columbia.edu/~friedma/Projects/DiseaseSymptomKB/index.html, but the data is contained inside an <o:p> tag which I don't know how to handle. One way I found is the find_all function, but is there any way to do it as tr.td.span.[o:p or something]?

<td width="584" nowrap="" valign="top" style="width:438.0pt;padding:0in 5.4pt 0in 5.4pt; height:12.75pt">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans …
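
Attribute access stops working here because o:p is not a valid Python attribute name (the colon comes from the Word/Office namespace in the exported HTML), so find()/find_all() with the tag name as a string is the usual way to reach it. A minimal sketch on a stripped-down version of that markup (the cell text is a placeholder):

from bs4 import BeautifulSoup

# simplified stand-in for the page's Word-exported markup
html = """
<tr><td><p class="MsoNormal"><span><o:p>sample disease text</o:p></span></p></td></tr>
"""

# html.parser keeps the "o:p" tag name intact
soup = BeautifulSoup(html, "html.parser")

cell = soup.tr.td                 # attribute access works up to here
print(cell.find("o:p").text)      # ...then switch to find() for the o:p tag

# or collect every o:p in the document at once
print([tag.text for tag in soup.find_all("o:p")])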

Scrape Google Search Result Description Using BeautifulSoup

放肆的年华 submitted on 2021-01-29 08:27:53
Question: I want to scrape the Google search result description using BeautifulSoup, but I am not able to scrape the tag that contains the description.

Ancestor: html body#gsr.srp.vasq.wf-b div#main div#cnt.big div.mw div#rcnt div.col div#center_col div#res.med div#search div div#rso div.g div.rc div.IsZvec div span.aCOpRe
Children: em

Python code:

from bs4 import BeautifulSoup
import requests
import bs4.builder._lxml
import re

search = input("Enter the search term:")
param = {"q": search}
r = requests …
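
Two things usually get in the way here: Google returns a much simpler page when the request has no browser-like User-Agent, and class names such as aCOpRe are generated and change frequently, so hard-coding them is fragile. A sketch that sends a browser User-Agent and pairs each result title with the snippet classes mentioned in the question (these selectors are assumptions and will break whenever Google changes its markup; for anything durable, an official search API is the safer route):

import requests
from bs4 import BeautifulSoup

search = input("Enter the search term: ")
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
r = requests.get("https://www.google.com/search",
                 params={"q": search, "hl": "en"},
                 headers=headers)

soup = BeautifulSoup(r.text, "lxml")
for result in soup.select("div.g"):
    title = result.select_one("h3")
    # snippet classes taken from the question; they change often
    snippet = result.select_one("span.aCOpRe, div.IsZvec")
    if title and snippet:
        print(title.get_text())
        print(snippet.get_text())
        print("-" * 40)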

Scraping text from unordered lists using beautiful soup and python

允我心安 submitted on 2021-01-29 07:22:42
Question: I am using Python and Beautiful Soup to scrape information from a web page. I am interested in the following section of the source code:

<ul class="breadcrumb">
<li><a href="/" title="Return to the home page">Home</a><span class="sprite icon-delimiter"></span></li>
<li><a href="/VehicleSearch/Search/Mini" title="View our range of Mini vehicles">Mini</a><span class="sprite icon-delimiter"></span></li>
<li class="active"><a href="/VehicleSearch/Search/Mini/Countryman" title="View our range of Mini …
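
The breadcrumb labels can be pulled out by selecting the anchors inside ul.breadcrumb and taking their text. A minimal sketch against markup mirroring the question's snippet (the last item's label is assumed, since the snippet is cut off):

from bs4 import BeautifulSoup

# markup mirroring the question's snippet; the last item's label is assumed
html = """
<ul class="breadcrumb">
  <li><a href="/" title="Return to the home page">Home</a></li>
  <li><a href="/VehicleSearch/Search/Mini" title="View our range of Mini vehicles">Mini</a></li>
  <li class="active"><a href="/VehicleSearch/Search/Mini/Countryman">Countryman</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
crumbs = [a.get_text(strip=True) for a in soup.select("ul.breadcrumb li a")]
print(crumbs)  # ['Home', 'Mini', 'Countryman']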