beautifulsoup

Need help scraping images from a slideshow with bs4 & python

眉间皱痕 submitted on 2021-01-29 10:36:48
Question: I'm trying to scrape listing information from Craigslist, but I can't seem to get the images since they are in a slideshow.

import requests
from bs4 import BeautifulSoup as soup

url = "https://newyork.craigslist.org/search/sss"
r = requests.get(url)
souped = soup(r.content, 'lxml')

Since the images aren't even in the requested HTML, do I need to load the page dynamically somehow? If so, can I keep it in Python only? I don't want any other dependencies. Thanks in advance.
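
One possible approach, sketched below: Craigslist search results have historically carried the gallery image IDs in a data-ids attribute on each result's image anchor, from which thumbnail URLs can be built. The a.result-image selector, the data-ids format, and the images.craigslist.org URL pattern are all assumptions about Craigslist's markup and may have changed, but the idea keeps everything in requests + BeautifulSoup with no browser automation.

import requests
from bs4 import BeautifulSoup

url = "https://newyork.craigslist.org/search/sss"
r = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
page = BeautifulSoup(r.content, "lxml")

for anchor in page.select("a.result-image[data-ids]"):
    # data-ids is assumed to look like "1:abcd1234,1:efgh5678"; drop the "1:" prefix
    for image_id in anchor["data-ids"].split(","):
        image_id = image_id.split(":", 1)[-1]
        print(f"https://images.craigslist.org/{image_id}_300x300.jpg")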

Get td text with select

。_饼干妹妹 submitted on 2021-01-29 10:09:35
Question: I am trying to obtain the odds from the link and I get an error. Do you know what I am doing wrong? Thank you.

import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.oddsportal.com/soccer/spain/laliga'
r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
##print([a.text for a in soup.select('#tournamentTable tr[xeid] [href*=soccer]')])
print([b.text for b in soup.select('#tournamentTable td[xodd]')])

I am expecting to obtain 10 rows and 3 …
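
The odds table on oddsportal.com is built by JavaScript after the page loads, so the td[xodd] cells never appear in the HTML that requests downloads and the selector matches nothing. A minimal sketch that renders the page with Selenium first and then reuses the question's selector (the selector and table id are taken from the question and may no longer match the live site):

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless")   # no visible browser window
driver = webdriver.Chrome(options=options)
driver.get("https://www.oddsportal.com/soccer/spain/laliga")

# wait until the JavaScript has inserted the odds table
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "#tournamentTable")))

soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()

print([td.text for td in soup.select("#tournamentTable td[xodd]")])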

Is there a way to parse data from multiple pages from a parent webpage?

爷,独闯天下 submitted on 2021-01-29 10:05:31
Question: I have been going to a website, https://ndclist.com/?s=Solifenacin, to get NDC codes. I need the 10-digit NDC codes, but the search page only shows 8-digit codes. So I click on an underlined NDC code to get its detail page, copy and paste the two 10-digit codes from there into an Excel sheet, and repeat the process for the rest of the codes on the search page. This process takes a good bit of time, and I was wondering if there was a library …
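
This two-level crawl can be automated with requests and BeautifulSoup: collect the product links from the search results, open each detail page, and pull out the 10-digit package codes. The selectors and the "/ndc/" URL pattern below are guesses about ndclist.com's markup, so treat this as a sketch rather than a drop-in solution:

import requests
from bs4 import BeautifulSoup

BASE = "https://ndclist.com"
search = BeautifulSoup(requests.get(BASE, params={"s": "Solifenacin"}).content, "lxml")

# assumed: each search result links to a product detail page under /ndc/
detail_links = {a["href"] for a in search.select("a[href]") if "/ndc/" in a["href"]}

for link in sorted(detail_links):
    url = link if link.startswith("http") else BASE + link
    detail = BeautifulSoup(requests.get(url).content, "lxml")
    # assumed: the package table on the detail page lists 10-digit codes like 12345-678-90
    for cell in detail.select("table td"):
        text = cell.get_text(strip=True)
        digits = text.replace("-", "")
        if text.count("-") == 2 and len(digits) == 10 and digits.isdigit():
            print(url, text)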

How to scrape all the image urls from a Kickstarter webpage?

走远了吗. submitted on 2021-01-29 09:47:57
Question: I want to scrape all the image URLs from this Kickstarter webpage, but the following code does not return all of the images:

url = 'https://www.kickstarter.com/projects/1878352656/sleep-yoga-go-travel-pillow?ref=category_newest'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
x = soup.select('img[src^="https://ksr-ugc.imgix.net/assets/"]')
print(x)
img_links = []
for img in x:
    img_links.append(img['src'])
for l in img_links:
    print(l)

Answer 1: import requests from bs4 import …
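
One likely reason for missing images is lazy loading: many images in the project description keep their real URL in a data-src attribute rather than src, so a selector that only looks at src misses them. A sketch that checks both attributes; the attribute names are an assumption about Kickstarter's current markup:

import requests
from bs4 import BeautifulSoup

url = ('https://www.kickstarter.com/projects/1878352656/'
       'sleep-yoga-go-travel-pillow?ref=category_newest')
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(page.text, 'html.parser')

img_links = set()
for img in soup.find_all('img'):
    # lazy-loaded images often store the real URL in data-src instead of src
    for attr in ('src', 'data-src'):
        value = img.get(attr) or ''
        if value.startswith('https://ksr-ugc.imgix.net/assets/'):
            img_links.add(value)

for link in sorted(img_links):
    print(link)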

How to extract <div data-v-xxxxxxxx> </div> from HTML using BeautifulSoup?

随声附和 submitted on 2021-01-29 09:42:36
Question: The website I'm web scraping has this HTML code:

<div data-v-38788375 data-v-07b96579 class="rating score orange">9.3</div>

How could I extract the 9.3 value using BeautifulSoup? Here is my code:

from bs4 import BeautifulSoup
import requests

page = requests.get('https://www.hostelworld.com/search?search_keywords=Phuket,%20Thailand&country=Thailand&city=Phuket&date_from=2019-10-14&date_to=2019-10-17&number_of_guests=2')
soup = BeautifulSoup(page.text, 'lxml')
rating = soup.find('div', …
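
The data-v-* attributes are scoping attributes added by Vue.js and can simply be ignored; matching on the class list is enough to locate the element. Note, though, that Hostelworld's search results are rendered by JavaScript, so the div may not be present in the raw requests response at all, in which case a headless browser or the site's underlying API would be needed. A minimal sketch on the snippet itself:

from bs4 import BeautifulSoup

html = '<div data-v-38788375 data-v-07b96579 class="rating score orange">9.3</div>'
soup = BeautifulSoup(html, 'lxml')

# a class_ string with spaces matches the full class attribute value
rating = soup.find('div', class_='rating score orange')
print(rating.text)  # 9.3

# the same element via a CSS selector
print(soup.select_one('div.rating.score.orange').text)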

Is it possible to use bs4 soup object with lxml?

二次信任 submitted on 2021-01-29 09:35:44
Question: I am trying to use both BS4 and lxml. Instead of parsing the HTML page twice, is there any way to use the soup object in lxml, or vice versa?

self.soup = BeautifulSoup(open(path), "html.parser")

I tried using this object with lxml like this:

doc = html.fromstring(self.soup)

but it throws TypeError: expected string or bytes-like object. Is there any way to get this kind of usage?

Answer 1: I don't think there is a way without going through a string object. from bs4 import BeautifulSoup import lxml …
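
Following the answer's point, the straightforward bridge is to serialize the soup back to a string and hand that to lxml: str(soup) gives lxml.html.fromstring() the string input it expects, so the document is still only fetched once even though it is parsed twice. A minimal sketch:

from bs4 import BeautifulSoup
from lxml import html

raw = "<html><body><p class='x'>hello</p></body></html>"

soup = BeautifulSoup(raw, "html.parser")   # parse once with bs4
doc = html.fromstring(str(soup))           # re-serialize the soup, parse with lxml

print(soup.find("p", class_="x").text)         # bs4 / find API
print(doc.xpath("//p[@class='x']/text()")[0])  # lxml / XPath API on the same markup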

How to act when not receiving the data when scraping with Python?

走远了吗. submitted on 2021-01-29 09:16:40
Question: This site has stock data and I'm trying to extract some of it: https://quickfs.net/company/AAPL:US, where AAPL is a stock ticker and can be changed. The page looks like one big table: the columns are years and the rows are calculated values such as Return on Assets and Gross Margin. I tried to follow a few tutorials: Introduction to Web Scraping (Python) - Lesson 02 (Scrape Tables), Intro to Web Scraping with Python and Beautiful Soup, Web Scraping HTML Tables with Python …
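
The quickfs.net table is filled in by JavaScript after the page loads, which is why the requests + BeautifulSoup approach from those tutorials comes back with an empty table. One workaround, sketched below, is to render the page in a headless browser and let pandas.read_html pick up the table; the assumption that the metrics grid is the first <table> on the rendered page is a guess and may need adjusting:

from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

ticker = "AAPL:US"

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.get(f"https://quickfs.net/company/{ticker}")

# wait for the JavaScript-rendered table to show up before reading the page
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.TAG_NAME, "table")))

tables = pd.read_html(StringIO(driver.page_source))  # every <table> becomes a DataFrame
driver.quit()

print(tables[0])  # assumed: the first table is the years-by-metrics grid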

Handle o:p tag in BeautifulSoup

折月煮酒 submitted on 2021-01-29 09:14:46
Question: I was extracting some disease information from http://people.dbmi.columbia.edu/~friedma/Projects/DiseaseSymptomKB/index.html, but the data is contained inside an <o:p> tag which I don't know how to handle. One way I found is the find_all function, but is there any way to do it as tr.td.span.[o:p or something]?

<td width="584" nowrap="" valign="top" style="width:438.0pt;padding:0in 5.4pt 0in 5.4pt; height:12.75pt">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans …
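
Attribute access stops working here because o:p is not a valid Python attribute name (the colon comes from the Word/Office namespace in the exported HTML), so find()/find_all() with the tag name as a string is the usual way to reach it. A minimal sketch on a stripped-down version of that markup (the cell text is a placeholder):

from bs4 import BeautifulSoup

# simplified stand-in for the page's Word-exported markup
html = """
<tr><td><p class="MsoNormal"><span><o:p>sample disease text</o:p></span></p></td></tr>
"""

# html.parser keeps the "o:p" tag name intact
soup = BeautifulSoup(html, "html.parser")

cell = soup.tr.td                 # attribute access works up to here
print(cell.find("o:p").text)      # ...then switch to find() for the o:p tag

# or collect every o:p in the document at once
print([tag.text for tag in soup.find_all("o:p")])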

Scrape Google Search Result Description Using BeautifulSoup

放肆的年华 submitted on 2021-01-29 08:27:53
Question: I want to scrape the Google search result description using BeautifulSoup, but I am not able to scrape the tag that contains the description.

Ancestor: html body#gsr.srp.vasq.wf-b div#main div#cnt.big div.mw div#rcnt div.col div#center_col div#res.med div#search div div#rso div.g div.rc div.IsZvec div span.aCOpRe
Children: em

Python code:

from bs4 import BeautifulSoup
import requests
import bs4.builder._lxml
import re

search = input("Enter the search term:")
param = {"q": search}
r = requests …
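
Two things usually get in the way here: Google returns a much simpler page when the request has no browser-like User-Agent, and class names such as aCOpRe are generated and change frequently, so hard-coding them is fragile. A sketch that sends a browser User-Agent and pairs each result title with the snippet classes mentioned in the question (these selectors are assumptions and will break whenever Google changes its markup; for anything durable, an official search API is the safer route):

import requests
from bs4 import BeautifulSoup

search = input("Enter the search term: ")
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
r = requests.get("https://www.google.com/search",
                 params={"q": search, "hl": "en"},
                 headers=headers)

soup = BeautifulSoup(r.text, "lxml")
for result in soup.select("div.g"):
    title = result.select_one("h3")
    # snippet classes taken from the question; they change often
    snippet = result.select_one("span.aCOpRe, div.IsZvec")
    if title and snippet:
        print(title.get_text())
        print(snippet.get_text())
        print("-" * 40)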

Scraping text from unordered lists using beautiful soup and python

允我心安 submitted on 2021-01-29 07:22:42
Question: I am using Python and Beautiful Soup to scrape information from a web page. I am interested in the following section of the source code:

<ul class="breadcrumb">
<li><a href="/" title="Return to the home page">Home</a><span class="sprite icon-delimiter"></span></li>
<li><a href="/VehicleSearch/Search/Mini" title="View our range of Mini vehicles">Mini</a><span class="sprite icon-delimiter"></span></li>
<li class="active"><a href="/VehicleSearch/Search/Mini/Countryman" title="View our range of Mini …
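
The breadcrumb labels can be pulled out by selecting the anchors inside ul.breadcrumb and taking their text. A minimal sketch against markup mirroring the question's snippet (the last item's label is assumed, since the snippet is cut off):

from bs4 import BeautifulSoup

# markup mirroring the question's snippet; the last item's label is assumed
html = """
<ul class="breadcrumb">
  <li><a href="/" title="Return to the home page">Home</a></li>
  <li><a href="/VehicleSearch/Search/Mini" title="View our range of Mini vehicles">Mini</a></li>
  <li class="active"><a href="/VehicleSearch/Search/Mini/Countryman">Countryman</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
crumbs = [a.get_text(strip=True) for a in soup.select("ul.breadcrumb li a")]
print(crumbs)  # ['Home', 'Mini', 'Countryman']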