beautifulsoup | 易学教程

Python + BeautifulSoup: How to get the wrapper element out of HTML based on its text?

Question: I would like to get the wrapper element of a key text. For example, given this HTML:

    …
    <div class="target">chicken</div>
    <div class="not-target">apple</div>
    …

based on the text "chicken", I would like to get back <div class="target">chicken</div>. Currently I have the following to fetch the HTML:

    import requests
    from bs4 import BeautifulSoup

    req = requests.get(url).text
    soup = BeautifulSoup(req, 'html.parser')

and I have to call soup.find_all('div', …) and loop through all the available divs to find the wrapper.
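You can avoid looping over every div by searching for the string itself and then stepping up to its parent tag. Below is a minimal sketch that uses the sample HTML from the question and the built-in html.parser. The helper names wrapper_of and wrapper_containing are hypothetical.

```python
import re

from bs4 import BeautifulSoup

html = '<div class="target">chicken</div> <div class="not-target">apple</div>'
soup = BeautifulSoup(html, "html.parser")


def wrapper_of(soup, text):
    """Return the tag that directly wraps a string equal to `text`, or None."""
    # find(string=...) matches text nodes (NavigableString), not tags.
    node = soup.find(string=text)
    return node.parent if node is not None else None


def wrapper_containing(soup, fragment):
    """Like wrapper_of, but matches any text node that contains `fragment`."""
    node = soup.find(string=re.compile(re.escape(fragment)))
    return node.parent if node is not None else None


print(wrapper_of(soup, "chicken"))  # <div class="target">chicken</div>
```

If you already know the wrapper is a div, soup.find('div', string='chicken') also works. Older code often spells the string= argument as text=.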

BeautifulSoup: find within an Instagram HTML page

Question: I have a problem finding something with bs4. I'm trying to automatically find some URLs in an Instagram HTML page, and (being a Python noob) I can't find a way to automatically search the HTML source for the URLs that, in the example, come after "display_url": http..." . I want my script to find the multiple URLs that appear next to "display_url" and download them. They have to be extracted as many times as they appear in the source code. With bs4 I tried: f =
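bs4's find methods search tags and attributes. Values like "display_url" usually sit inside a block of JSON embedded in a <script>, so to the parser they are just one long text node. Searching the raw page source with a regular expression is often simpler. The sketch below assumes each value appears as a JSON string literal, like "display_url": "https://...". Instagram's markup has changed over time, so that is an assumption. The helper names and the .jpg file suffix are also hypothetical.

```python
import json
import re
from urllib.request import urlretrieve

# Matches "display_url": "<JSON string>" and captures the quoted string,
# including backslash escapes such as \u0026 or \/.
DISPLAY_URL = re.compile(r'"display_url"\s*:\s*("(?:[^"\\]|\\.)*")')


def display_urls(page_source):
    """Return every display_url value, in order of appearance, repeats included."""
    # json.loads turns the captured JSON literal into a plain str,
    # decoding escapes like \u0026 -> &.
    return [json.loads(literal) for literal in DISPLAY_URL.findall(page_source)]


def download_all(urls, prefix="image"):
    """Save each URL as prefix_0.jpg, prefix_1.jpg, ... (the .jpg suffix is assumed)."""
    for i, url in enumerate(urls):
        urlretrieve(url, f"{prefix}_{i}.jpg")
```

Typical use is download_all(display_urls(page_source)), where page_source is the downloaded HTML as a string. findall returns every match, so a URL that appears twice in the source is also downloaded twice.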

Beautiful Soup Find - get just the text

Question: I had this bit of code spitting out just the price as a string (125.01), but I must have changed something, because now it prints the whole line with the HTML tags and everything. How can I get it to print just the text, without using regular expressions?

    import requests
    from bs4 import BeautifulSoup

    url = 'http://finance.yahoo.com/q?s=aapl&fr=uh3_finance_web&uhb=uhb2'
    data = requests.get(url)
    soup = BeautifulSoup(data.content)
    price = soup.find("span", {'id':'yfs_l84_aapl'})
    print(price)
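print(price) prints the Tag object, and a Tag's string form is its full markup. To print only the text, ask the tag for its text instead. The sketch below runs on a stand-in snippet: the span id comes from the question, but the surrounding markup and the value are assumptions, and the live page may no longer contain this span.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page; the real markup may differ.
html = '<p>AAPL <span id="yfs_l84_aapl">125.01</span> USD</p>'
soup = BeautifulSoup(html, "html.parser")

price = soup.find("span", {"id": "yfs_l84_aapl"})
print(price)             # <span id="yfs_l84_aapl">125.01</span>
print(price.get_text())  # 125.01
```

price.text is equivalent to price.get_text(). price.string also works here because the span has exactly one text child, but it returns None for tags with several children.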

BeautifulSoup: how to extract text after a <br> tag

Question: I don't know how to reach the following paragraph using BeautifulSoup, or how to extract the particular text that I want, as I am new to Python and BS4. My HTML is as follows:

    <div class="inner-content">
    <div class="bred"></div>
    <div class="clrbth"></div>
    <h1></h1>
    <h4></h4>
    ... ... ...
    <p></p>
    <p></p>
    <p>  Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when
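In BeautifulSoup, the text that follows a <br> is that tag's next sibling: a NavigableString at the same level, not a child of the <br>. The question's HTML is cut off, so the placement of <br/> in the sketch below is an assumption, and text_after_br is a hypothetical helper.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup: where the <br/> sits in the real page is assumed.
html = """
<div class="inner-content">
  <h1></h1>
  <p></p>
  <p><strong>Title</strong><br/>
  Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
</div>
"""


def text_after_br(soup):
    """Return the non-blank text node right after each <br> inside div.inner-content."""
    container = soup.find("div", class_="inner-content")
    results = []
    for br in container.find_all("br"):
        sibling = br.next_sibling
        # next_sibling may be another tag or None; keep only non-blank text.
        if isinstance(sibling, NavigableString) and sibling.strip():
            results.append(sibling.strip())
    return results


soup = BeautifulSoup(html, "html.parser")
print(text_after_br(soup))
```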

Parse all item elements with children from RSS feed with beautifulsoup

Question: From an RSS feed, how do you get a string of everything that's inside each item tag? Example input (simplified):

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
    <channel>
    <title>Test</title>
    <item>
    <title>Hello world1</title>
    <comments>Hi there</comments>
    <pubDate>Tue, 21 Nov 2011 20:10:10 +0000</pubDate>
    </item>
    <item>
    <title>Hello world2</title>
    <comments>Good afternoon</comments>
    <pubDate>Tue, 22 Nov 2011 20:10:10 +0000</pubDate>
    </item>
    <item>
    <title>Hello world3</title>
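find_all("item") returns each item tag. decode_contents() then serializes a tag's children without the surrounding tag, which gives "everything inside" as one string. The sketch below uses the built-in html.parser so it needs no extra packages. item_strings is a hypothetical helper, and the feed is the question's sample, closed after the second item.

```python
from bs4 import BeautifulSoup

rss = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Test</title>
<item>
<title>Hello world1</title>
<comments>Hi there</comments>
<pubDate>Tue, 21 Nov 2011 20:10:10 +0000</pubDate>
</item>
<item>
<title>Hello world2</title>
<comments>Good afternoon</comments>
<pubDate>Tue, 22 Nov 2011 20:10:10 +0000</pubDate>
</item>
</channel>
</rss>"""


def item_strings(feed):
    """Return the inner markup of every <item>, one string per item."""
    soup = BeautifulSoup(feed, "html.parser")
    # decode_contents() serializes the children without the <item> tag itself.
    return [item.decode_contents().strip() for item in soup.find_all("item")]


for s in item_strings(rss):
    print(s)
    print("---")
```

html.parser lowercases tag names, so pubDate comes out as pubdate. It also treats <link> as an empty HTML element, which matters for real RSS feeds. If lxml is installed, BeautifulSoup(feed, "xml") parses the feed as XML and keeps the original case.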

Can I use requests.post to submit a form?

Question: I am trying to get the list of stores from this site: http://www.health.state.mn.us/divs/cfh/wic/wicstores/ I'd like to get the list of stores that is produced when you click the "View All Stores" button. I understand that I could use Selenium or MechanicalSoup or ... to do this, but I was hoping to use requests. It looks like clicking the button submits a form:

    <form name="setAllStores" id="setAllStores" action="/divs/cfh/wic/wicstores/index.cfm" method="post" onsubmit="return _CF
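requests.post(url, data={...}) sends form-encoded fields the same way a browser's POST does, as long as you send the same field names and values. The sketch below reads the form's action URL and its named <input> fields, then posts them.

Several parts are assumptions:
- The hidden field allStores=1 is invented, since the question's HTML is cut off. Take the real names from the page source or the browser's network tab.
- Only <input> elements are collected; <select> and <textarea> are ignored.
- The onsubmit JavaScript handler is not executed. If it changes the fields, the payload must be adjusted by hand.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical copy of the form: the real page's fields are not shown in the question.
PAGE = """
<form name="setAllStores" id="setAllStores"
      action="/divs/cfh/wic/wicstores/index.cfm" method="post">
  <input type="hidden" name="allStores" value="1">
  <input type="submit" value="View All Stores">
</form>
"""
BASE_URL = "http://www.health.state.mn.us/divs/cfh/wic/wicstores/"


def extract_form(html, form_id, base_url):
    """Return (absolute action URL, method, field dict) for the form with this id."""
    form = BeautifulSoup(html, "html.parser").find("form", id=form_id)
    action = urljoin(base_url, form.get("action", ""))
    method = form.get("method", "get").lower()
    # Named <input> fields become form data; unnamed ones (this submit button) are skipped.
    data = {field["name"]: field.get("value", "")
            for field in form.find_all("input") if field.get("name")}
    return action, method, data


def submit_form(action, data):
    """POST the fields like a browser would, minus any onsubmit JavaScript."""
    import requests  # imported here so extract_form works without requests installed

    response = requests.post(action, data=data, timeout=30)
    response.raise_for_status()
    return response.text


action, method, data = extract_form(PAGE, "setAllStores", BASE_URL)
print(method.upper(), action, data)
```

With the real page, fetch it with requests.get(BASE_URL).text, pass it to extract_form, then call submit_form(action, data). Parse the returned HTML for the store list.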

Issue with HTML tags while scraping data using Beautiful Soup

Question: Common piece of code:

    # -*- coding: cp1252 -*-
    import csv
    import urllib2
    import sys
    import time
    from bs4 import BeautifulSoup
    from itertools import islice

    page = urllib2.urlopen('http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html').read()
    soup = BeautifulSoup(page)
    prices = soup.findAll('div', {"class": "price"})

After this, I am trying the following code to get the data:

Code 1:

    for price in prices:
        print unicode(price.string).encode('utf8')

Output 1: no output; the code runs without any
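There are two common reasons a loop like this prints nothing useful:
- prices is empty because the downloaded HTML differs from what the browser shows, for example when content is added by JavaScript.
- .string is None because the div has more than one child node.

The sketch below is in Python 3 and uses made-up markup, since the real page isn't shown. It illustrates the second case and get_text() as the usual fix.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a price split across child tags.
html = '<div class="price"><span>19</span><span>,99 €</span></div>'
soup = BeautifulSoup(html, "html.parser")

for price in soup.find_all("div", class_="price"):
    # .string is None because the div has more than one child node.
    print(repr(price.string))
    # get_text() joins every text node below the tag; strip=True trims each piece.
    print(price.get_text(strip=True))
```

In Python 3, print() handles non-ASCII text directly, so the unicode(...).encode('utf8') step is not needed.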

Scraping data from tag names in Python

Question: Hi, I am trying to scrape user data from a website. I need the user IDs, which are embedded in the tags themselves (in the id attribute). I am trying to scrape the UID from the div tag using Python, Selenium and Beautiful Soup. Example (the part I need is 60CE07D6DF5C02A987ED7B076F4154F3):

    <div id="UID_60CE07D6DF5C02A987ED7B076F4154F3-SRC_328619641" class="memberOverlayLink" onmouseover="ta.trackEventOnPage('Reviews','show_reviewer_info_window','user_name_photo'); ta.call('ta.overlays.Factory.memberOverlayWOffset', event, this, 's3 dg rgba_gry update2012', 0, (new
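The UID is part of the id attribute's value, and BeautifulSoup exposes attributes like a dict: div["id"]. A regular expression can then cut out the hex part.

The sketch below assumes the id always has the form UID_<hex>-SRC_<digits>, as in the example. UID_PATTERN and member_uids are hypothetical names. If the page is loaded with Selenium, its driver.page_source string can be passed to BeautifulSoup in place of the sample HTML.

```python
import re

from bs4 import BeautifulSoup

html = (
    '<div id="UID_60CE07D6DF5C02A987ED7B076F4154F3-SRC_328619641" '
    'class="memberOverlayLink">reviewer</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Assumed id format: UID_<hex digits>-SRC_<anything>.
UID_PATTERN = re.compile(r"^UID_([0-9A-Fa-f]+)-SRC_")


def member_uids(soup):
    """Return the hex UID embedded in each memberOverlayLink div's id attribute."""
    uids = []
    # A compiled regex passed as id= keeps only divs whose id matches it.
    for div in soup.find_all("div", class_="memberOverlayLink", id=UID_PATTERN):
        uids.append(UID_PATTERN.match(div["id"]).group(1))
    return uids


print(member_uids(soup))
```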

How can I download a full webpage with a Python program?

Question: Currently I have a program that can only download the HTML of a given page. Now I want a program that can download all the files of the web page, including HTML, CSS, JS and image files (the same as what you get with Ctrl+S on any website). My current program is:

    import urllib
    urllib.urlretrieve ("https://en.wikipedia.org/wiki/Python_%28programming_language%29", "t3.html")

I have visited many similar questions on Stack Overflow, but they all only download the HTML file.

Answer 1: The following
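One approach is to download the HTML and parse it for the image, script and stylesheet URLs it references. Each URL is resolved against the page address, and the files are downloaded next to the HTML. The sketch below is in Python 3, where the question's urllib.urlretrieve lives at urllib.request.urlretrieve.

It makes several simplifications:
- The User-Agent string and the helper names are hypothetical.
- The page is assumed to be UTF-8.
- Files are named after the last path segment, so name clashes overwrite each other.
- Unlike a browser's "save complete page", it does not rewrite the HTML to point at the local copies or fetch files referenced from inside CSS.

```python
import os
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Hypothetical User-Agent string; some sites reject requests without a descriptive one.
HEADERS = {"User-Agent": "page-saver-sketch/0.1"}


def fetch(url):
    """Download a URL and return its raw bytes."""
    with urlopen(Request(url, headers=HEADERS), timeout=30) as response:
        return response.read()


def asset_urls(html, base_url):
    """Collect absolute URLs of images, scripts and stylesheets referenced by the page."""
    soup = BeautifulSoup(html, "html.parser")
    urls = set()
    for tag in soup.find_all(["img", "script"], src=True):
        urls.add(urljoin(base_url, tag["src"]))
    for link in soup.find_all("link", href=True):
        # rel is multi-valued, so BeautifulSoup returns it as a list.
        if "stylesheet" in (link.get("rel") or []):
            urls.add(urljoin(base_url, link["href"]))
    return sorted(urls)


def save_page(url, out_dir):
    """Save the HTML plus its directly referenced assets into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    html_bytes = fetch(url)
    with open(os.path.join(out_dir, "index.html"), "wb") as f:
        f.write(html_bytes)
    html = html_bytes.decode("utf-8", errors="replace")  # UTF-8 is an assumption
    for asset in asset_urls(html, url):
        # Name each file after the last path segment; clashes overwrite earlier files.
        name = os.path.basename(urlparse(asset).path) or "asset"
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(fetch(asset))


sample = ('<link rel="stylesheet" href="/static/site.css">'
          '<script src="app.js"></script>'
          '<img src="//upload.example.org/logo.png">')
print(asset_urls(sample, "https://example.org/wiki/Page"))
```

For the page in the question, the call would be save_page("https://en.wikipedia.org/wiki/Python_%28programming_language%29", "t3_files").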