beautifulsoup

Python + BeautifulSoup: How to get wrapper out of HTML based on text?

為{幸葍}努か submitted on 2019-12-20 06:18:26
Question: I would like to get the element that wraps a given piece of text. For example, given this HTML:

    … <div class="target">chicken</div> <div class="not-target">apple</div> …

I would like to use the text "chicken" to get back <div class="target">chicken</div>. Currently I have the following to fetch the HTML:

    import requests
    from bs4 import BeautifulSoup

    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html.parser')

Otherwise I have to call soup.find_all('div', …) and loop through every div to find the wrapper.
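A minimal sketch, using a literal HTML string in place of the fetched page: find() accepts a string argument that is compared with each tag's text, so the loop over every div is not needed. This assumes the wrapper's text is exactly "chicken"; for a partial match, a compiled regular expression such as re.compile("chicken") can be passed instead.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page.
html = '<div class="target">chicken</div> <div class="not-target">apple</div>'
soup = BeautifulSoup(html, "html.parser")

# string= keeps only <div> tags whose .string equals the text exactly.
wrapper = soup.find("div", string="chicken")
print(wrapper)  # <div class="target">chicken</div>

# Alternative: find the text node itself and step up to its parent tag.
# This works even when the wrapper's tag name is not known in advance.
text_node = soup.find(string="chicken")
print(text_node.parent)  # <div class="target">chicken</div>
```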

BeautifulSoup: find within an Instagram HTML page

倾然丶 夕夏残阳落幕 submitted on 2019-12-20 05:12:18
Question: I have a problem finding something with bs4. I'm trying to automatically find some URLs in the HTML of an Instagram page, and (being a Python noob) I can't find a way to automatically search the HTML source code for the URLs that appear after "display_url": http..." in the example. I want my script to find every URL that appears next to "display_url" and download it. Each one has to be extracted as many times as it appears in the source code. With bs4 I tried: f =
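A possible approach, sketched under the assumption that the page embeds its data as JSON text (keys written literally as "display_url") inside a script rather than in HTML attributes. In that case bs4's tag searches cannot reach the values, and scanning the raw source with a regular expression is simpler. The function name find_display_urls is hypothetical, and Instagram's markup may differ from what is assumed here.

```python
import json
import re

# Matches "display_url": "<JSON string body>", allowing escaped characters
# such as \/ or \u0026 inside the value.
DISPLAY_URL_RE = re.compile(r'"display_url"\s*:\s*"((?:[^"\\]|\\.)*)"')


def find_display_urls(page_source):
    """Return every display_url value in order, duplicates included."""
    urls = []
    for raw in DISPLAY_URL_RE.findall(page_source):
        # Re-wrap the captured body in quotes so json.loads decodes escapes
        # (for example \u0026 becomes &).
        urls.append(json.loads('"' + raw + '"'))
    return urls
```

Duplicates are kept on purpose, since the question asks for each URL as many times as it occurs. Each returned URL could then be fetched, for example with requests.get(url).content, and written to a file.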

Beautiful Soup Find - get just the text

Deadly submitted on 2019-12-20 04:38:37
Question: I had this bit of code printing just the price as a string (125.01), but I must have changed something, because now it prints the whole line, HTML tags and all. How can I print just the text, without using regular expressions?

    import requests
    from bs4 import BeautifulSoup

    url = 'http://finance.yahoo.com/q?s=aapl&fr=uh3_finance_web&uhb=uhb2'
    data = requests.get(url)
    soup = BeautifulSoup(data.content)
    price = soup.find("span", {'id': 'yfs_l84_aapl'})
    print(price)
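A short illustration, with a literal HTML snippet standing in for the Yahoo page (whose layout and ids may have changed since): find() returns a Tag, and printing a Tag prints its markup. get_text() (or the .text property) returns only the text. The None check matters because find() returns None when no span has that id.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page; the span id comes from the question.
html = '<p>AAPL <span id="yfs_l84_aapl">125.01</span></p>'
soup = BeautifulSoup(html, "html.parser")

price = soup.find("span", {"id": "yfs_l84_aapl"})
print(price)  # the whole Tag, markup included

# get_text(strip=True) drops the tags and surrounding whitespace.
text = price.get_text(strip=True) if price is not None else None
print(text)  # 125.01
```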

BeautifulSoup: how to extract text after a <br> tag

99封情书 submitted on 2019-12-20 04:38:13
Question: I don't know how to reach the following paragraph using BeautifulSoup, or how to extract the particular text I want; I am new to Python and BS4. My HTML is as follows:

    <div class="inner-content">
      <div class="bred"></div>
      <div class="clrbth"></div>
      <h1></h1>
      <h4></h4>
      ... ... ...
      <p></p>
      <p></p>
      <p>
        <!--This text I don't want -->
        Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when
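The question's HTML is cut off before the <br>, so this sketch assumes the wanted text follows a <br> inside the last <p> of div.inner-content. It walks the <br>'s next_siblings and joins their text, skipping HTML comments. The helper name text_after_first_br and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag


def text_after_first_br(p):
    """Join the text of everything that follows the first <br> inside p."""
    br = p.find("br")
    if br is None:
        return ""
    parts = []
    for node in br.next_siblings:
        if isinstance(node, Comment):  # skip <!-- ... --> comments
            continue
        if isinstance(node, Tag):
            parts.append(node.get_text())
        elif isinstance(node, NavigableString):
            parts.append(str(node))
    return "".join(parts).strip()


# Hypothetical markup modelled on the question's structure.
html = """
<div class="inner-content">
  <p></p>
  <p><!--This text I don't want -->Unwanted intro<br/>Wanted text <b>here</b></p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
last_p = soup.select("div.inner-content > p")[-1]
print(text_after_first_br(last_p))  # Wanted text here
```

Comment is checked before NavigableString because Comment is a subclass of it.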

Parse all item elements with children from RSS feed with beautifulsoup

試著忘記壹切 submitted on 2019-12-20 04:37:14
Question: From an RSS feed, how do you get a string of everything that's inside each item tag? Example input (simplified):

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
    <channel>
      <title>Test</title>
      <item>
        <title>Hello world1</title>
        <comments>Hi there</comments>
        <pubDate>Tue, 21 Nov 2011 20:10:10 +0000</pubDate>
      </item>
      <item>
        <title>Hello world2</title>
        <comments>Good afternoon</comments>
        <pubDate>Tue, 22 Nov 2011 20:10:10 +0000</pubDate>
      </item>
      <item>
        <title>Hello world3</title>
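One way to do this, sketched with a shortened copy of the example feed: find_all("item") returns each item Tag, and decode_contents() renders everything inside it as a single string. The built-in "html.parser" is used so the sketch needs no extra packages; note that it lowercases tag names (pubDate becomes pubdate). With lxml installed, BeautifulSoup(rss, "xml") preserves the original case.

```python
from bs4 import BeautifulSoup

# Shortened version of the feed from the question.
rss = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel><title>Test</title>
<item><title>Hello world1</title><comments>Hi there</comments></item>
<item><title>Hello world2</title><comments>Good afternoon</comments></item>
</channel></rss>"""

soup = BeautifulSoup(rss, "html.parser")

# decode_contents() returns the item's inner markup as one string.
inner = [item.decode_contents() for item in soup.find_all("item")]
for chunk in inner:
    print(chunk)
```

If only the text is wanted rather than the markup, item.get_text(" ", strip=True) gives a space-separated string instead.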

Parse all item elements with children from RSS feed with beautifulsoup

百般思念 submitted on 2019-12-20 04:37:01
Question: From an RSS feed, how do you get a string of everything that's inside each item tag? Example input (simplified):

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
    <channel>
      <title>Test</title>
      <item>
        <title>Hello world1</title>
        <comments>Hi there</comments>
        <pubDate>Tue, 21 Nov 2011 20:10:10 +0000</pubDate>
      </item>
      <item>
        <title>Hello world2</title>
        <comments>Good afternoon</comments>
        <pubDate>Tue, 22 Nov 2011 20:10:10 +0000</pubDate>
      </item>
      <item>
        <title>Hello world3</title>

Can I use requests.post to submit a form?

梦想与她 submitted on 2019-12-20 04:34:18
Question: I am trying to get the list of stores from this site (http://www.health.state.mn.us/divs/cfh/wic/wicstores/). I'd like the list of stores that is produced when you click the "View All Stores" button. I understand that I could use Selenium or MechanicalSoup or ... to do this, but I was hoping to use requests. It looks like clicking the button submits a form:

    <form name="setAllStores" id="setAllStores" action="/divs/cfh/wic/wicstores/index.cfm" method="post" onsubmit="return _CF
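A hedged sketch of the usual technique: read the form's action, method and named inputs with bs4, resolve the action against the page URL with urljoin, and send the same fields with requests.post. The form's real fields are cut off in the question, so the helper form_request and the hidden field in the test are hypothetical. The field collection is simplified: it takes every named <input> and ignores selects, textareas and unchecked checkboxes.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def form_request(page_url, html, form_id):
    """Return (method, absolute action URL, field dict) for a form on a page."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form", id=form_id)
    if form is None:
        raise ValueError(f"no form with id={form_id!r}")
    # A relative action such as /divs/.../index.cfm is resolved against the page URL.
    action = urljoin(page_url, form.get("action", ""))
    method = form.get("method", "get").upper()
    # Simplified field collection: every <input> that has a name attribute.
    fields = {
        inp["name"]: inp.get("value", "")
        for inp in form.find_all("input")
        if inp.get("name")
    }
    return method, action, fields
```

Usage would look roughly like this: fetch the page with requests.get(page_url).text, call form_request(page_url, page_html, "setAllStores"), then send requests.post(action, data=fields). If the list is built by JavaScript after the POST, requests alone will not be enough.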

Issue with HTML tags while scraping data using Beautiful Soup

拟墨画扇 submitted on 2019-12-20 04:28:15
Question: Common piece of code:

    # -*- coding: cp1252 -*-
    import csv
    import urllib2
    import sys
    import time
    from bs4 import BeautifulSoup
    from itertools import islice

    page = urllib2.urlopen('http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html').read()
    soup = BeautifulSoup(page)
    prices = soup.findAll('div', {"class": "price"})

After this I tried the following code to get the data:

Code 1:

    for price in prices:
        print unicode(price.string).encode('utf8')

Output 1: No output; the code runs without any
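Two common causes are worth ruling out, sketched here in Python 3 with hypothetical markup. First, if find_all() returns an empty list (for example because the prices are inserted by JavaScript, which urllib never runs), the loop body never executes and nothing is printed. Second, .string is None whenever a tag has more than one child, whereas get_text() concatenates every descendant string. The helper name extract_prices is hypothetical.

```python
from bs4 import BeautifulSoup


def extract_prices(html):
    soup = BeautifulSoup(html, "html.parser")
    prices = soup.find_all("div", {"class": "price"})
    if not prices:
        # An empty result means a for-loop over it prints nothing at all.
        print("no <div class='price'> found; the content may be rendered by JavaScript")
    # get_text() joins all descendant strings, unlike .string.
    return [p.get_text(strip=True) for p in prices]


# Hypothetical markup: one price split across child tags, one plain.
html = '<div class="price"><span>29</span>.99</div><div class="price">15.00</div>'
first = BeautifulSoup(html, "html.parser").find("div", {"class": "price"})
print(first.string)          # None, because the div has two children
print(extract_prices(html))  # ['29.99', '15.00']
```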

Scraping data from the tag names in Python

孤街浪徒 submitted on 2019-12-20 04:15:55
Question: Hi, I am trying to scrape user data from a website. I need the user IDs, which are available in the tags themselves (in each div's id attribute). I am trying to scrape the UID from the div tag using Python, Selenium and Beautiful Soup. Example (the UID is the part between UID_ and -SRC_):

    <div id="UID_60CE07D6DF5C02A987ED7B076F4154F3-SRC_328619641" class="memberOverlayLink" onmouseover="ta.trackEventOnPage('Reviews','show_reviewer_info_window','user_name_photo'); ta.call('ta.overlays.Factory.memberOverlayWOffset', event, this, 's3 dg rgba_gry update2012', 0, (new
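A minimal sketch, assuming every reviewer div follows the id pattern in the example (UID_, then a hex string, then -SRC_ and digits). bs4 accepts a compiled regular expression as an attribute filter, and the same pattern's capture group pulls the UID out of the id. If the page is loaded with Selenium, driver.page_source can be passed in as the HTML. The helper name member_uids is hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Captures the hex UID between "UID_" and "-SRC_".
UID_RE = re.compile(r"^UID_([0-9A-Fa-f]+)-SRC_")


def member_uids(html):
    soup = BeautifulSoup(html, "html.parser")
    uids = []
    # id=<compiled regex> keeps only divs whose id attribute matches the pattern.
    for div in soup.find_all("div", class_="memberOverlayLink", id=UID_RE):
        uids.append(UID_RE.match(div["id"]).group(1))
    return uids
```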

How can I download a full webpage with a Python program?

家住魔仙堡 submitted on 2019-12-20 03:56:14
Question: Currently I have a program that can only download the HTML of a given page. Now I want a program that can download all the files of the web page, including HTML, CSS, JS and image files (the same as what you get by pressing Ctrl+S on any website). My current program is:

    import urllib
    urllib.urlretrieve("https://en.wikipedia.org/wiki/Python_%28programming_language%29", "t3.html")

I have visited many similar questions on Stack Overflow, but they all only download the HTML file.

Answer 1: The following
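A rough Python 3 sketch, not equivalent to a browser's "Save page": it downloads the HTML, collects the absolute URLs of images, scripts and stylesheets with bs4 and urljoin, and saves each one next to the HTML. Unlike Ctrl+S it does not rewrite the links inside the saved HTML, and files with the same basename overwrite each other. The helper names asset_urls and download_page are hypothetical.

```python
import os
from urllib.parse import urljoin, urlparse
from urllib.request import urlretrieve

from bs4 import BeautifulSoup


def asset_urls(html, base_url):
    """Absolute URLs of images, scripts and stylesheets referenced by a page."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for tag, attr in (("img", "src"), ("script", "src")):
        # attr=True keeps only tags that actually have that attribute.
        for el in soup.find_all(tag, **{attr: True}):
            found.append(urljoin(base_url, el[attr]))
    for link in soup.find_all("link", rel="stylesheet", href=True):
        found.append(urljoin(base_url, link["href"]))
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))


def download_page(url, dest_dir):
    """Save the page as index.html plus its assets in dest_dir (network access required)."""
    os.makedirs(dest_dir, exist_ok=True)
    html_path = os.path.join(dest_dir, "index.html")
    urlretrieve(url, html_path)
    with open(html_path, encoding="utf-8", errors="replace") as f:
        html = f.read()
    for asset in asset_urls(html, url):
        name = os.path.basename(urlparse(asset).path) or "asset"
        urlretrieve(asset, os.path.join(dest_dir, name))
```

urljoin also resolves protocol-relative references such as //upload.example/x.png against the page's scheme.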