beautifulsoup

BeautifulSoup: Replace anchor text with text from another tag

回眸只為那壹抹淺笑, posted on 2019-12-25 01:55:44
Question: I'm trying to extract all links on a page. So far I can get the links, but the anchor text doesn't provide any relevant information; that information is contained in a sibling tag. This is the HTML layout: <tbody> <tr> <td> <h3>Driver with license E or F</h3> <div class = "date">..</div> <br> <p>...</p> <div id='print'> <a href="show_classifieds?..." class="bar">Go To Details</a> </div> <br> </td> </tr> <tr> <td> <h3>Payroll Administrator</h3> <div class = "date">..<
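
One possible way to pair each link with the heading in the same cell is sketched below. It assumes every `<a class="bar">` sits in a `<td>` that also holds the `<h3>`; the second row's link is made up for illustration because the original markup is cut off, and the `href` values are left elided as in the question.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the second row is an assumption.
html = """
<table><tbody>
<tr><td>
  <h3>Driver with license E or F</h3>
  <div class="date">..</div>
  <p>...</p>
  <div id="print"><a href="show_classifieds?..." class="bar">Go To Details</a></div>
</td></tr>
<tr><td>
  <h3>Payroll Administrator</h3>
  <div class="date">..</div>
  <div id="print"><a href="show_classifieds?..." class="bar">Go To Details</a></div>
</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

links = []
for a in soup.select("a.bar"):
    # Walk up to the enclosing cell, then read the heading inside it.
    cell = a.find_parent("td")
    title = cell.find("h3").get_text(strip=True)
    links.append((title, a["href"]))

for title, href in links:
    print(title, "->", href)
```

If the replacement text should end up in the document itself, assigning `a.string = title` inside the loop would overwrite the generic anchor text.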

How to get contents of nested tag using BeautifulSoup

半城伤御伤魂, posted on 2019-12-25 01:53:49
Question: How can I use BeautifulSoup to get the number before the closing span tag? <span class="count"> <i class="icon-user"></i> 30.5K </span> I can use: usercount=soup.findAll('span',{'class':'count'}) but not: usercount=soup.findAll('i',{'class':'count'}) Answer 1: The text you're after is the text node after the <i> in the <span>: import bs4 soup = bs4.BeautifulSoup(''' <span class="count"> <i class="icon-user"></i> 30.5K </span> ''') usercount = soup.find('span', class_='count').find('i').next
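
The answer above is cut off, so the following is only a sketch of one way to read that text node, not necessarily the answerer's exact code. It assumes the number is the node immediately after the `<i>` tag and uses `next_sibling`; the `html.parser` backend is chosen here just to avoid extra dependencies.

```python
from bs4 import BeautifulSoup

html = '<span class="count"> <i class="icon-user"></i> 30.5K </span>'
soup = BeautifulSoup(html, "html.parser")

# The <i> tag has class "icon-user", not "count", which is why
# searching for <i class="count"> finds nothing.
icon = soup.find("span", class_="count").find("i")

# The text node right after <i> holds the number, padded with spaces.
count = icon.next_sibling.strip()

# Alternative: the <i> tag is empty, so the span's own text is the same.
count_from_span = soup.find("span", class_="count").get_text(strip=True)

print(count)
```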

Automate the boring stuff with python - google search via beautiful soup 4

蓝咒, posted on 2019-12-25 01:48:56
Question: Has anyone had trouble opening the browser after running the code below? I ran it from PyCharm 2019.1 and from CMD, but got no results. My code: import requests import sys import webbrowser as wb import bs4 print ('Googling...') res = requests.get('http://google.com/search?q='+''.join(sys.argv[1:])) res.raise_for_status() soup = bs4.BeautifulSoup(res.text, 'lxml') linkElements = soup.select('.r a') linkToOpen = min(5, len(linkElements)) for i in range
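
The two steps that can be checked without a network connection are building the query URL and picking the first few result links. The sketch below covers only those; `build_search_url` and `first_links` are hypothetical helper names, the `.r a` selector is taken from the question and may not match Google's current markup, and `html.parser` stands in for `lxml`.

```python
from urllib.parse import urlencode

from bs4 import BeautifulSoup


def build_search_url(terms):
    """Join the search terms with spaces and URL-encode them."""
    return "https://www.google.com/search?" + urlencode({"q": " ".join(terms)})


def first_links(html, selector=".r a", limit=5):
    """Return the href of at most `limit` elements matching `selector`."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get("href") for a in soup.select(selector)[:limit]]


# Made-up result page with six links, for illustration only.
sample = "".join(
    '<div class="r"><a href="/url?q=%d">result</a></div>' % i for i in range(6)
)

print(build_search_url(["beautiful", "soup"]))
print(first_links(sample))
```

If `first_links` returns an empty list for a real response, the loop that opens browser tabs never runs, which would look like "no results" rather than an error.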

How to pull links from within an 'a' tag

馋奶兔, posted on 2019-12-25 01:45:59
Question: I have tried several methods to pull links from the following webpage, but can't seem to find the desired links. From this webpage (https://www.espn.com/collegefootball/scoreboard//year/2019/seasontype/2/week/1) I am attempting to extract all of the links for the "gamecast" button. An example of the first one I would be attempting to get is this: https://www.espn.com/college-football/game//gameId/401110723 When I try to just pull all links on the page I do not even seem to get the

Scraping Instagram with BeautifulSoup

你离开我真会死。, posted on 2019-12-25 01:34:42
Question: I'm trying to get a particular string from the "search by tag" page in Instagram. I'd like to get the image URL from here: <img alt="#yeşil #manzara #doğa #yayla #nature #naturelovers #adventuretime #adventures #mountainstaries #picture #şehirdenuzak #tatil #holiday #cow #potography #view #kütükev #naturelife #animal #amazing #kar #winter #winteriscomming #mapavr1 #artvin #tulumile #insaatr #tulumci #rize class="_2di5p" sizes="171px" srcset="https://scontent-mxp11.cdninstagram.com/vp

What are these errors and how do I handle them?

那年仲夏, posted on 2019-12-25 01:26:10
Question: I am using this simple code: for l in bios: OpenThisLink = url + l response = urllib2.urlopen(OpenThisLink) to open about 200 URLs and search them with regex (and BeautifulSoup), but after a dozen or so I get these errors and IDLE quits. What do they mean? How can I handle them? Thank you. Traceback (most recent call last): File "\PROJECTS\JD\jd10.py", line 15, in <module> response = urllib2.urlopen(OpenThisLink) File "C:\Python26\lib\urllib2.py", line 124, in urlopen return _opener.open(url,
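
The traceback is cut off, so the exact error is unknown. A common way to keep such a loop alive is to catch network errors per URL and record them instead of letting one failure stop the run. The sketch below uses Python 3's `urllib.request` rather than the question's `urllib2`; `fetch_all`, its `opener` parameter, and the timeout value are illustrative assumptions.

```python
import urllib.error
import urllib.request


def fetch_all(links, base_url, opener=urllib.request.urlopen, timeout=10):
    """Fetch base_url + link for every link, collecting failures separately."""
    pages, failures = {}, {}
    for link in links:
        target = base_url + link
        try:
            # A timeout keeps a stalled server from hanging the whole run.
            with opener(target, timeout=timeout) as response:
                pages[target] = response.read()
        except (urllib.error.URLError, OSError) as exc:
            # URLError covers DNS and connection problems; OSError also
            # catches socket timeouts and connection resets.
            failures[target] = exc
    return pages, failures
```

Calling `pages, failures = fetch_all(bios, url)` would then leave the failed URLs in `failures`, so they can be inspected or retried after the loop finishes.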

(Web scraping) I've located the proper tags, now how do I extract the text?

被刻印的时光 ゝ, posted on 2019-12-25 01:14:36
Question: I'm creating my first web scraping application, which collects the titles of games currently on the "new and trending" tab on https://store.steampowered.com/. Once I figure out how to do this, I want to repeat the process with prices and export both to a spreadsheet in separate columns. I've successfully found the tags that contain the text I'm trying to extract (the title), but I'm unsure how to extract the titles once I've located their containers. from urllib.request import urlopen from bs4
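
Once the container tags are located, `get_text(strip=True)` is the usual way to pull the text out. The sketch below runs on made-up markup: the class names `tab_item`, `tab_item_name`, and `discount_final_price`, as well as the titles and prices, are assumptions for illustration and may not match the live page. It writes the CSV to an in-memory buffer; a real script would open a file instead.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the "new and trending" tab.
html = """
<div class="tab_item">
  <div class="tab_item_name">Example Game A</div>
  <div class="discount_final_price">$9.99</div>
</div>
<div class="tab_item">
  <div class="tab_item_name">Example Game B</div>
  <div class="discount_final_price">$19.99</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for item in soup.select("div.tab_item"):
    # get_text(strip=True) returns the tag's text without surrounding whitespace.
    title = item.select_one(".tab_item_name").get_text(strip=True)
    price = item.select_one(".discount_final_price").get_text(strip=True)
    rows.append((title, price))

# One column per field, with a header row.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "price"])
writer.writerows(rows)

print(buffer.getvalue())
```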

Confusion to read html table contents using BeautifulSoup?

我与影子孤独终老i, posted on 2019-12-25 01:04:12
Question: Here is the HTML content: <table cellspacing="1" cellpadding="0" class="data"> <tr class="colhead"> <th colspan="3">Expression</th> </tr> <tr class="colhead"> <th>Task</th> <th>Action</th> <th>List</th> </tr> <tr class="rowLight"> <td width="40%"> Task1 </td> <td width="20%"> Assigned to </td> <td width="40%"> Harry </td> </tr> <tr class="rowDark"> <td width="40%"> Task2 </td> <td width="20%"> Rejected by </td> <td width="40%"> Lopa </td> </tr> <tr class="rowLight"> <td width="40%"> Task5 <
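
A table like this can be read row by row, keeping only rows that contain `<td>` cells so the two header rows drop out. The sketch below uses only the two complete data rows from the question, since the third is cut off.

```python
from bs4 import BeautifulSoup

html = """
<table cellspacing="1" cellpadding="0" class="data">
  <tr class="colhead"><th colspan="3">Expression</th></tr>
  <tr class="colhead"><th>Task</th><th>Action</th><th>List</th></tr>
  <tr class="rowLight">
    <td width="40%"> Task1 </td><td width="20%"> Assigned to </td><td width="40%"> Harry </td>
  </tr>
  <tr class="rowDark">
    <td width="40%"> Task2 </td><td width="20%"> Rejected by </td><td width="40%"> Lopa </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.select("table.data tr"):
    # Header rows only contain <th>, so their <td> list is empty.
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        rows.append(cells)

for task, action, person in rows:
    print(task, action, person)
```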

Python: How to access and iterate over a list of div class element using (BeautifulSoup)

耗尽温柔, posted on 2019-12-25 00:25:41
Question: I'm parsing data about car production with BeautifulSoup (see also my first question): from bs4 import BeautifulSoup import string html = """ <h4>Production Capacity (year)</h4> <div class="profile-area"> Vehicle 1,140,000 units /year </div> <h4>Output</h4> <div class="profile-area"> Vehicle 809,000 units ( 2016 ) </div> <div class="profile-area"> Vehicle 815,000 units ( 2015 ) </div> <div class="profile-area"> Vehicle 836,000 units ( 2014 ) </div> <div class="profile-area"> Vehicle 807,000
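
One way to iterate over the `profile-area` blocks is to group them under the `<h4>` that precedes them. The sketch below assumes the headings and blocks are siblings, as in the snippet, and leaves out the last entry because it is cut off in the question.

```python
from bs4 import BeautifulSoup

html = """
<h4>Production Capacity (year)</h4>
<div class="profile-area"> Vehicle 1,140,000 units /year </div>
<h4>Output</h4>
<div class="profile-area"> Vehicle 809,000 units ( 2016 ) </div>
<div class="profile-area"> Vehicle 815,000 units ( 2015 ) </div>
<div class="profile-area"> Vehicle 836,000 units ( 2014 ) </div>
"""

soup = BeautifulSoup(html, "html.parser")

sections = {}
for heading in soup.find_all("h4"):
    values = []
    # Walk the tags that follow this heading until the next heading.
    for sibling in heading.find_next_siblings(True):
        if sibling.name == "h4":
            break
        if sibling.name == "div" and "profile-area" in sibling.get("class", []):
            # Collapse internal whitespace to single spaces.
            values.append(" ".join(sibling.get_text().split()))
    sections[heading.get_text(strip=True)] = values

for name, values in sections.items():
    print(name, values)
```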

BeautifulSoup: Scraping different data sets having same set of attributes in the source code

浪子不回头ぞ, posted on 2019-12-24 23:24:34
Question: I'm using the BeautifulSoup module to scrape the total number of followers and total number of tweets from a Twitter account. However, when I inspected the elements of the respective fields on the web page, I found that both fields are enclosed in the same set of HTML attributes: Followers <a class="ProfileNav-stat ProfileNav-stat--link u-borderUserColor u-textCenter js-tooltip js-nav u-textUserColor" data-nav="followers" href="/IAmJericho/followers" data-original-title="2,469
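
When two elements share the same classes, an attribute that differs between them can tell them apart; in the snippet above that looks like `data-nav`. The sketch below is illustrative only: the `data-nav="tweets"` value and the counts are placeholders, not real data, since the original markup is cut off.

```python
from bs4 import BeautifulSoup

# Placeholder markup; the counts and the "tweets" value are made up.
html = """
<a class="ProfileNav-stat ProfileNav-stat--link" data-nav="tweets"
   data-original-title="222 Tweets">Tweets</a>
<a class="ProfileNav-stat ProfileNav-stat--link" data-nav="followers"
   data-original-title="111 Followers">Followers</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Key each stat by its data-nav value instead of by the shared classes.
stats = {}
for a in soup.find_all("a", class_="ProfileNav-stat"):
    stats[a["data-nav"]] = a["data-original-title"]

# A single field can also be selected directly by attribute.
followers = soup.find("a", attrs={"data-nav": "followers"})["data-original-title"]

print(stats)
```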