beautifulsoup | 易学教程

How to extract the strong elements inside a div tag

Question: I am new to web scraping and am using Python to scrape data. Can someone help me extract the data from:

    <div class="dept"><strong>LENGTH:</strong> 15 credits</div>

My output should be:

    LENGTH: 15 credits

Here is my code:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    # bsObj is the parsed page (its creation is not shown in the question)
    length = bsObj.findAll("strong")
    for leng in length:
        print(leng.text, leng.next_sibling)

Output:

    DELIVERY: Campus
    LENGTH: 2 years
    OFFERED BY: Olin Business School

but I would like to get only LENGTH.
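
A minimal sketch, assuming the page has already been parsed into a BeautifulSoup object (a small inline HTML string stands in for the real page here, and the helper name get_field is hypothetical): look up the single <strong> whose text is exactly "LENGTH:" and read the text node that follows it, instead of looping over every <strong>.

```python
from bs4 import BeautifulSoup, NavigableString

# Stand-in HTML modelled on the question; the real page content is assumed.
html = """
<div class="dept"><strong>DELIVERY:</strong> Campus</div>
<div class="dept"><strong>LENGTH:</strong> 15 credits</div>
<div class="dept"><strong>OFFERED BY:</strong> Olin Business School</div>
"""
soup = BeautifulSoup(html, "html.parser")


def get_field(soup, label):
    """Return 'LABEL value' for the <strong> whose text equals label, or None."""
    strong = soup.find("strong", string=label)  # exact text match
    if strong is None:
        return None
    value = strong.next_sibling  # the text node after </strong>, e.g. " 15 credits"
    if isinstance(value, NavigableString):
        return f"{strong.get_text()} {value.strip()}"
    return strong.get_text()


print(get_field(soup, "LENGTH:"))  # LENGTH: 15 credits
```

The same call with "DELIVERY:" or "OFFERED BY:" returns the other fields, so only the label changes.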

Webpage values are missing while scraping data using BeautifulSoup python 3.6

Question: I am using the script below to scrape the "STOCK QUOTE" data from http://fortune.com/fortune500/xcel-energy/, but it returns blank values. I have also tried the Selenium driver, with the same result. Please help.

    import requests
    from bs4 import BeautifulSoup as bs
    import pandas as pd

    r = requests.get('http://fortune.com/fortune500/xcel-energy/')
    soup = bs(r.content, 'lxml')  # tried: 'html.parser'
    data = pd.DataFrame(columns=['C1','C2','C3','C4'], dtype='object', index=range(0,11))
    for table in soup.find_all('div
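
One common cause of this symptom (an assumption here; the question does not show how that page builds its quote table) is that the values are inserted by JavaScript after the page loads. requests only downloads the initial HTML, and BeautifulSoup parses markup without running scripts, so such elements come out empty. The toy page below, with a hypothetical div id "quote", shows the effect:

```python
from bs4 import BeautifulSoup

# Toy page: the quote div is empty in the HTML and would only be filled
# by the script when a browser executes it.
raw_html = """
<div id="quote"></div>
<script>document.getElementById("quote").textContent = "42.10";</script>
"""
soup = BeautifulSoup(raw_html, "html.parser")
quote = soup.find("div", id="quote")

# BeautifulSoup only parses markup; the <script> is never executed.
print(repr(quote.get_text()))  # ''
print("42.10" in soup.script.string)  # True: the value exists only inside the script text
```

If the data is filled in this way, the usual options are to request the data endpoint the page's scripts call (visible in the browser's network tab) or, with Selenium, to parse driver.page_source only after waiting for the element to be populated.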

BeautifulSoup difference between findAll and findChildren

Question: What is the difference? Don't they do the same thing, namely find the inner tags with the given properties?

Answer 1: findChildren returns a ResultSet just as find_all does. There is no difference between the two methods, because findChildren actually is find_all; in the source you can see:

    findChildren = find_all  # BS2

It is there for backwards compatibility, as is:

    findAll = find_all  # BS3

Source: https://stackoverflow.com/questions/38838460/beautifulsoup-difference-between-findall-and
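
A small check of the answer's claim on a toy document: all three names return the same matches. Recent bs4 releases may emit a DeprecationWarning for the camelCase aliases, so find_all is the name to prefer. The recursive=False argument is the way to restrict the search to direct children only.

```python
from bs4 import BeautifulSoup

html = "<div><p>one</p><section><p>two</p></section></div>"
soup = BeautifulSoup(html, "html.parser")

# find_all is the bs4 name; findAll (BS3) and findChildren (BS2) are aliases.
new_style = soup.div.find_all("p")
bs3_style = soup.div.findAll("p")
bs2_style = soup.div.findChildren("p")

print([p.get_text() for p in new_style])  # ['one', 'two']
print(new_style == bs3_style == bs2_style)  # True

# All three search every descendant by default; recursive=False limits
# the search to direct children of the div.
direct = soup.div.find_all("p", recursive=False)
print([p.get_text() for p in direct])  # ['one']
```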

How to web-scrape multiple page with Selenium (Python)

Question: I've seen several solutions for scraping multiple pages from a website, but couldn't make them work with my code. At the moment I have this code, which works for scraping the first page, and I would like to create a loop to scrape all the pages of the website (pages 1 to 5):

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from bs4 import BeautifulSoup

    options = Options()
    options.add_argument("window-size=1400,600")
    from fake_useragent
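
A sketch of the loop structure, under stated assumptions: the site is assumed to expose pages through a query parameter (the URL template, the <h2> selector and the helper scrape_pages are all hypothetical), and the fetching step is passed in as a function so the working first-page parsing code can be reused unchanged for every page. A dictionary stands in for the browser so the loop can be tried offline.

```python
from bs4 import BeautifulSoup


def scrape_pages(fetch_html, url_template, pages):
    """Fetch each page with fetch_html and collect the text of every <h2>.

    With Selenium, fetch_html could be a function that calls driver.get(url)
    and then returns driver.page_source.
    """
    results = []
    for page in pages:
        url = url_template.format(page=page)
        soup = BeautifulSoup(fetch_html(url), "html.parser")
        # Replace this selector with whatever the first-page code extracts.
        results.extend(h.get_text(strip=True) for h in soup.find_all("h2"))
    return results


# Offline stand-in for a real browser (hypothetical URLs and markup).
fake_site = {
    f"https://example.com/list?page={n}": f"<h2>item {n}a</h2><h2>item {n}b</h2>"
    for n in range(1, 6)
}
items = scrape_pages(lambda url: fake_site[url],
                     "https://example.com/list?page={page}", range(1, 6))
print(len(items))  # 10
```

If the site paginates with a "next" button instead of URLs, the same idea applies, but the loop would click the button and re-read driver.page_source rather than formatting a URL.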

Can't remove line breaks from BeautifulSoup text output (Python 2.7.5)

Question: I'm trying to write a program that parses a series of HTML files and stores the resulting data in a .csv spreadsheet, and the output depends heavily on newlines being in exactly the right place. I've tried every method I can find to strip the line breaks from certain pieces of text, to no avail. The relevant code looks like this:

    soup = BeautifulSoup(f)
    ID = soup.td.get_text()
    ID.strip()
    ID.rstrip()
    ID.replace("\t", "").replace("\r", "").replace("\n", "")
    dateCreated = soup.td.find_next("td")
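
The likely reason none of these calls appear to work: Python strings are immutable, so strip(), rstrip() and replace() return new strings and leave ID unchanged unless the result is assigned back. A minimal sketch with stand-in table cells (the sample values are made up; the code uses Python 3 syntax, but the immutability point is the same in 2.7):

```python
from bs4 import BeautifulSoup

# Stand-in markup with stray whitespace around the cell text.
html = "<table><tr><td>\n\t  A-123\r\n</td><td>2013-07-01\n</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

ID = soup.td.get_text()
ID.strip()       # no effect on ID: the stripped copy is discarded
ID = ID.strip()  # assign the result back
print(repr(ID))  # 'A-123'

# get_text(strip=True) strips each text fragment before joining them.
dateCreated = soup.td.find_next("td").get_text(strip=True)
print(repr(dateCreated))  # '2013-07-01'

# To also remove newlines and tabs *inside* a string, collapse whitespace runs.
messy = "line one\n\tline two\r\n"
print(" ".join(messy.split()))  # line one line two
```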

What would be the best way to extract square meters from a string that also mentions the number of bedrooms?

Question: I'm trying to extract

    <div class="xl-surface-ch"> 84 m² 2 bed. </div>

from the linked page. The problem is that I only need the "84" from this string (the number sometimes runs to 2 or 3 digits as well). An added difficulty is that the square meters are sometimes not mentioned, which looks like this:

    <div class="xl-surface-ch"> 2 bed. </div>

and in that case I need to return 0. My best attempt is:

    sqm = []
    for item in soup.findAll('div', attrs={'class': 'xl-surface-ch'}):
        item = item.contents[0].strip()[0:4]
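
Instead of slicing a fixed number of characters, a regular expression can pick out the digits that are directly followed by "m²", which handles any number of digits and naturally falls back to 0 when there is no area. A minimal sketch, assuming the unit is written as the literal "m²" character as in the question (the third sample div is made up):

```python
import re

from bs4 import BeautifulSoup

html = """
<div class="xl-surface-ch"> 84 m² 2 bed. </div>
<div class="xl-surface-ch"> 2 bed. </div>
<div class="xl-surface-ch"> 120 m² 3 bed. </div>
"""
soup = BeautifulSoup(html, "html.parser")

# Digits followed (optionally after spaces) by "m²"; the bedroom count
# is followed by "bed." instead, so it never matches.
SQM_RE = re.compile(r"(\d+)\s*m²")

sqm = []
for item in soup.find_all("div", attrs={"class": "xl-surface-ch"}):
    match = SQM_RE.search(item.get_text())
    sqm.append(int(match.group(1)) if match else 0)

print(sqm)  # [84, 0, 120]
```

If the real page writes the unit as m<sup>2</sup>, get_text() yields "m2", and the pattern would need to accept that form too.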

Using python/BeautifulSoup to replace HTML tag pair with a different one

Question: I need to replace a matching pair of HTML tags with another tag. BeautifulSoup (4) would probably be suitable for the task, but I've never used it before and haven't found a suitable example anywhere. Can someone give me a hint? For example, this HTML code:

    <font color="red">this text is red</font>

should be changed to this:

    <span style="color: red;">this text is red</span>

The opening and closing HTML tags may not be on the same line.

Answer 1: Use replace_with() to replace elements. Adapting the
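
A minimal sketch of the replace_with() approach; the mapping from the color attribute to inline CSS is an assumption based on the example in the question. Because the parser builds a tree, it does not matter whether the opening and closing tags are on the same line.

```python
from bs4 import BeautifulSoup

html = """<p><font color="red">this text
is red</font></p>"""
soup = BeautifulSoup(html, "html.parser")

for font in soup.find_all("font", color=True):
    # Build the replacement tag, translating the color attribute into CSS.
    span = soup.new_tag("span", style=f"color: {font['color']};")
    # Iterate over a copy: each append() moves the child out of <font>.
    for child in list(font.contents):
        span.append(child)
    font.replace_with(span)

print(soup)
```

Moving the children (rather than copying their text) keeps any nested markup inside the original <font> intact.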

Get a list of tags and get the attribute values in BeautifulSoup

Question: I'm attempting to use BeautifulSoup to get a list of HTML <div> tags, then check whether they have a name attribute and, if so, return that attribute's value. Please see my code:

    soup = BeautifulSoup(html)  # assume html contains <div> tags with a name attribute
    nameTags = soup.findAll('name')
    for n in nameTags:
        if n.has_key('name'):
            # get the value of the name attribute

My question is: how do I get the value of the name attribute?

Answer 1: Use the following code, it should work:

    nameTags = soup.findAll('div',{
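
A minimal sketch on toy markup: soup.findAll('name') searches for <name> tags, not for a name attribute, so the search has to target <div> and filter on the attribute through attrs (the first positional argument of find_all is already the tag name). In bs4, has_key() has been replaced by tag.has_attr("name"), and the value is read with tag["name"] or tag.get("name").

```python
from bs4 import BeautifulSoup

html = '<div name="alpha">A</div><div>no name</div><div name="beta">B</div>'
soup = BeautifulSoup(html, "html.parser")

# Only <div> tags that actually have a name attribute.
name_tags = soup.find_all("div", attrs={"name": True})
names = [tag["name"] for tag in name_tags]
print(names)  # ['alpha', 'beta']

# Without the filter, tag.get() returns None when the attribute is missing.
all_names = [div.get("name") for div in soup.find_all("div")]
print(all_names)  # ['alpha', None, 'beta']

# has_attr() is the bs4 replacement for the old has_key() check.
print([div.has_attr("name") for div in soup.find_all("div")])  # [True, False, True]
```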