beautifulsoup | 易学教程

Python BeautifulSoup scrape Yahoo Finance value

Question: I am attempting to scrape the 'Full Time Employees' value of 110,000 from the Yahoo Finance website. The URL is: http://finance.yahoo.com/quote/AAPL/profile?p=AAPL I have tried using Beautiful Soup, but I can't find the value on the page. When I look in the DOM explorer in IE, I can see it. It has a tag with a parent tag which has a parent which has a parent . The actual value is in a custom class of data-react-id. Code I have tried:
    from bs4 import BeautifulSoup as bs
    html=`http://finance

BeautifulSoup Scraping: loading div instead of the content

Question: Noob here. I'm trying to scrape search results from this website: http://www.mastersportal.eu/search/?q=di-4|lv-master&order=relevance I'm using Python's BeautifulSoup:
    import csv
    import requests
    from BeautifulSoup import BeautifulSoup
    for numb in ('0', '69'):
        url = ('http://www.mastersportal.eu/search/?q=ci-30,11,10,3,4,8,9,14,15,16,17,34,1,19|di-4|lv-master|rv-1&start=' + numb + '0&order=tuition_eea&direction=asc')
        response = requests.get(url)
        html = response.content
        soup = BeautifulSoup

Beautifulsoup unable to extract data using attrs=class

Question: I am extracting data for a research project, and I have successfully used findAll('div', attrs={'class':'someClassName'}) on many websites. This particular website (WebSite Link), however, returns no values when I use the attrs option. When I don't use the attrs option, I get the entire HTML DOM. Here is the simple code that I started with to test it out:
    soup = bs(urlopen(url))
    for div in soup.findAll('div', attrs={'class':'data'}):
        print div
Answer 1: My code is working fine, with requests import
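For reference, a minimal sketch of the class lookup the question describes, run against hypothetical inline markup rather than the site in question (which the excerpt does not identify). It assumes the `bs4` package and uses `find_all`, the current name for `findAll`. If both forms return nothing on a real page, one possible reason, not confirmed by the excerpt, is that the fetched HTML does not contain those `div` elements at all.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page in the question.
html = """
<div class="data">first</div>
<div class="data extra">second</div>
<div class="other">third</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Dictionary form, as used in the question.
by_attrs = soup.find_all("div", attrs={"class": "data"})

# Keyword form; class_ avoids clashing with the Python keyword `class`.
by_keyword = soup.find_all("div", class_="data")

# Text of every matching div, in document order.
texts = [div.get_text() for div in by_attrs]
print(texts)
```

Both forms match any element whose class list contains `data`, so an element with several classes is matched as well.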

How to get the EXACT, REAL value of 'href'

Question: I'm trying to make a program that can fetch information about my attendance from my college website. To do that, I wrote a script that logs in to the website, which leads me to my dashboard, then goes to the Attendance tab, gets the href, and attaches it to the URL of the college website. The tag in the attendance class looked like this: <a href="../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=" id="aAttandance">Attendance</a> and when I clicked the attendance
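The step of attaching a relative href to the site URL can be sketched with the standard library's `urljoin`. The dashboard URL below is a made-up placeholder, since the excerpt does not give the college's address; only the href value comes from the question. This covers URL resolution only, not whatever the truncated part of the question goes on to describe.

```python
from urllib.parse import urljoin

# Hypothetical URL of the dashboard page on which the link was found.
page_url = "https://college.example.edu/Portal/Dashboard.aspx"

# href value quoted in the question.
href = "../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8="

# Resolve the relative href against the page it appeared on.
attendance_url = urljoin(page_url, href)
print(attendance_url)
```

Resolving against the full page URL, rather than concatenating strings, lets `urljoin` handle the `../` segment.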

Get value attribute for each tag found using Tag.find_all()

Question: I've generated a list of all the 'option' tags in my HTML file, but I can't get the values inside the tags. This is my code and data:
    >>> soup2 = soup.findAll('option')
    >>> soup2
    [ <option value="ufs_munic"> Por Município </option>, <option value="ext_paises"> Por País </option>, ... ]
I'd like to get the quoted values after option value= in each tag. For example: ufs_munic ext_paises 5 6 7 8 9 ...
Answer 1: Using a list comprehension, you can get all the values from the options using the
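A minimal sketch of the list-comprehension approach the answer mentions, assuming the `bs4` package. The answer is cut off, so the exact call it used is unknown; this version reads the attribute with dictionary-style access. The two `option` tags are taken from the question, while the surrounding `select` and the third, value-less option are added here for illustration.

```python
from bs4 import BeautifulSoup

html = """
<select>
  <option value="ufs_munic"> Por Município </option>
  <option value="ext_paises"> Por País </option>
  <option> No value attribute (illustrative) </option>
</select>
"""

soup = BeautifulSoup(html, "html.parser")
options = soup.find_all("option")

# Collect the value attribute of each option, skipping tags that lack one.
values = [opt["value"] for opt in options if opt.has_attr("value")]
print(values)
```

The `has_attr` guard matters because `opt["value"]` raises `KeyError` on a tag without that attribute; `opt.get("value")` would return `None` instead.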

Why is there a table when I scrape with beautifulSoup, but not pandas

Question: I'm trying to scrape the entries on this page into a tab-delimited format (mainly pulling out the sequence and UniProt accession number). When I run:
    url = 'www.signalpeptide.de/index.php?sess=&m=listspdb_bacteria&s=details&id=1000&listname='
    table = pd.read_html(url)
    print(table)
I get:
    Traceback (most recent call last):
      File "scrape_signalpeptides.py", line 7, in <module>
        table = pd.read_html(url)
      File "/Users/ION/anaconda3/lib/python3.7/site-packages/pandas/io/html.py", line 1094, in read_html

Python scraping go to next page using BeautifulSoup [closed]

Question: This is my scraping code:
    import requests
    from bs4 import BeautifulSoup as soup
    def get_emails(_links:list):
        for i in range(len(_links)):
            new_d = soup(requests.get(_links[i]).text, 'html.parser').find_all('a', {'class':'my_modal_open'})
            if new_d:
                yield new_d[-1]['title']
    start=20
    while True:
        d = soup(requests.get(

By what library and how can I scrape texts on an HTML by its heading and paragraph tags?

Question: My input will be any web document, with no fixed HTML structure. What I want to do is extract the text of each heading (which might be nested) and of the paragraph tags that follow it (there might be several), and output them as pairs. A simple HTML example:
    <h1>House rule</h1>
    <h2>Rule 1</h2>
    <p>A</p>
    <p>B</p>
    <h2>Rule 2</h2>
    <h3>Rule 2.1</h3>
    <p>C</p>
    <h3>Rule 2.2</h3>
    <p>D</p>
For this example, I would like to have an output of pairs:
    Rule 2.2, D
    Rule 2.1, C
    Rule 2, D
    Rule 2, C
    House rule, D
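One way to produce such pairs, sketched with the `bs4` package: walk the headings and paragraphs in document order, keep a stack of the headings currently "open", and pair each paragraph with every heading on the stack. The function name `heading_paragraph_pairs` is made up for this sketch, and the rule that a paragraph belongs to all of its enclosing headings is inferred from the partial output listed in the question, whose full list and ordering are cut off.

```python
from bs4 import BeautifulSoup

HEADINGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def heading_paragraph_pairs(html):
    """Pair each <p> with every heading that currently encloses it."""
    soup = BeautifulSoup(html, "html.parser")
    stack = []  # (level, heading text) for the headings still in scope
    pairs = []
    # find_all with a list of names returns matches in document order.
    for tag in soup.find_all(HEADINGS + ["p"]):
        text = tag.get_text(strip=True)
        if tag.name == "p":
            pairs.extend((heading, text) for _, heading in stack)
        else:
            level = int(tag.name[1])
            # A new heading closes all headings at the same or a deeper level.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, text))
    return pairs


# Markup taken from the question.
html = (
    "<h1>House rule</h1> <h2>Rule 1</h2> <p>A</p> <p>B</p> "
    "<h2>Rule 2</h2> <h3>Rule 2.1</h3> <p>C</p> <h3>Rule 2.2</h3> <p>D</p>"
)
pairs = heading_paragraph_pairs(html)
for heading, paragraph in pairs:
    print(f"{heading}, {paragraph}")
```

The pairs come out in document order rather than the order shown in the question; sorting or reversing them afterwards is a separate step.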