beautifulsoup

Python BeautifulSoup scrape Yahoo Finance value

走远了吗. submitted on 2020-01-07 02:52:29
Question: I am attempting to scrape the 'Full Time Employees' value of 110,000 from the Yahoo Finance website. The URL is: http://finance.yahoo.com/quote/AAPL/profile?p=AAPL I have tried using Beautiful Soup, but I can't find the value on the page. When I look in the DOM explorer in IE, I can see it. It has a tag with a parent tag which has a parent which has a parent . The actual value is in a custom class of data-react-id . Code I have tried: from bs4 import BeautifulSoup as bs html=`http://finance
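
Below is a minimal sketch of a label-based lookup. It assumes the number is present in the HTML you download and sits in the first `<span>` after the text "Full Time Employees". That markup is a guess, and Yahoo changes its pages.

Two caveats:
- The snippet in the question appears to assign the URL itself to `html`. BeautifulSoup needs the page's HTML (for example `requests.get(url).text`), not the address.
- The `data-react-id` attribute hints that the page is rendered by React. If the value only appears after JavaScript runs, this function returns `None` on the raw HTML.

```python
import re
from typing import Optional

from bs4 import BeautifulSoup


def full_time_employees(html: str) -> Optional[int]:
    """Return the employee count that follows the 'Full Time Employees' label.

    Assumes (hypothetically) markup like:
    <span>Full Time Employees</span>: <span><span>110,000</span></span>
    """
    soup = BeautifulSoup(html, "html.parser")
    # Locate the text node containing the label, not the tag around it.
    label = soup.find(string=re.compile(r"Full Time Employees"))
    if label is None:
        return None  # label missing: content may be rendered by JavaScript
    # Walk forward in document order to the next <span> after the label.
    value_tag = label.find_next("span")
    if value_tag is None:
        return None
    digits = value_tag.get_text(strip=True).replace(",", "")
    return int(digits) if digits.isdigit() else None
```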

BeautifulSoup Scraping: loading div instead of the content

﹥>﹥吖頭↗ submitted on 2020-01-06 19:55:47
Question: Noob here. I'm trying to scrape search results from this website: http://www.mastersportal.eu/search/?q=di-4|lv-master&order=relevance I'm using Python's BeautifulSoup: import csv import requests from BeautifulSoup import BeautifulSoup for numb in ('0', '69'): url = ('http://www.mastersportal.eu/search/?q=ci-30,11,10,3,4,8,9,14,15,16,17,34,1,19|di-4|lv-master|rv-1&start=' + numb + '0&order=tuition_eea&direction=asc') response = requests.get(url) html = response.content soup = BeautifulSoup
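
When `requests` returns only a "loading" `<div>`, the results are usually filled in afterwards by JavaScript. They are therefore not in the HTML that BeautifulSoup receives. Below is a minimal sketch that tells the two cases apart. The class names `result` and `loading` are hypothetical placeholders for whatever the real page uses.

Also note that `from BeautifulSoup import BeautifulSoup` is the old BeautifulSoup 3 import. With bs4 it is `from bs4 import BeautifulSoup`.

```python
from typing import List, Optional

from bs4 import BeautifulSoup


def extract_results(html: str, result_class: str = "result",
                    loading_class: str = "loading") -> Optional[List[str]]:
    """Return result texts, or None if only a loading placeholder is present.

    `result_class` and `loading_class` are hypothetical class names.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = soup.find_all("div", class_=result_class)
    if results:
        return [div.get_text(strip=True) for div in results]
    if soup.find("div", class_=loading_class) is not None:
        # Only the placeholder arrived; the real list is loaded by JavaScript.
        return None
    return []
```

If this returns `None`, you have two options:
- Find the request the page makes to load its results in the browser's developer tools (network tab), and request that URL directly.
- Use a browser-automation tool such as Selenium to render the page first.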

Beautifulsoup unable to extract data using attrs=class

一曲冷凌霜 submitted on 2020-01-06 19:43:09
Question: I am extracting data for a research project, and I have successfully used findAll('div', attrs={'class':'someClassName'}) on many websites, but on this particular website (WebSite Link) it doesn't return any values when I use the attrs option. When I don't use the attrs option, I get the entire HTML DOM. Here is the simple code that I started with to test it out: soup = bs(urlopen(url)) for div in soup.findAll('div', attrs={'class':'data'}): print div Answer 1: My code is working fine, with requests import
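
For reference, `attrs={"class": ...}` and the `class_` keyword behave the same in bs4. Both also match an element that has several classes, one of which is the given name. The sketch below checks this on an inline HTML string.

When the same call finds nothing on a live page, a plausible cause is that the server sent different HTML than the browser shows. This can happen when a site blocks the default `urllib` User-Agent, or when content is added by JavaScript. Fetching with `requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text` is worth trying, but that is a guess about this particular site.

```python
from bs4 import BeautifulSoup

HTML = """
<div class="data">first</div>
<div class="data highlighted">second</div>
<div class="other">third</div>
"""


def data_divs(html: str) -> list:
    """Return the text of every <div> whose class list includes 'data'."""
    soup = BeautifulSoup(html, "html.parser")
    # Same result as soup.find_all("div", class_="data"); both match multi-class divs.
    divs = soup.find_all("div", attrs={"class": "data"})
    return [div.get_text(strip=True) for div in divs]


for text in data_divs(HTML):
    print(text)  # Python 3 print(); the question's `print div` is Python 2 syntax
```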

How to get the EXACT, REAL value of 'href'

时光毁灭记忆、已成空白 submitted on 2020-01-06 16:18:09
Question: I'm trying to make a program that can fetch information about my attendance from my college website. To do that, I wrote a script that logs in to the website, which leads me to my dashboard, then goes to the Attendance tab, gets the href and attaches it to the URL of the college website. The tag in the attendance class looked like this: <a href="../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=" id="aAttandance">Attendance</a> and when I clicked the attendance
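
The `href` in that tag is relative (`../Student/...`). Gluing it onto the site's base URL by string concatenation can therefore produce a wrong path. Below is a minimal sketch using `urllib.parse.urljoin`, which resolves the `..` against the URL of the page where the link was found. `https://college.example/Dashboard/Home.aspx` is a hypothetical dashboard URL. BeautifulSoup returns the attribute value exactly as written, so the `==|` characters in `SID` are kept.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical URL of the dashboard page that contains the link.
DASHBOARD_URL = "https://college.example/Dashboard/Home.aspx"

HTML = ('<a href="../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=" '
        'id="aAttandance">Attendance</a>')


def attendance_url(page_html: str, page_url: str) -> str:
    """Resolve the attendance link's href against the page it came from."""
    soup = BeautifulSoup(page_html, "html.parser")
    link = soup.find("a", id="aAttandance")  # id spelled as in the site's markup
    return urljoin(page_url, link["href"])


print(attendance_url(HTML, DASHBOARD_URL))
```

If the resolved URL still fails, the `SID` value may be tied to the login session. In that case, request it through the same `requests.Session` that performed the login.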

Get value attribute for each tag found using Tag.find_all()

穿精又带淫゛_ submitted on 2020-01-06 08:45:26
Question: I've generated a list of all the 'option' tags in my HTML file, but I can't get the values inside the tags. This is my code and data: >>> soup2 = soup.findAll('option') >>> soup2 [ <option value="ufs_munic">  Por Município  </option>, <option value="ext_paises">  Por País  </option>, ... ] I'd like to get the quoted values after option value= in each tag. For example: ufs_munic ext_paises 5 6 7 8 9 ... Answer 1: Using a list comprehension, you can get all the values from the options using the
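
Here is a minimal sketch of the list-comprehension approach the answer describes. Each `Tag` supports dictionary-style access to its attributes, so `option["value"]` reads the `value` attribute. Passing `value=True` to `find_all` skips `<option>` tags that have no `value` attribute, which would otherwise raise `KeyError`. The third option in the sample data is added only to show that case.

```python
from bs4 import BeautifulSoup

HTML = """
<select>
  <option value="ufs_munic">  Por Município  </option>
  <option value="ext_paises">  Por País  </option>
  <option>no value attribute</option>
</select>
"""

soup = BeautifulSoup(HTML, "html.parser")
options = soup.find_all("option", value=True)  # only <option> tags with a value attribute

values = [option["value"] for option in options]            # the attribute values
labels = [option.get_text(strip=True) for option in options]  # the visible text
print(values)  # ['ufs_munic', 'ext_paises']
```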

Why is there a table when I scrape with beautifulSoup, but not pandas

£可爱£侵袭症+ submitted on 2020-01-06 07:21:08
Question: I'm trying to scrape entries on this page into a tab-delimited format (mainly pulling out the sequence and UniProt accession number). When I run: url = 'www.signalpeptide.de/index.php?sess=&m=listspdb_bacteria&s=details&id=1000&listname=' table = pd.read_html(url) print(table) I get: Traceback (most recent call last): File "scrape_signalpeptides.py", line 7, in <module> table = pd.read_html(url) File "/Users/ION/anaconda3/lib/python3.7/site-packages/pandas/io/html.py", line 1094, in read_html
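
The traceback is cut off, so the actual error isn't shown. One visible problem is that `url` has no scheme (`http://` or `https://`). `pd.read_html` most likely fetches its argument only when it has a recognised URL scheme. Otherwise it treats the string as a file path or as literal HTML, and the bare address contains no `<table>`.

Below is a minimal sketch of a helper that adds a default scheme. Defaulting to `http` is an assumption, so check which scheme the site actually serves.

```python
from urllib.parse import urlsplit

# Schemes this helper accepts as already present (an illustrative subset).
KNOWN_SCHEMES = ("http", "https", "ftp", "file")


def with_scheme(url: str, default_scheme: str = "http") -> str:
    """Return `url` unchanged if it has a known scheme, else prefix one."""
    if urlsplit(url).scheme in KNOWN_SCHEMES:
        return url
    if url.startswith("//"):
        # Protocol-relative URL such as //example.org/page
        return f"{default_scheme}:{url}"
    return f"{default_scheme}://{url}"
```

Usage would be `tables = pd.read_html(with_scheme(url))`. `read_html` also needs a parser backend installed: lxml, or BeautifulSoup together with html5lib.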

Python scraping go to next page using BeautifulSoup [closed]

送分小仙女□ submitted on 2020-01-06 07:12:59
Question: (Closed as off-topic.) This is my scraping code: import requests from bs4 import BeautifulSoup as soup def get_emails(_links:list): for i in range(len(_links)): new_d = soup(requests.get(_links[i]).text, 'html.parser').find_all('a', {'class':'my_modal_open'}) if new_d: yield new_d[-1]['title'] start=20 while True: d = soup(requests.get(
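
Here is a minimal sketch of offset-based paging like the `start=20` loop in the question. It keeps requesting pages and advancing the offset until a page contains no matching links. To keep it runnable without network access, the page fetcher is passed in as a function.

Several names are assumptions, not the site's real values:
- `fetch_page`
- the link class
- the step of 20

The loop also has a hard page limit, so it cannot run forever if the site keeps returning the last page for any offset.

```python
from typing import Callable, List

from bs4 import BeautifulSoup


def collect_links(fetch_page: Callable[[int], str], link_class: str,
                  start: int = 0, step: int = 20, max_pages: int = 50) -> List[str]:
    """Follow offset-based pagination until a page has no matching links.

    `fetch_page(offset)` is a hypothetical callable returning the HTML for that
    offset, e.g. lambda o: requests.get(BASE_URL + str(o)).text.
    """
    links: List[str] = []
    offset = start
    for _ in range(max_pages):  # hard cap in case the site never runs out
        soup = BeautifulSoup(fetch_page(offset), "html.parser")
        found = soup.find_all("a", class_=link_class, href=True)
        if not found:
            break  # empty page: we've gone past the last page of results
        links.extend(a["href"] for a in found)
        offset += step
    return links
```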

By what library and how can I scrape texts on an HTML by its heading and paragraph tags?

久未见 submitted on 2020-01-06 06:52:11
Question: My input will be arbitrary web documents with no fixed HTML structure. I want to extract the text of each heading (headings might be nested) and of the paragraph tags that follow it (there might be several), and output them as pairs. A simple HTML example: <h1>House rule</h1> <h2>Rule 1</h2> <p>A</p> <p>B</p> <h2>Rule 2</h2> <h3>Rule 2.1</h3> <p>C</p> <h3>Rule 2.2</h3> <p>D</p> For this example, I would like an output of pairs: Rule 2.2, D; Rule 2.1, C; Rule 2, D; Rule 2, C; House rule, D
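
One way to do this with BeautifulSoup is to walk the headings and paragraphs in document order while keeping a stack of the currently open headings. A new `<hN>` closes every open heading of level N or deeper, and each `<p>` is paired with every heading still open.

This is a sketch of that idea, not a general solution. It yields the pairs in document order rather than the order listed in the question, and it ignores text outside `<p>` tags.

```python
from typing import List, Tuple

from bs4 import BeautifulSoup

HEADINGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def heading_paragraph_pairs(html: str) -> List[Tuple[str, str]]:
    """Pair each <p> with every heading it falls under, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    open_headings: List[Tuple[int, str]] = []  # stack of (level, heading text)
    pairs: List[Tuple[str, str]] = []
    # find_all with a list of names returns matching tags in document order, at any depth.
    for tag in soup.find_all(HEADINGS + ["p"]):
        text = tag.get_text(strip=True)
        if tag.name in HEADINGS:
            level = int(tag.name[1])
            # A heading closes any open heading at the same or a deeper level.
            while open_headings and open_headings[-1][0] >= level:
                open_headings.pop()
            open_headings.append((level, text))
        else:
            pairs.extend((heading, text) for _, heading in open_headings)
    return pairs


SAMPLE = ("<h1>House rule</h1> <h2>Rule 1</h2> <p>A</p> <p>B</p> "
          "<h2>Rule 2</h2> <h3>Rule 2.1</h3> <p>C</p> <h3>Rule 2.2</h3> <p>D</p>")

for heading, paragraph in heading_paragraph_pairs(SAMPLE):
    print(f"{heading}, {paragraph}")
```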