beautifulsoup

BeautifulSoup findAll with class attribute - Unicode encode error

…衆ロ難τιáo~ submitted on 2019-12-21 12:28:27
Question: I am using BeautifulSoup to extract news stories (just the titles) from Hacker News and have this much so far:

    import urllib2
    from BeautifulSoup import BeautifulSoup

    HN_url = "http://news.ycombinator.com"

    def get_page():
        page_html = urllib2.urlopen(HN_url)
        return page_html

    def get_stories(content):
        soup = BeautifulSoup(content)
        titles_html = []

        for td in soup.findAll("td", {"class": "title"}):
            titles_html += td.findAll("a")

        return titles_html

    print get_stories(get_page())

When I run the
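
The question is cut off before the traceback, but the title points at a UnicodeEncodeError. Under Python 2 that error usually appears when Unicode text is implicitly encoded as ASCII, for example when printing to a console. A minimal sketch under Python 3 with bs4 (where str is Unicode), assuming, as in the question, that the titles are the <a> links inside <td class="title"> cells; the sample markup and the function name get_story_titles are hypothetical, and the live Hacker News layout may differ:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure the question targets:
# story links inside <td class="title"> cells. The live Hacker News
# markup may differ and has changed over time.
SAMPLE_HTML = """
<table>
  <tr><td class="title"><a href="https://example.com/a">Café opens in Zürich</a></td></tr>
  <tr><td class="title"><a href="https://example.com/b">Plain ASCII title</a></td></tr>
  <tr><td class="subtext">not a title</td></tr>
</table>
"""


def get_story_titles(html):
    """Return story titles as Python str objects (not Tag objects)."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    # class_ is bs4's keyword for filtering on the HTML class attribute.
    for td in soup.find_all("td", class_="title"):
        for link in td.find_all("a"):
            # get_text() returns the link text as a Unicode str.
            titles.append(link.get_text(strip=True))
    return titles
```

Returning plain strings rather than Tag objects keeps the result easy to print or save; in Python 3, printing "Café opens in Zürich" works on any UTF-8 console.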

python BeautifulSoup get select.value not text

左心房为你撑大大i submitted on 2019-12-21 11:56:12
Question:

    <select>
      <option value="0">2002/12</option>
      <option value="1">2003/12</option>
      <option value="2">2004/12</option>
      <option value="3">2005/12</option>
      <option value="4">2006/12</option>
      <option value="5" selected>2007/12</option>
    </select>

With this code, I need the value ('0'), not the text ('2002/12'). I tried a lot of BS4 options (.stripped_strings, .strip(), .contents, get(), etc.). How can I get the values instead of the text?

Answer 1: You want the value attribute; access tag attributes using mapping syntax:
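
A minimal sketch of that mapping syntax with bs4, run against the <select> from the question (using the built-in html.parser is an assumption):

```python
from bs4 import BeautifulSoup

# The <select> markup from the question.
SELECT_HTML = """
<select>
  <option value="0">2002/12</option>
  <option value="1">2003/12</option>
  <option value="2">2004/12</option>
  <option value="3">2005/12</option>
  <option value="4">2006/12</option>
  <option value="5" selected>2007/12</option>
</select>
"""

soup = BeautifulSoup(SELECT_HTML, "html.parser")
options = soup.find_all("option")

# Mapping syntax: option["value"] reads the attribute (KeyError if it is
# missing); option.get("value") returns None instead of raising.
values = [option["value"] for option in options]

# Value -> visible text, in case both are needed.
labels = {option["value"]: option.get_text() for option in options}

# selected=True matches any <option> that carries a selected attribute,
# whatever its value (a bare attribute is stored as an empty string).
selected = soup.find("option", selected=True)
selected_value = selected["value"] if selected else None
```

Here values is ['0', '1', '2', '3', '4', '5'] and selected_value is '5'.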

Python won't write to file

喜夏-厌秋 submitted on 2019-12-21 08:24:38
Question: I am attempting to write a pretty-printed email to a .txt file so I can better see what I want to parse out of it. Here is this section of my code:

    result, data = mail.uid('search', None, "(FROM 'tiffany@e.tiffany.com')")  # search and return uids instead
    latest_email_uid = data[0].split()[-1]
    result, data = mail.uid('fetch', latest_email_uid, '(RFC822)')
    raw_email = data[0][1]
    html = raw_email
    soup = BS(html)
    pretty_email = soup.prettify('utf-8')
    f = open("da_email.txt", "w")
    f.write(pretty
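
The snippet is cut off at f.write, so the actual failure is not visible. Two common causes are worth checking: a file that is never closed may stay empty because buffered data is never flushed, and in Python 3 with bs4, prettify('utf-8') returns bytes, which a file opened in text mode ("w") rejects. A minimal sketch assuming Python 3 and bs4; the function name save_pretty_html is hypothetical, and, as in the question, the whole raw message is handed to the parser:

```python
from bs4 import BeautifulSoup


def save_pretty_html(raw_email, path):
    """Prettify HTML (bytes or str) and write it to a UTF-8 text file."""
    soup = BeautifulSoup(raw_email, "html.parser")
    # prettify() with no encoding returns str; prettify("utf-8") would
    # return bytes, which a text-mode file refuses to write.
    pretty = soup.prettify()
    # The with-block closes (and therefore flushes) the file even if an
    # exception is raised, so the data actually reaches disk.
    with open(path, "w", encoding="utf-8") as f:
        f.write(pretty)
    return pretty
```

Usage would be save_pretty_html(raw_email, "da_email.txt"). Pulling just the HTML part out of the message with the standard email module first would give a cleaner result than parsing the headers as HTML.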

Problems Parsing NBA Boxscore Data with BeautifulSoup

北慕城南 submitted on 2019-12-21 06:58:00
Question: I am trying to parse player-level NBA box score data from ESPN. The following is the initial portion of my attempt:

    import numpy as np
    import pandas as pd
    import requests
    from bs4 import BeautifulSoup
    from datetime import datetime, date

    request = requests.get('http://espn.go.com/nba/boxscore?gameId=400277722')
    soup = BeautifulSoup(request.text, 'html.parser')
    table = soup.find_all('table')

It seems that BeautifulSoup is giving me a strange result. The last 'table' in the source code contains
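
The question is truncated before the details, so this is only a direction to try: html.parser, lxml, and html5lib can build different trees from the same malformed HTML, so when tables come out strangely, switching parsers is a one-argument change (lxml and html5lib are separate installs). A minimal sketch that turns each table into a list of rows; the sample markup and the function name table_rows are hypothetical, not ESPN's actual page:

```python
from bs4 import BeautifulSoup

# Hypothetical box-score fragment; the real ESPN page structure is not
# reproduced here and has changed since the question was asked.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>MIN</th><th>PTS</th></tr>
  <tr><td>Player A</td><td>36</td><td>28</td></tr>
  <tr><td>Player B</td><td>30</td><td>14</td></tr>
</table>
"""


def table_rows(html, parser="html.parser"):
    """Return every table as a list of rows, each row a list of cell strings.

    parser may be "lxml" or "html5lib" if those packages are installed;
    they repair broken markup differently from the built-in html.parser.
    """
    soup = BeautifulSoup(html, parser)
    tables = []
    for table in soup.find_all("table"):
        rows = []
        for tr in table.find_all("tr"):
            # Header and data cells are both collected, in document order.
            cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            if cells:
                rows.append(cells)
        tables.append(rows)
    return tables
```

Since the question already imports pandas, pd.DataFrame(rows[1:], columns=rows[0]) on one returned table would give a DataFrame whose columns come from the header row.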

Web Scraping using Python giving HTTP Error 404: Not Found

半世苍凉 submitted on 2019-12-21 06:35:31
Question: I am brand new to Python and not very good at it yet. I am trying to scrape a website called Transfermarkt (I'm a big football fan), but it gives me HTTP Error 404 when I try to extract data. Here is my code:

    from urllib.request import urlopen as uReq
    from bs4 import BeautifulSoup as soup

    my_url = "https://www.transfermarkt.com/chelsea-fc/leihspielerhistorie/verein/631/plus/1?saison_id=2018&leihe=ist"
    uClient = uReq(my_url)
    page_html = uClient.read()
    uClient.close()
    page_soup = soup
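
A commonly reported cause of 404 responses from Transfermarkt is that the site rejects requests carrying Python's default urllib User-Agent; how the site behaves today is not confirmed here. A minimal sketch that sends a browser-like User-Agent using only the standard library (the header string is just an example, and build_request/fetch_html are hypothetical helper names):

```python
from urllib.request import Request, urlopen

# Example browser-like User-Agent string; not a value the site requires.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def build_request(url):
    """Create a Request that sends a browser-like User-Agent header."""
    return Request(url, headers=HEADERS)


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes (performs a network call)."""
    # The with-block closes the response, replacing the manual close().
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

With the question's alias, page_soup = soup(fetch_html(my_url), "html.parser") would take the place of the last lines of the original code.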

scraping data from a dynamic graph using python+beautifulSoup4

六月ゝ 毕业季﹏ submitted on 2019-12-21 06:29:27
Question: I need to implement a data scraping task that extracts data from a dynamic graph. The graph updates over time, similar to a company's stock chart. I am using the requests and beautifulsoup4 libraries in Python, but I have only figured out how to scrape text and link data. I can't figure out how to get the values of the graph into a CSV file. The graph in question can be found at http://www.apptrace.com/app/instagram/id389801252/ranks
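
Values drawn into a chart by JavaScript are usually not present in the HTML that requests downloads, so BeautifulSoup never sees them. A common approach is to find the request the chart itself makes (browser developer tools, Network tab) and read that response directly, often JSON. The payload below is hypothetical: apptrace's actual endpoint and field names are not known here. A minimal sketch of the JSON-to-CSV step with the standard library (points_to_csv is a hypothetical helper):

```python
import csv
import io
import json

# Hypothetical JSON payload standing in for whatever the chart's data
# request returns; the field names are assumptions, not apptrace's API.
SAMPLE_JSON = """
[
  {"date": "2019-12-01", "rank": 12},
  {"date": "2019-12-02", "rank": 9},
  {"date": "2019-12-03", "rank": 11}
]
"""


def points_to_csv(json_text, fieldnames=("date", "rank")):
    """Convert a JSON list of data points into CSV text with a header row."""
    points = json.loads(json_text)
    buffer = io.StringIO()
    # extrasaction="ignore" skips any keys not listed in fieldnames.
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(points)
    return buffer.getvalue()
```

To save the result, write the returned text to a file opened with open("ranks.csv", "w", newline="") so the csv module's line endings are kept as-is (the file name is only an example).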
