beautifulsoup | 易学教程

Python click button on alert

Question: I am new to Python but need to modify code created by someone else. I am not able to post the full code, but I have posted most of it below: from bs4 import BeautifulSoup import datetime import getpass from gmail import Gmail from selenium import webdriver from selenium.common.exceptions import NoSuchElementException from selenium.common.exceptions import ElementNotVisibleException from time import sleep from selenium.common.exceptions import NoAlertPresentException from selenium.webdriver

BeautifulSoup find_all() returns no data

Question: I am very new to Python. My recent project is scraping data from a betting website. What I want to scrape is the odds information from the webpage. Here is my code: from urllib.request import urlopen as uReq from bs4 import BeautifulSoup as soup my_url = 'http://bet.hkjc.com/default.aspx?url=football/odds/odds_allodds.aspx&lang=CH&tmatchid=120653' uClient = uReq(my_url) page_html = uClient.read() uClient.close() page_soup = soup(page_html, "html.parser") page_soup.findAll("div",{"class":

UnicodeError: URL contains non-ASCII characters (Python 2.7)

Question: I've managed to make a crawler that searches for all links. When I arrive at a product link, I run some finds and take all the product information, but when it reaches a certain page it raises a Unicode error: import urllib import urlparse from itertools import ifilterfalse from urllib2 import URLError, HTTPError from bs4 import BeautifulSoup urls = ["http://www.kiabi.es/"] visited = [] def get_html_text(url): try: return urllib.urlopen(current_url).read() except (URLError,
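One common way to avoid this kind of error is to percent-encode the non-ASCII parts of a URL before opening it. The sketch below assumes Python 3's `urllib.parse` (on Python 2.7 the equivalent functions live in `urllib` and `urlparse`); the helper name `to_ascii_url` and the example URL are hypothetical, and the host name is left untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Percent-encode non-ASCII characters in the path and query of a URL."""
    parts = urlsplit(url)
    # Keep "/" and "%" as-is so existing separators and escapes are not re-encoded.
    path = quote(parts.path, safe="/%")
    # Keep "=", "&" and "%" as-is so the query string structure is preserved.
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# Hypothetical URL containing non-ASCII characters.
encoded = to_ascii_url("http://example.com/niño/camisetas?color=marrón")
print(encoded)
```

A crawler could pass each discovered link through such a helper before calling `urlopen`; links that are already ASCII come back unchanged.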

Python, beautiful soup, get all class name

Question: Given some HTML, let's say: <div class="class1"> <span class="class2">some text</span> <span class="class3">some text</span> <span class="class4">some text</span> </div> How can I retrieve all the class names, i.e. ['class1','class2','class3','class4']? I tried: soup.find_all(class_=True) But it retrieves the whole tag, and I then need to run a regex on the string. Answer 1: You can treat each Tag instance found as a dictionary when it comes to retrieving attributes. Note that class attribute value
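The dictionary-style access mentioned in the answer can be sketched as follows. This is a minimal illustration using the HTML from the question, not necessarily the answerer's exact code; it relies on Beautiful Soup returning `class` as a list of names, because `class` is a multi-valued attribute.

```python
from bs4 import BeautifulSoup

html = """
<div class="class1">
  <span class="class2">some text</span>
  <span class="class3">some text</span>
  <span class="class4">some text</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# tag["class"] is a list, so flatten the per-tag lists into one list of names.
class_names = [
    name
    for tag in soup.find_all(class_=True)
    for name in tag["class"]
]
print(class_names)
```

No regex is needed, since the attribute values are read directly from each tag.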

BeautifulSoup extracting data from multiple tables

Question: I'm trying to extract some data from two HTML tables in an HTML file with BeautifulSoup. This is actually the first time I'm using it, and I've searched a lot of questions and examples, but none seem to work in my case. The HTML contains two tables: the first holds the headers of the first column (which are always text), and the second contains the data of the following columns. Moreover, the table contains text, numbers and also symbols. This makes everything more complicated for a novice like me.

Removing new line characters in web scrape

Question: I'm trying to scrape baseball lineup data but would only like to return the player names. However, as of right now, it is giving me the position, a newline character, the name, another newline character, and then the batting side. For example, I want 'D. Fletcher' but instead I get 'LF\nD. Fletcher\nR'. Additionally, it is giving me all players on the page. It would be preferable to group them by team, which may require a dictionary of some sort, but I am not sure what that code would look like. I've
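One possible way to isolate the name and group players by team is sketched below. It assumes every scraped string has the form position, newline, name, newline, batting side, as in the 'LF\nD. Fletcher\nR' example; the team labels, the other player strings, and the helper `player_name` are hypothetical placeholders.

```python
def player_name(raw):
    """Return the middle field of a 'position\\nname\\nside' string."""
    parts = [part.strip() for part in raw.split("\n") if part.strip()]
    return parts[1]


# Hypothetical scraped text, already keyed by team.
scraped = {
    "Team A": ["LF\nD. Fletcher\nR", "CF\nA. Example\nL"],
    "Team B": ["SS\nB. Sample\nR"],
}

# Build a dictionary mapping each team to its list of player names.
lineups = {
    team: [player_name(raw) for raw in rows]
    for team, rows in scraped.items()
}
print(lineups)
```

How the strings end up keyed by team depends on the page structure, which the excerpt does not show.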

Using beautifulsoup get_text()

Question: I can parse the field that I need from a website with this code block: response = requests.get(index_url) soup = bs4.BeautifulSoup(response.text, "lxml") poem = soup.select('div.siir p[id^=siir]') print poem But it prints the HTML tags as well. I'm trying to use the get_text() function. When I try to use it like this: print poem.get_text() I get this error: AttributeError: 'list' object has no attribute 'get_text' I also tried to use it like this: poem = soup.select('div.siir p[id^=siir]').get_text() I get
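The AttributeError quoted above arises because `select()` returns a list-like collection of tags rather than a single tag, so `get_text()` has to be called on each element. A minimal sketch follows, using made-up HTML that matches the selector from the question and Python 3 `print` syntax; the paragraph contents are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped to match the selector in the question.
html = """
<div class="siir">
  <p id="siir1">first line</p>
  <p id="siir2">second line</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
paragraphs = soup.select('div.siir p[id^=siir]')

# Call get_text() on each Tag, not on the list returned by select().
lines = [p.get_text() for p in paragraphs]
print("\n".join(lines))
```

If only the first match is needed, `select_one()` returns a single tag (or `None`) on which `get_text()` can be called directly.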

Python Beautiful Soup can't find specific table

Question: I'm having issues with scraping basketball-reference.com. I'm trying to access the "Team Per Game Stats" table but can't seem to target the correct div/table. I'm trying to capture the table and bring it into a dataframe using pandas. I've tried using soup.find and soup.find_all to find all the tables, but when I search the results I do not see the ID of the table I am looking for. See below. x = soup.find("table", id="team-stats-per_game") import csv, time, sys, math import numpy as np
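One possible cause, and this is an assumption since the excerpt does not show the page source, is that the table is delivered inside an HTML comment, in which case `soup.find("table", ...)` does not see it. The sketch below uses made-up markup (the wrapper div and cell content are hypothetical) to show how commented-out HTML can be parsed a second time.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: the table only exists inside an HTML comment.
html = """
<div id="wrapper">
<!--
<table id="team-stats-per_game"><tr><td>example</td></tr></table>
-->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A direct search finds nothing, because comment text is not parsed as tags.
direct = soup.find("table", id="team-stats-per_game")

# Re-parse the text of each comment and look for the table there.
table = None
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    inner = BeautifulSoup(comment, "html.parser")
    table = inner.find("table", id="team-stats-per_game")
    if table is not None:
        break

print(direct, table is not None)
```

If the table turns out to be built by JavaScript instead, this approach will not help and a browser-driven tool would be needed.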

python BeautifulSoup get specific element

Question: If I have HTML like this: <div class="new_info_next"> <input type="hidden" value="133" id="new_id" class="new_id"> <input type="hidden" value="0" id="default_pe" class="default_pe"> </div> and I want to get only 133, from the input on the first line. I tried this code using BeautifulSoup4: info = soup.find_all("div", {"class": "new_info_next"}) for inpu in info: for inpu1 in inpu.select('input'): print inpu1.get('value') but the output was 133 0. How do I get only 133? Answer 1: Since you only want the
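A minimal sketch of one way to get only the first value, using the HTML from the question. It may differ from the truncated answer above: `find()` returns the first match only, whereas `find_all()` and `select()` return every match.

```python
from bs4 import BeautifulSoup

html = """
<div class="new_info_next">
  <input type="hidden" value="133" id="new_id" class="new_id">
  <input type="hidden" value="0" id="default_pe" class="default_pe">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first matching element.
container = soup.find("div", {"class": "new_info_next"})
first_input = container.find("input")
value = first_input.get("value")
print(value)
```

Selecting by id, for example `soup.find("input", id="new_id")`, is an alternative that does not depend on element order.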

Scraping a table using BeautifulSoup

Question: I have a question which I suspect is fairly straightforward. I have the following type of page, from which I want to collect the information in the last table (if you scroll all the way down, it is the one in the box labelled "Procedure"): http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT&mode=XML&reference=A7-2010-2&language=EN The HTML for the table I want to scrape looks like this: <tbody><tr class="doc_title"> <td style="background-image: url("/img/struct/navigation/gradient_blue