beautifulsoup

BeautifulSoup returning [] when I run it

↘锁芯ラ · submitted on 2020-01-05 05:55:51

Question: I am using Beautiful Soup with Python to retrieve weather data from a website. Here's what the website looks like: <channel> <title>2 Hour Forecast</title> <source>Meteorological Services Singapore</source> <description>2 Hour Forecast</description> <item> <title>Nowcast Table</title> <category>Singapore Weather Conditions</category> <forecastIssue date="18-07-2016" time="03:30 PM"/> <validTime>3.30 pm to 5.30 pm</validTime> <weatherForecast> <area forecast="TL" lat="1.37500000" lon="103
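A common cause of find_all() returning [] with a feed like this is the parser lowercasing the camelCase XML tag names. A minimal sketch of the problem and the fix, using a shortened stand-in for the real feed (the second area's coordinates are invented for illustration):

```python
from bs4 import BeautifulSoup

xml = """<channel><item>
  <weatherForecast>
    <area forecast="TL" lat="1.37500000" lon="103.83900000"/>
    <area forecast="SH" lat="1.32100000" lon="103.88500000"/>
  </weatherForecast>
</item></channel>"""

soup = BeautifulSoup(xml, "html.parser")

# html.parser lowercases tag names, so a camelCase search finds nothing:
empty = soup.find_all("weatherForecast")   # []

# Search the lowercased name instead (or install lxml and parse with
# BeautifulSoup(xml, "xml"), which preserves tag case):
areas = [(a["forecast"], a["lat"], a["lon"]) for a in soup.find_all("area")]
print(areas)
```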

parsing HTML table with BeautifulSoup4

依然范特西╮ · submitted on 2020-01-05 04:06:12

Question: I am new to BeautifulSoup and am trying to extract a table. I followed the documentation and wrote a nested for loop to extract the cell data, but it only returns the first three rows. Here is my code: from six.moves import urllib from bs4 import BeautifulSoup import pandas as pd def get_url_content(url): try: html=urllib.request.urlopen(url) except urllib.error.HTTPError as e: return None try: soup=BeautifulSoup(html.read(),'html.parser') except AttributeError as e: return None return soup URL=
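When only the first few rows come back, the culprit is often the parser rather than the loop: html.parser can give up on malformed table markup where lxml or html5lib recover the remaining rows. A sketch of the nested-loop pattern on a tiny inline table (the table contents are invented for illustration):

```python
from bs4 import BeautifulSoup

html = """<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>A</td><td>3</td></tr>
  <tr><td>B</td><td>1</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")

# One list per row; get_text(strip=True) drops surrounding whitespace.
rows = []
for tr in soup.find("table").find_all("tr"):
    rows.append([cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])])
print(rows)
```

If rows are still missing on the real page, try BeautifulSoup(html, "lxml") or "html5lib" and compare len(rows) across parsers.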

Download all csv files from URL

久未见 · submitted on 2020-01-05 03:50:12

Question: I want to download all the csv files; any idea how I do this? from bs4 import BeautifulSoup import requests url = requests.get('http://www.football-data.co.uk/englandm.php').text soup = BeautifulSoup(url) for link in soup.findAll("a"): print link.get("href") Answer 1: You just need to filter the hrefs, which you can do with a CSS selector, a[href$=.csv], which will find the hrefs ending in .csv, then join each to the base url, request it, and finally write the content: from bs4 import BeautifulSoup import
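The selector approach from the answer can be sketched like this; the anchor markup stands in for the real page, and the download step is commented out to keep the sketch offline:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

base = "http://www.football-data.co.uk/englandm.php"
html = """<a href="mmz4281/1920/E0.csv">Premier League</a>
<a href="englandm.php">Home</a>"""

soup = BeautifulSoup(html, "html.parser")

# a[href$=".csv"] matches only anchors whose href ends in .csv;
# urljoin resolves the relative path against the page URL.
links = [urljoin(base, a["href"]) for a in soup.select('a[href$=".csv"]')]
print(links)

# for link in links:
#     open(link.rsplit("/", 1)[-1], "wb").write(requests.get(link).content)
```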

How to use the requests library to send keys to a web page in Python?

孤街浪徒 · submitted on 2020-01-04 15:15:20

Question: I have the website https://www.icsi.in/student/Members/MemberSearch.aspx which, when visited, requires entering the 'CP number' as 16803 and clicking Search. The student's information is then displayed, which I need to scrape. Can someone please help with how to pass the 'CP number' to requests and how to press the 'Search' button using requests? So far I've tried using the class name as well as the id name in the params of the requests.get() method. import requests r=requests.get('https://www.icsi.in/student
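ASP.NET pages like this usually cannot be driven with a plain GET: the search is a form POST that must echo back hidden state fields (__VIEWSTATE and friends) scraped from the initial page. A hedged sketch of collecting those fields; the inline form and the commented field name are illustrative, not the page's real markup:

```python
from bs4 import BeautifulSoup

html = """<form>
  <input type="hidden" name="__VIEWSTATE" value="abc"/>
  <input type="hidden" name="__EVENTVALIDATION" value="xyz"/>
</form>"""

soup = BeautifulSoup(html, "html.parser")

# Echo every hidden field back in the POST payload, then add the visible
# form fields (the real input names must be read from the page source).
payload = {tag["name"]: tag.get("value", "")
           for tag in soup.find_all("input", attrs={"type": "hidden"})}
# payload["txtCpNumber"] = "16803"      # hypothetical field name
# requests.post(url, data=payload)      # ideally inside a requests.Session()
print(payload)
```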

Identify and replace elements of XML using BeautifulSoup in Python

南楼画角 · submitted on 2020-01-04 14:19:36

Question: I am trying to use BeautifulSoup4 to find and replace specific elements within an XML file. More specifically, I want to find all instances of 'file_name' (in the example below the file name is 'Cyp26A1_atRA_minus_tet_plus.txt') and replace it with the full path for that document, which is saved in the 'file_name_replacement_dir' variable. My first task, the bit I'm stuck on, is to isolate the section of interest so that I can replace it using the replaceWith() method. The XML: <ParameterGroup name
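The question's XML is truncated, so the structure below is an assumption; if the file name lives in a value attribute of a Parameter element, rewriting the attribute directly is simpler than replaceWith():

```python
from bs4 import BeautifulSoup

xml = """<ParameterGroup name="Experiment Set">
  <Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>
</ParameterGroup>"""

# Hypothetical full path standing in for file_name_replacement_dir:
file_name_replacement_dir = "/data/experiments/Cyp26A1_atRA_minus_tet_plus.txt"

# html.parser lowercases tag names; the lxml "xml" parser would keep
# "Parameter" as written. Attribute values keep their case either way.
soup = BeautifulSoup(xml, "html.parser")
for param in soup.find_all("parameter", attrs={"name": "File Name"}):
    param["value"] = file_name_replacement_dir

new_value = soup.find("parameter")["value"]
print(new_value)
```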

Python: Saving BeautifulSoup Output to Text file

拟墨画扇 · submitted on 2020-01-04 11:43:03

Question: This question is a follow-up to this question here. I am parsing a txt file in Python as follows: text = open("C:\\Users\\0001193125-13-416534.txt") soup = BeautifulSoup(text.read().lower()) for type_tag in soup.find_all('TYPE', text=re.compile('^\s*(?:EX|XML)', re.I)): type_tag.extract() Once I have extracted all the <TYPE> tags from soup, how can I save the output to a txt file? I have tried: with io.open("C:\\Output.txt", 'a', encoding='utf8') as logfile: for tr in soup.find_all('tr')[2:]:
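Once the unwanted tags are extracted, the remaining soup can be serialized with str(soup) (the markup) or soup.get_text() (text only) and written out. A small sketch on an inline stand-in for the filing:

```python
import io
import re
from bs4 import BeautifulSoup

html = "<document><type>ex-101</type><type>10-k</type></document>"
soup = BeautifulSoup(html, "html.parser")

# Remove the matching <type> tags first (string= is the modern alias
# for the text= argument used in the question)...
for type_tag in soup.find_all("type", string=re.compile(r"^\s*(?:ex|xml)", re.I)):
    type_tag.extract()

# ...then write whatever markup is left.
with io.open("output.txt", "w", encoding="utf8") as logfile:
    logfile.write(str(soup))

remaining = io.open("output.txt", encoding="utf8").read()
print(remaining)
```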

Web scraping of Yahoo Finance statistics using BS4

≯℡__Kan透↙ · submitted on 2020-01-04 10:58:27

Question: I am new to Python programming, but I have found some code snippets and compiled them into the code below. The Python script returns all the right HTML values from the summary array, but no values from the statistics array, because the values don't get matched. I don't know how to extract the values in the statistics pane on Yahoo Finance. It is referred to as url2 and key_stats_on_stat. I hope you are willing to help me out. import os, sys import csv from bs4 import
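For context, Yahoo Finance's statistics pane is rendered from a JSON blob embedded in the page (historically under root.App.main), not from the same HTML structure as the summary page, which is why selectors that work for the summary find nothing there. A hedged sketch of the JSON-extraction approach; the page string and key names below are illustrative stand-ins, and the real blob's layout changes over time:

```python
import json
import re

# Stand-in for requests.get(url2).text:
page = 'root.App.main = {"price": {"marketCap": 123}};'

match = re.search(r"root\.App\.main\s*=\s*(\{.*\});", page)
data = json.loads(match.group(1))   # walk the parsed JSON instead of the HTML
market_cap = data["price"]["marketCap"]
print(market_cap)
```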

How to remove content in nested tags with BeautifulSoup?

狂风中的少年 · submitted on 2020-01-04 09:26:10

Question: How do I remove the content of nested tags with BeautifulSoup? These posts show the reverse, retrieving the content of nested tags: How to get contents of nested tag using BeautifulSoup, and BeautifulSoup: How do I extract all the <li>s from a list of <ul>s that contains some nested <ul>s? I have tried .text, but it only removes the tags: >>> from bs4 import BeautifulSoup as bs >>> html = "<foo>Something something <bar> blah blah</bar> something</foo>" >>> bs(html).find_all('foo')[0] <foo
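A sketch using the question's own snippet: decompose() deletes a nested tag together with everything inside it, after which .get_text() returns only the outer tag's remaining text:

```python
from bs4 import BeautifulSoup

html = "<foo>Something something <bar> blah blah</bar> something</foo>"
soup = BeautifulSoup(html, "html.parser")

foo = soup.find("foo")
for bar in foo.find_all("bar"):
    bar.decompose()            # removes <bar> and its text entirely

text = foo.get_text()
print(text)
```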

Python Beautifulsoup Getting Attribute Value

家住魔仙堡 · submitted on 2020-01-04 08:37:06

Question: I'm having difficulty getting the syntax right to extract the value of an attribute in BeautifulSoup with HTML 5.0. I've isolated the occurrence of a tag in my soup, using the proper syntax where there is an HTML 5 issue: tags = soup.find_all(attrs={"data-topic":"recUpgrade"}) Taking just tags[1]: date = tags[1].find(attrs={"data-datenews":True}) and date here is: <span class="invisible" data-datenews="2018-05-25 06:02:19" data-idnews="2736625" id="horaCompleta"></span> But now I want to
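Reading the attribute is just a matter of subscripting the Tag (or calling .get() to avoid a KeyError when it is missing); a sketch on the <span> from the question:

```python
from bs4 import BeautifulSoup

html = ('<span class="invisible" data-datenews="2018-05-25 06:02:19" '
        'data-idnews="2736625" id="horaCompleta"></span>')
soup = BeautifulSoup(html, "html.parser")

date = soup.find(attrs={"data-datenews": True})
value = date["data-datenews"]          # or date.get("data-datenews")
print(value)
```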