beautifulsoup

BeautifulSoup returning [] when I run it

↘锁芯ラ · submitted on 2020-01-05 05:55:51

Question: I am using Beautiful Soup with Python to retrieve weather data from a website. Here's what the website looks like: <channel> <title>2 Hour Forecast</title> <source>Meteorological Services Singapore</source> <description>2 Hour Forecast</description> <item> <title>Nowcast Table</title> <category>Singapore Weather Conditions</category> <forecastIssue date="18-07-2016" time="03:30 PM"/> <validTime>3.30 pm to 5.30 pm</validTime> <weatherForecast> <area forecast="TL" lat="1.37500000" lon="103
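A common cause of find_all() returning [] with a feed like this is the parser lowercasing the camelCase XML tag names. A minimal sketch of the problem and the fix, using a shortened stand-in for the real feed (the second area's coordinates are invented for illustration):

```python
from bs4 import BeautifulSoup

xml = """<channel><item>
  <weatherForecast>
    <area forecast="TL" lat="1.37500000" lon="103.83900000"/>
    <area forecast="SH" lat="1.32100000" lon="103.88500000"/>
  </weatherForecast>
</item></channel>"""

soup = BeautifulSoup(xml, "html.parser")

# html.parser lowercases tag names, so a camelCase search finds nothing:
empty = soup.find_all("weatherForecast")   # []

# Search the lowercased name instead (or install lxml and parse with
# BeautifulSoup(xml, "xml"), which preserves tag case):
areas = [(a["forecast"], a["lat"], a["lon"]) for a in soup.find_all("area")]
print(areas)
```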

parsing HTML table with BeautifulSoup4

依然范特西╮ · submitted on 2020-01-05 04:06:12

Question: I am new to BeautifulSoup and am trying to extract a table. I followed the documentation and wrote a nested for loop to extract the cell data, but it only returns the first three rows. Here is my code: from six.moves import urllib from bs4 import BeautifulSoup import pandas as pd def get_url_content(url): try: html=urllib.request.urlopen(url) except urllib.error.HTTPError as e: return None try: soup=BeautifulSoup(html.read(),'html.parser') except AttributeError as e: return None return soup URL=
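When only the first few rows come back, the culprit is often the parser rather than the loop: html.parser can give up on malformed table markup where lxml or html5lib recover the remaining rows. A sketch of the nested-loop pattern on a tiny inline table (the table contents are invented for illustration):

```python
from bs4 import BeautifulSoup

html = """<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>A</td><td>3</td></tr>
  <tr><td>B</td><td>1</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")

# One list per row; get_text(strip=True) drops surrounding whitespace.
rows = []
for tr in soup.find("table").find_all("tr"):
    rows.append([cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])])
print(rows)
```

If rows are still missing on the real page, try BeautifulSoup(html, "lxml") or "html5lib" and compare len(rows) across parsers.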

Download all csv files from URL

久未见 · submitted on 2020-01-05 03:50:12

Question: I want to download all the csv files; any idea how I do this? from bs4 import BeautifulSoup import requests url = requests.get('http://www.football-data.co.uk/englandm.php').text soup = BeautifulSoup(url) for link in soup.findAll("a"): print link.get("href") Answer 1: You just need to filter the hrefs, which you can do with a CSS selector, a[href$=.csv], which will find the hrefs ending in .csv, then join each to the base url, request it, and finally write the content: from bs4 import BeautifulSoup import
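The selector approach from the answer can be sketched like this; the anchor markup stands in for the real page, and the download step is commented out to keep the sketch offline:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

base = "http://www.football-data.co.uk/englandm.php"
html = """<a href="mmz4281/1920/E0.csv">Premier League</a>
<a href="englandm.php">Home</a>"""

soup = BeautifulSoup(html, "html.parser")

# a[href$=".csv"] matches only anchors whose href ends in .csv;
# urljoin resolves the relative path against the page URL.
links = [urljoin(base, a["href"]) for a in soup.select('a[href$=".csv"]')]
print(links)

# for link in links:
#     open(link.rsplit("/", 1)[-1], "wb").write(requests.get(link).content)
```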

How to use the requests library to send keys to a web page in Python?

孤街浪徒 · submitted on 2020-01-04 15:15:20

Question: I have the website https://www.icsi.in/student/Members/MemberSearch.aspx which, when visited, requires entering the 'CP number' as 16803 and clicking Search. The student's information is then displayed, which I need to scrape. Can someone please help with how to pass the 'CP number' to requests and how to press the 'Search' button using requests? So far I've tried using the class name as well as the id name in the params of the requests.get() method. import requests r=requests.get('https://www.icsi.in/student
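ASP.NET pages like this usually cannot be driven with a plain GET: the search is a form POST that must echo back hidden state fields (__VIEWSTATE and friends) scraped from the initial page. A hedged sketch of collecting those fields; the inline form and the commented field name are illustrative, not the page's real markup:

```python
from bs4 import BeautifulSoup

html = """<form>
  <input type="hidden" name="__VIEWSTATE" value="abc"/>
  <input type="hidden" name="__EVENTVALIDATION" value="xyz"/>
</form>"""

soup = BeautifulSoup(html, "html.parser")

# Echo every hidden field back in the POST payload, then add the visible
# form fields (the real input names must be read from the page source).
payload = {tag["name"]: tag.get("value", "")
           for tag in soup.find_all("input", attrs={"type": "hidden"})}
# payload["txtCpNumber"] = "16803"      # hypothetical field name
# requests.post(url, data=payload)      # ideally inside a requests.Session()
print(payload)
```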

Identify and replace elements of XML using BeautifulSoup in Python

南楼画角 · submitted on 2020-01-04 14:19:36

Question: I am trying to use BeautifulSoup4 to find and replace specific elements within an XML file. More specifically, I want to find all instances of 'file_name' (in the example below the file name is 'Cyp26A1_atRA_minus_tet_plus.txt') and replace it with the full path for that document, which is saved in the 'file_name_replacement_dir' variable. My first task, the bit I'm stuck on, is to isolate the section of interest so that I can replace it using the replaceWith() method. The XML: <ParameterGroup name
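The question's XML is truncated, so the structure below is an assumption; if the file name lives in a value attribute of a Parameter element, rewriting the attribute directly is simpler than replaceWith():

```python
from bs4 import BeautifulSoup

xml = """<ParameterGroup name="Experiment Set">
  <Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>
</ParameterGroup>"""

# Hypothetical full path standing in for file_name_replacement_dir:
file_name_replacement_dir = "/data/experiments/Cyp26A1_atRA_minus_tet_plus.txt"

# html.parser lowercases tag names; the lxml "xml" parser would keep
# "Parameter" as written. Attribute values keep their case either way.
soup = BeautifulSoup(xml, "html.parser")
for param in soup.find_all("parameter", attrs={"name": "File Name"}):
    param["value"] = file_name_replacement_dir

new_value = soup.find("parameter")["value"]
print(new_value)
```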

Python: Saving BeautifulSoup Output to Text file

拟墨画扇 · submitted on 2020-01-04 11:43:03

Question: This question is a follow-up to this question here. I am parsing a txt file in Python as follows: text = open("C:\\Users\\0001193125-13-416534.txt") soup = BeautifulSoup(text.read().lower()) for type_tag in soup.find_all('TYPE', text=re.compile('^\s*(?:EX|XML)', re.I)): type_tag.extract() Once I have extracted all the <TYPE> tags from soup, how can I save the output to a txt file? I have tried: with io.open("C:\\Output.txt", 'a', encoding='utf8') as logfile: for tr in soup.find_all('tr')[2:]:
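Once the unwanted tags are extracted, the remaining soup can be serialized with str(soup) (the markup) or soup.get_text() (text only) and written out. A small sketch on an inline stand-in for the filing:

```python
import io
import re
from bs4 import BeautifulSoup

html = "<document><type>ex-101</type><type>10-k</type></document>"
soup = BeautifulSoup(html, "html.parser")

# Remove the matching <type> tags first (string= is the modern alias
# for the text= argument used in the question)...
for type_tag in soup.find_all("type", string=re.compile(r"^\s*(?:ex|xml)", re.I)):
    type_tag.extract()

# ...then write whatever markup is left.
with io.open("output.txt", "w", encoding="utf8") as logfile:
    logfile.write(str(soup))

remaining = io.open("output.txt", encoding="utf8").read()
print(remaining)
```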

Web scraping of Yahoo Finance statistics using BS4

≯℡__Kan透↙ · submitted on 2020-01-04 10:58:27

Question: I am new to Python programming, but I have found some code snippets and compiled them into the code below. The Python script returns all the right HTML values from the summary array, but no values from the statistics array, because the values don't get matched. I don't know how to extract the values in the statistics pane on Yahoo Finance. It is referred to as url2 and key_stats_on_stat. I hope you are willing to help me out. import os, sys import csv from bs4 import
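For context, Yahoo Finance's statistics pane is rendered from a JSON blob embedded in the page (historically under root.App.main), not from the same HTML structure as the summary page, which is why selectors that work for the summary find nothing there. A hedged sketch of the JSON-extraction approach; the page string and key names below are illustrative stand-ins, and the real blob's layout changes over time:

```python
import json
import re

# Stand-in for requests.get(url2).text:
page = 'root.App.main = {"price": {"marketCap": 123}};'

match = re.search(r"root\.App\.main\s*=\s*(\{.*\});", page)
data = json.loads(match.group(1))   # walk the parsed JSON instead of the HTML
market_cap = data["price"]["marketCap"]
print(market_cap)
```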

How to remove content in nested tags with BeautifulSoup?

狂风中的少年 · submitted on 2020-01-04 09:26:10

Question: How do I remove the content of nested tags with BeautifulSoup? These posts show the reverse, retrieving the content of nested tags: How to get contents of nested tag using BeautifulSoup, and BeautifulSoup: How do I extract all the <li>s from a list of <ul>s that contains some nested <ul>s? I have tried .text, but it only removes the tags: >>> from bs4 import BeautifulSoup as bs >>> html = "<foo>Something something <bar> blah blah</bar> something</foo>" >>> bs(html).find_all('foo')[0] <foo
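A sketch using the question's own snippet: decompose() deletes a nested tag together with everything inside it, after which .get_text() returns only the outer tag's remaining text:

```python
from bs4 import BeautifulSoup

html = "<foo>Something something <bar> blah blah</bar> something</foo>"
soup = BeautifulSoup(html, "html.parser")

foo = soup.find("foo")
for bar in foo.find_all("bar"):
    bar.decompose()            # removes <bar> and its text entirely

text = foo.get_text()
print(text)
```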

Python Beautifulsoup Getting Attribute Value

家住魔仙堡 · submitted on 2020-01-04 08:37:06

Question: I'm having difficulty getting the syntax right to extract the value of an attribute in BeautifulSoup with HTML 5.0. I've isolated the occurrence of a tag in my soup, using the proper syntax where there is an HTML 5 issue: tags = soup.find_all(attrs={"data-topic":"recUpgrade"}) Taking just tags[1]: date = tags[1].find(attrs={"data-datenews":True}) and date here is: <span class="invisible" data-datenews="2018-05-25 06:02:19" data-idnews="2736625" id="horaCompleta"></span> But now I want to
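Reading the attribute is just a matter of subscripting the Tag (or calling .get() to avoid a KeyError when it is missing); a sketch on the <span> from the question:

```python
from bs4 import BeautifulSoup

html = ('<span class="invisible" data-datenews="2018-05-25 06:02:19" '
        'data-idnews="2736625" id="horaCompleta"></span>')
soup = BeautifulSoup(html, "html.parser")

date = soup.find(attrs={"data-datenews": True})
value = date["data-datenews"]          # or date.get("data-datenews")
print(value)
```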