BeautifulSoup returning [] when I run it

↘锁芯ラ 提交于 2020-01-05 05:55:51

问题


I am using Beautiful soup with python to retrieve weather data from a website.

Here's how the website looks like:

<channel>
<title>2 Hour Forecast</title>
<source>Meteorological Services Singapore</source>
<description>2 Hour Forecast</description>
<item>
<title>Nowcast Table</title>
<category>Singapore Weather Conditions</category>
<forecastIssue date="18-07-2016" time="03:30 PM"/>
<validTime>3.30 pm to 5.30 pm</validTime>
<weatherForecast>
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/>
<area forecast="TL" lat="1.35077200" lon="103.83900000" name="Bishan"/>
<area forecast="CL" lat="1.30400000" lon="103.70100000" name="Boon Lay"/>
<area forecast="CL" lat="1.35300000" lon="103.75400000" name="Bukit Batok"/>
<area forecast="CL" lat="1.27700000" lon="103.81900000" name="Bukit Merah"/>` 
<channel>

I would like to retrieve 3.30 pm to 5.30 pm which is between validTime

After inspecting elements from the page, I found that 3.30 pm to 5.30 pm is in the "class = Text" within the <span> element:

Based on the webiste, here are my python codes:

import requests
from bs4 import BeautifulSoup

url = "http://www.nea.gov.sg/api/WebAPI/?dataset=2hr_nowcast&keyref=<keyrefnumber>"

r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")

g_data = soup.find_all("span", {"class": "text"})

print g_data

# to print out the file in 3.30pm to 5:30pm to an XML file
outfile = open('C:\scripts\idk.xml','w')

When I run my python codes in CMD, all I got was [].


回答1:


The main API page on the Singapore NEA site shows clearly that the response you get is an XML document:

2-hour Nowcast
Data Description: Weather forecast for next 2 hours
Last API Update: 1-Mar-2016
Frequency Hourly
File Type: XML

You are looking at a HTML representation of the data in Chrome; Chrome transformed the XML to make it presentable in some way, but your Python code is still accessing the XML directly. The PDF documentation and your own question show the actual XML contents, parse those.

If you want to use BeautifulSoup with XML, make sure you have the lxml project installed and use the 'xml' parser type. Then simply access the text content of the validTime element:

soup = BeautifulSoup(r.content, "xml")
valid_time = soup.find('validTime').string

Demo:

>>> import requests
>>> from bs4 import BeautifulSoup
>>> r = requests.get('http://www.nea.gov.sg/api/WebAPI/?dataset=2hr_nowcast&keyref=<private_api_key>')
>>> soup = BeautifulSoup(r.content, "xml")
>>> soup.find('validTime').string
u'4.00 pm to 6.00 pm'

If you are trying to write to an XML file, you'd have to make sure it is writing valid XML however; this is outside the scope of BeautifulSoup.

Alternatively, use the ElementTree API that comes with Python by default; it can both parse the XML and produce new XML.



来源:https://stackoverflow.com/questions/38431446/beautifulsoup-returning-when-i-run-it

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!