Python how to add exception?

Posted by 删除回忆录丶 on 2019-12-13 04:32:13

Question


@martineau I have updated my code. Is this what you meant? How do I handle KeyError instead of NameError?

url = "http://app2.nea.gov.sg/anti-pollution-radiation-protection/air-pollution/psi/psi-readings-over-the-last-24-hours"
web_soup = soup(urllib2.urlopen(url))

table = web_soup.find(name="div", attrs={'class': 'c1'}).find_all(name="div")[4].find_all('table')[0]

data = {}
cur_time = datetime.datetime.strptime("12AM", "%I%p")
for tr_index, tr in enumerate(table.find_all('tr')):
    if 'Time' in tr.text:
        continue
    for td_index, td in enumerate(tr.find_all('td')):
        if not td_index:
            continue
        data[cur_time] = td.text.strip()

        if td.find('strong'):
            bold_time = cur_time
            data[bold_time] = '20'
        cur_time += datetime.timedelta(hours=1)

        default_value = '20' # whatever you want it to be

    try:
        bold = data[bold_time]
    except NameError:

        bold_time = beforebold = beforebeforebold = default_value
    # might want to set "bold" to something, too, if needed
    else:   
        beforebold = data.get(bold_time - datetime.timedelta(hours=1)) 
        beforebeforebold =  data.get(bold_time - datetime.timedelta(hours=2))

This is where I print the data I need for my calculation.

print bold
print beforebold
print beforebeforebold

Answer 1:


You need to add something to set data[bold_time]:

    if td.find('strong'):
        bold_time = cur_time
        data[bold_time] = ????? # whatever it should be
    cur_time += datetime.timedelta(hours=1)

This should avoid both the NameError and KeyError exceptions as long as a strong element is found. You may still want to code defensively and handle one or both of them gracefully. That is what exceptions are meant for: handling exceptional cases that shouldn't happen.
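
To illustrate handling both exceptions gracefully, here is a minimal sketch rather than the asker's full script. The function name `last_three` and the `rows` argument (a list of `(text, is_bold)` pairs standing in for the table cells) are hypothetical, and the fallback of '20' is taken from the question's `default_value`. It relies on the fact that, inside a function, reading a variable that was never assigned raises UnboundLocalError, which is a subclass of NameError.

```python
import datetime


def last_three(rows, default_value='20'):
    """Return (bold, beforebold, beforebeforebold) as strings.

    Hypothetical helper: `rows` is a list of (text, is_bold) pairs,
    one per hourly table cell, starting at midnight.
    """
    one_hour = datetime.timedelta(hours=1)
    data = {}
    # Midnight on 1900-01-01, the value strptime("12AM", "%I%p") produces.
    cur_time = datetime.datetime(1900, 1, 1, 0, 0)
    for text, is_bold in rows:
        data[cur_time] = text
        if is_bold:
            bold_time = cur_time  # only assigned when a bold cell exists
        cur_time += one_hour

    try:
        bold = data[bold_time]
    except (NameError, KeyError):
        # NameError: bold_time was never assigned (no bold cell).
        # KeyError: bold_time is not a key of data.
        return default_value, default_value, default_value

    # dict.get with a default does not raise for missing earlier hours.
    beforebold = data.get(bold_time - one_hour, default_value)
    beforebeforebold = data.get(bold_time - 2 * one_hour, default_value)
    return bold, beforebold, beforebeforebold
```

An alternative design is to set `bold_time = None` before the loop and test for `None` afterwards, which avoids relying on NameError at all.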




Answer 2:


I read your previous post before it disappeared, and then this one.
I think it is a pity to use BeautifulSoup for your goal: judging from the code, it makes things complicated, and regexes run roughly 10 times faster than BeautifulSoup.

Here's code that uses only re and produces the data you are interested in.
I know some people will say that HTML can't be parsed with regexes. I know, I know... but I am not parsing the text; I directly find the chunks of text that are interesting. The source of this web page is apparently very well structured, so there seems to be little risk of bugs. Moreover, tests and checks can be added to keep watch on the source and to be informed immediately of any changes the webmaster makes to the page.

import re
from httplib import HTTPConnection  # Python 2 module name

# Fetch the raw HTML of the PSI readings page.
hypr = HTTPConnection(host='app2.nea.gov.sg',
                      timeout=300)
rekete = ('/anti-pollution-radiation-protection/'
          'air-pollution/psi/'
          'psi-readings-over-the-last-24-hours')

hypr.request('GET', rekete)
page = hypr.getresponse().read()


# Group 1 spans the AM part of the table, group 2 the PM part.
patime = (r'PSI Readings.+?'
          r'width="\d+%" align="center">\r\n'
          r' *<strong>Time</strong>\r\n'
          r' *</td>\r\n'
          r'((?: *<td width="\d+%" align="center">'
          r'<strong>\d+AM</strong>\r\n'
          r' *</td>\r\n)+.+?)'

          r'width="\d+%" align="center">\r\n'
          r' *<strong>Time</strong>\r\n'
          r' *</td>\r\n'
          r'((?: *<td width="\d+%" align="center">'
          r'<strong>\d+PM</strong>\r\n'
          r' *</td>\r\n)+.+?)'
          r'PM2\.5 Concentration')
rgxtime = re.compile(patime, re.DOTALL)


patline = (r'<td align="center">\r\n'
           r' *<strong>'             # next line = group 1
           r'(North|South|East|West|Central|Overall Singapore)'
           r'</strong>\r\n'
           r' *</td>\r\n'
           r'((?: *<td align="center">\r\n'  # group 2 start
           r' *[.\d-]+\r\n'                  #
           r' *</td>\r\n)*)'                 # group 2 end

           r' *<td align="center">\r\n'
           r' *<strong style[^>]+>'
           r'([.\d-]+)' # group 3
           r'</strong>\r\n'
           r' *</td>\r\n')
rgxline = re.compile(patline)

rgxnb = re.compile(r'<td align="center">\r\n'
                   r' *([.\d-]+)\r\n'
                   r' *</td>\r\n')


m = rgxtime.search(page)

# m.group(1) contains the AM data: map each region to its list of readings.
a, b = m.span(1)
d = dict((mat.group(1),
          rgxnb.findall(mat.group(2)) + [mat.group(3)])
         for mat in rgxline.finditer(page[a:b]))

# m.group(2) contains the PM data: append it to each region's list.
a, b = m.span(2)
for mat in rgxline.finditer(page[a:b]):
    d[mat.group(1)].extend(rgxnb.findall(mat.group(2)) + [mat.group(3)])


print 'last 3 values'
for k,v in d.iteritems():
    print '%s  :  %s' % (k,v[-3:])
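
The tests and verification mentioned before the code could look roughly like the sketch below. It is only an illustration under assumptions: `PageLayoutError`, `extract_regions`, `ROW`, `CELL`, and `SAMPLE` are hypothetical names, and the simplified pattern and sample markup are stand-ins, not the real page or the patterns used above. The idea is to fail loudly with a clear message as soon as the expected chunks are no longer found.

```python
import re


class PageLayoutError(Exception):
    """Raised when the page no longer matches the expected structure."""


# Simplified stand-in for one row of the table (an assumption, not the
# real markup): a region name followed by one or more value cells.
ROW = re.compile(r'<strong>(North|South|East|West|Central)</strong>'
                 r'((?:\s*<td>[.\d-]+</td>)+)')
CELL = re.compile(r'<td>([.\d-]+)</td>')


def extract_regions(page, expected=('North', 'South')):
    """Return {region: [values]}, or raise PageLayoutError."""
    found = dict((m.group(1), CELL.findall(m.group(2)))
                 for m in ROW.finditer(page))
    # Verify that every region we rely on was actually matched.
    missing = [name for name in expected if name not in found]
    if missing:
        raise PageLayoutError('regions not found: %s' % ', '.join(missing))
    return found


# Hypothetical sample markup used only to exercise the check.
SAMPLE = ('<strong>North</strong> <td>55</td> <td>57</td> <td>60</td>'
          '<strong>South</strong> <td>-</td> <td>48</td> <td>50</td>')
```

With this kind of check, a change in the page layout surfaces as a PageLayoutError naming what is missing, rather than as an error on a failed match further down the script.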


Source: https://stackoverflow.com/questions/17505511/python-how-to-add-exception
