Cannot find table using Python BeautifulSoup

巧了我就是萌 submitted on 2019-12-13 02:55:44

Question


I am trying to scrape the data from the table with id="AWS" on the following NOAA page, https://www.weather.gov/afc/alaskaObs, but when I try to find the table using '.find()', the result is None. I can retrieve the parent div, but I can't seem to access the table inside it. Below is my code.

from urllib.request import urlopen  # Python 3 location of urllib2.urlopen
from bs4 import BeautifulSoup

# Download the page and parse it, then look for the table inside the div
html = urlopen('https://www.weather.gov/afc/alaskaObs').read()
soup = BeautifulSoup(html, 'lxml').find("div", {"id": "obDataDiv"}).find("table", {"id": "AWS"})

print(soup)

When I search only for the parent div, "obDataDiv", it returns the following:

<div id="obDataDiv"> </div>

I'm pretty new to BeautifulSoup. Is this an error? Any help is appreciated, thank you!
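A minimal offline reproduction of the symptom, assuming the HTML the server delivers contains only the empty container div shown above. The SERVED_HTML string and its script tag are hypothetical stand-ins for the real page, and Python's built-in "html.parser" is used instead of lxml so that only bs4 is needed:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML the server sends: the container div is
# empty, and a script is expected to fill it in later, inside a browser.
SERVED_HTML = """
<html><body>
  <div id="obDataDiv"> </div>
  <script>/* fills #obDataDiv after the page loads */</script>
</body></html>
"""

soup = BeautifulSoup(SERVED_HTML, "html.parser")
div = soup.find("div", {"id": "obDataDiv"})   # found: the div is in the HTML
table = div.find("table", {"id": "AWS"})      # not found: no table in the HTML

print(div)
print(table)  # None
```

BeautifulSoup only parses the markup it is given and never executes scripts, so a table that JavaScript would create is simply not there to find.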


Answer 1:


urlopen only gives you the HTML exactly as the server sent it, not the DOM it turns into after the page's client-side scripts have run. On your example site, the table is generated by JavaScript after the page loads, which is why the div you get back is empty. So you'll need a tool such as PhantomJS or Selenium to let the necessary client-side JS run first.
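A sketch of the Selenium route, assuming Selenium 4 with a recent Chrome available, and assuming the rendered page still contains a table with id="AWS". The function names fetch_rendered_html and parse_table_rows and the 20-second timeout are illustrative choices, not something the page or the answer specifies. The browser loads the page and runs its JavaScript; the resulting page_source is then handed to BeautifulSoup as before. The Selenium imports sit inside fetch_rendered_html so the parsing helper can be tried on saved HTML without a browser:

```python
from bs4 import BeautifulSoup

URL = "https://www.weather.gov/afc/alaskaObs"


def fetch_rendered_html(url, timeout=20):
    """Load url in headless Chrome and return the DOM after scripts have run."""
    # Imported here so parse_table_rows() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the page's JavaScript has inserted the table (or time out).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, "AWS"))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_table_rows(html, table_id="AWS"):
    """Return the table's rows as lists of cell text, or None if it is absent."""
    table = BeautifulSoup(html, "html.parser").find("table", {"id": table_id})
    if table is None:
        return None
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in table.find_all("tr")
    ]


# Usage (needs Chrome; not run here):
#   rows = parse_table_rows(fetch_rendered_html(URL))
```

Waiting for the element by id, rather than sleeping for a fixed time, returns as soon as the script has produced the table and raises a TimeoutException if it never appears.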




Answer 2:


It seems the div you extract contains just one table, so why not drop the id filter and simply take the first table in it:

soup = BeautifulSoup(html, 'lxml').find("div", {"id":"obDataDiv"}).find("table")
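To illustrate what the shorter call does, here is a self-contained sketch on made-up markup (the second table id and the cell text are hypothetical): without an id filter, find("table") returns the first table inside the div, whatever its id. It can still only find a table that is present in the HTML being parsed, so against the empty div shown in the question it returns None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a populated container (as it might look after rendering)
# and the empty container the question reports getting from the server.
RENDERED = """
<div id="obDataDiv">
  <table id="AWS"><tr><td>first</td></tr></table>
  <table id="OTHER"><tr><td>second</td></tr></table>
</div>
"""
SERVED = '<div id="obDataDiv"> </div>'


def first_table_in_div(html):
    """Return the first <table> inside div#obDataDiv, or None."""
    div = BeautifulSoup(html, "html.parser").find("div", {"id": "obDataDiv"})
    return div.find("table") if div is not None else None


print(first_table_in_div(RENDERED)["id"])  # AWS
print(first_table_in_div(SERVED))          # None
```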


Source: https://stackoverflow.com/questions/45072879/cannot-find-table-using-python-beautifulsoup
