beautifulsoup

BeautifulSoup HTML table parsing

旧巷老猫 submitted on 2019-12-29 14:28:11
Question: I am trying to parse information (HTML tables) from this site: http://www.511virginia.org/RoadConditions.aspx?j=All&r=1 Currently I am using BeautifulSoup, and the code I have looks like this:

    from mechanize import Browser
    from BeautifulSoup import BeautifulSoup
    mech = Browser()
    url = "http://www.511virginia.org/RoadConditions.aspx?j=All&r=1"
    page = mech.open(url)
    html = page.read()
    soup = BeautifulSoup(html)
    table = soup.find("table")
    rows = table.findAll('tr')[3]
    cols = rows.findAll('td')
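A minimal sketch of row-by-row table extraction, assuming the newer bs4 package rather than the BeautifulSoup 3 import used in the question. The markup in `SAMPLE_HTML` and the helper name `table_rows` are hypothetical; the real page's layout is not shown in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup is not shown above.
SAMPLE_HTML = """
<table>
  <tr><th>Road</th><th>Condition</th></tr>
  <tr><td>I-64</td><td>Clear</td></tr>
  <tr><td>I-81</td><td>Minor</td></tr>
</table>
"""


def table_rows(html):
    """Return each row of the first table as a list of stripped cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = []
    for tr in table.find_all("tr"):
        # Collect header and data cells alike.
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


rows = table_rows(SAMPLE_HTML)
print(rows)
```

Note that indexing `findAll('tr')[3]`, as in the question, selects a single row; iterating over the full result visits every row.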

Mechanize and BeautifulSoup for PHP? [closed]

╄→гoц情女王★ submitted on 2019-12-29 14:21:53
Question: I was wondering if there is anything similar to Mechanize or BeautifulSoup for PHP?

Answer 1: SimpleTest provides you with similar functionality: http://www.simpletest.org/en/browser_documentation.html

Answer 2: I don't know how powerful BeautifulSoup is, so maybe this won't be as great; but you could try using

Selenium versus BeautifulSoup for web scraping

点点圈 submitted on 2019-12-29 10:13:12
Question: I'm scraping content from a website using Python. First I used BeautifulSoup and Mechanize, but I saw that the website had a button that created content via JavaScript, so I decided to use Selenium. Given that I can find elements and get their content using Selenium with methods like driver.find_element_by_xpath, what reason is there to use BeautifulSoup when I could just use Selenium for everything? And in this particular case, I need to use Selenium to click on the JavaScript

Web scraping with Python using BeautifulSoup 429 error

你离开我真会死。 submitted on 2019-12-29 09:17:26
Question: First I have to say that I'm quite new to web scraping with Python. I'm trying to scrape data using these lines of code:

    import requests
    from bs4 import BeautifulSoup
    baseurl ='https://name_of_the_website.com'
    html_page = requests.get(baseurl).text
    soup = BeautifulSoup(html_page, 'html.parser')
    print(soup)

As output I do not get the expected HTML page but another HTML page that says: Misbehaving Content Scraper Please use robots.txt Your IP has been rate limited To check the problem I wrote:
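Since the error page points to robots.txt, one thing that may be worth checking is what that file allows. The sketch below uses the standard-library `urllib.robotparser` on a made-up robots.txt; the rules in `SAMPLE_ROBOTS`, the user-agent string `my-scraper`, and the paths are all assumptions, because the site's real file is not shown in the question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one would live at <site>/robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)
base = "https://name_of_the_website.com"

# "my-scraper" is an illustrative user-agent name, not one from the question.
allowed = parser.can_fetch("my-scraper", base + "/index.html")
blocked = parser.can_fetch("my-scraper", base + "/private/data")
delay = parser.crawl_delay("my-scraper")  # seconds to wait between requests, if declared

print(allowed, blocked, delay)
```

If a crawl delay is declared, sleeping for at least that long between requests is one way to stay under a rate limit.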

Changing element value with BeautifulSoup returns empty element

妖精的绣舞 submitted on 2019-12-29 08:58:46
Question:

    from BeautifulSoup import BeautifulStoneSoup
    xml_data = """
    <doc>
    <test>test</test>
    <foo:bar>Hello world!</foo:bar>
    </doc>
    """
    soup = BeautifulStoneSoup(xml_data)
    print soup.prettify()
    make = soup.find('foo:bar')
    print make # prints <foo:bar>Hello world!</foo:bar>
    make.contents = ['Top of the world Ma!']
    print make # prints <foo:bar></foo:bar>

How do I change the content of the element, in this case the element in the variable "make", without losing the content? If you could point me to
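One possible approach, sketched with bs4 and Python 3 rather than the BeautifulSoup 3 API in the question: assign to the tag's `.string` attribute instead of overwriting `.contents`. The choice of `html.parser` is an assumption made to keep the example dependency-free; an XML parser may treat the `foo:` prefix differently.

```python
from bs4 import BeautifulSoup

xml_data = """
<doc>
<test>test</test>
<foo:bar>Hello world!</foo:bar>
</doc>
"""

# html.parser keeps "foo:bar" as a plain tag name.
soup = BeautifulSoup(xml_data, "html.parser")
make = soup.find("foo:bar")

# Assigning to .string replaces the tag's children with a single text node.
make.string = "Top of the world Ma!"

new_markup = str(make)
print(new_markup)
```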

install BeautifulSoup

有些话、适合烂在心里 submitted on 2019-12-29 08:17:09
Question: I'm running Python 3.1.2 on Ubuntu 10.04. Which version of BeautifulSoup do I need to install, and how? I already downloaded version 3.2 and ran sudo python3 setup.py install, but it doesn't work. Thanks.

EDIT: The error I get is:

    >>> import BeautifulSoup
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "BeautifulSoup.py", line 448
        raise AttributeError, "'%s' object has no attribute '%s'" % (self.__class__.__name__, attr)
                            ^
    SyntaxError: invalid syntax
    >>>

Answer 1: The only series
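The traceback points at a `raise` statement written in Python 2 syntax, which a Python 3 interpreter refuses to compile. The sketch below reproduces that in isolation; the helper name `compiles` and the shortened error message are illustrative only.

```python
# The style of statement from the traceback, in Python 2 syntax.
PY2_RAISE = "raise AttributeError, 'no such attribute'"
# The equivalent form that Python 3 accepts.
PY3_RAISE = "raise AttributeError('no such attribute')"


def compiles(source):
    """Return True if the running interpreter can compile `source`."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


py2_ok = compiles(PY2_RAISE)
py3_ok = compiles(PY3_RAISE)
print(py2_ok, py3_ok)
```

BeautifulSoup 3 was written for Python 2; the bs4 package (distributed as beautifulsoup4) is the line that supports Python 3.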

BeautifulSoup: object of type 'Response' has no len()

你离开我真会死。 submitted on 2019-12-29 05:44:26
Question: When I try to execute the script, BeautifulSoup(html, ...) gives the error message "TypeError: object of type 'Response' has no len()". I tried passing the actual HTML as a parameter, but it still doesn't work.

    import requests
    url = 'http://vineoftheday.com/?order_by=rating'
    response = requests.get(url)
    html = response.content
    soup = BeautifulSoup(html, "html.parser")

Answer 1: You are getting response.content, but it returns the response body as bytes (docs). You should pass str to
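The error message suggests that a Response object, rather than its body, reached the constructor at some point. A minimal sketch, assuming bs4: the two variables below are hypothetical stand-ins for `response.text` (str) and `response.content` (bytes), so no network request is made. Note also that the snippet as quoted has no import for BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for response.text (str) and response.content (bytes).
html_text = "<html><head><title>Vines</title></head><body></body></html>"
html_bytes = html_text.encode("utf-8")

# bs4 accepts markup as either str or bytes; what it cannot take is the
# Response object itself.
title_from_text = BeautifulSoup(html_text, "html.parser").title.string
title_from_bytes = BeautifulSoup(html_bytes, "html.parser").title.string

print(title_from_text, title_from_bytes)
```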

BeautifulSoup, a dictionary from an HTML table

五迷三道 submitted on 2019-12-29 05:18:08
Question: I am trying to scrape table data from a website. Here is a simple example table:

    t = '<html><table>' +\
    '<tr><td class="label"> a </td> <td> 1 </td></tr>' +\
    '<tr><td class="label"> b </td> <td> 2 </td></tr>' +\
    '<tr><td class="label"> c </td> <td> 3 </td></tr>' +\
    '<tr><td class="label"> d </td> <td> 4 </td></tr>' +\
    '</table></html>'

Desired parse result is {' a ': ' 1 ', ' b ': ' 2 ', ' c ': ' 3 ', ' d ' : ' 4' } This is my closest attempt so far:

    for tr in s.findAll('tr'): k, v =
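One way to build the dictionary, sketched with bs4 (`find_all` in place of the older `findAll`). It assumes every row holds exactly two `<td>` cells, as in the example table, and keeps the surrounding spaces because the desired result shows them.

```python
from bs4 import BeautifulSoup

t = (
    '<html><table>'
    '<tr><td class="label"> a </td> <td> 1 </td></tr>'
    '<tr><td class="label"> b </td> <td> 2 </td></tr>'
    '<tr><td class="label"> c </td> <td> 3 </td></tr>'
    '<tr><td class="label"> d </td> <td> 4 </td></tr>'
    '</table></html>'
)

soup = BeautifulSoup(t, "html.parser")

result = {}
for tr in soup.find_all("tr"):
    # find_all("td") skips the whitespace text node between the two cells.
    label_cell, value_cell = tr.find_all("td")
    result[label_cell.get_text()] = value_cell.get_text()

print(result)
```

Calling `get_text(strip=True)` instead would give keys and values without the padding spaces.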

Python BeautifulSoup findAll by “class” attribute

≡放荡痞女 submitted on 2019-12-28 20:32:54
Question: I want to run the following code, which is what the BS documentation says to do. The only problem is that the word "class" isn't just a word: it can be found inside HTML, but it's also a Python keyword, which causes this code to throw an error. So how do I do the following?

    soup.findAll('ul', class="score")

Answer 1: Your problem seems to be that you expect find_all in the soup to find an exact match for your string. In fact: When you search for a tag that matches a certain CSS class, you’re matching
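In bs4 the keyword clash is usually avoided with the `class_` argument or an `attrs` dictionary. The sketch below uses made-up markup (`SAMPLE_HTML`) and also shows the point the answer starts to make: a tag carrying several classes matches when any one of them equals the search string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
SAMPLE_HTML = """
<ul class="score">one</ul>
<ul class="score highlight">two</ul>
<ul class="other">three</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ (trailing underscore) sidesteps the Python keyword.
by_keyword = soup.find_all("ul", class_="score")

# An attrs dictionary does the same without the special keyword argument.
by_attrs = soup.find_all("ul", attrs={"class": "score"})

texts = [ul.get_text() for ul in by_keyword]
print(texts)
```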

how to get tbody from table from python beautiful soup ?

我只是一个虾纸丫 submitted on 2019-12-28 16:08:20
Question: I'm trying to scrape Year & Winners (first & second columns) from the "List of finals matches" table (second table) from http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals: I'm using the code below:

    import urllib2
    from BeautifulSoup import BeautifulSoup
    url = "http://www.samhsa.gov/data/NSDUH/2k10State/NSDUHsae2010/NSDUHsaeAppC2010.htm"
    soup = BeautifulSoup(urllib2.urlopen(url).read())
    soup.findAll('table')[0].tbody.findAll('tr')
    for row in soup.findAll('table')[0].tbody.findAll('tr'):
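A possible reason `.tbody` comes back empty is that browsers insert `<tbody>` when rendering, while a parser such as `html.parser` only reports tags that are present in the source. The sketch below assumes bs4 and uses a small hand-written stand-in for the page (`SAMPLE_HTML`); the real page's markup is not shown in the question.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the page: two tables, neither with a <tbody> tag.
SAMPLE_HTML = """
<table><tr><td>skip</td></tr></table>
<table>
  <tr><th>Year</th><th>Winners</th><th>Runners-up</th></tr>
  <tr><td>1930</td><td>Uruguay</td><td>Argentina</td></tr>
  <tr><td>1934</td><td>Italy</td><td>Czechoslovakia</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find_all("table")[1]  # the second table on the page

# html.parser does not invent a <tbody>, so fall back to the table itself.
has_tbody = table.tbody is not None
body = table.tbody if has_tbody else table

pairs = []
for tr in body.find_all("tr"):
    cells = tr.find_all("td")
    if len(cells) >= 2:  # the header row uses <th>, so it is skipped
        pairs.append((cells[0].get_text(strip=True), cells[1].get_text(strip=True)))

print(has_tbody, pairs)
```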