beautifulsoup | 易学教程

What is the correct soup.find() command?

Question: I am trying to scrape the race name ('The Valley R2') and the horse name ('Ronniejay') from the following website: https://www.punters.com.au/form-guide/form-finder/e2a0f7e13bf0057b4c156aea23019b18. What is the correct soup.find() call to do this? My code to get the race name:

    from bs4 import BeautifulSoup
    import requests

    source = requests.get('https://www.punters.com.au/form-guide/form-finder/e2a0f7e13bf0057b4c156aea23019b18').text
    soup = BeautifulSoup(source, 'lxml')
    race = soup.find('h3')
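For reference, `soup.find()` returns only the first matching tag, so `soup.find('h3')` gives whichever `<h3>` comes first in the document. A minimal sketch of narrowing the search by tag name and class follows. The HTML snippet and the class names `race-name` and `horse-name` are hypothetical stand-ins, not the real markup of the page in the question, which may also be rendered by JavaScript and so be absent from the `requests` response.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's tags and class names will differ.
html = """
<div class="form-guide">
  <h3>Form Finder</h3>
  <h3 class="race-name">The Valley R2</h3>
  <a class="horse-name" href="#">Ronniejay</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first match only, or None when nothing matches.
first_h3 = soup.find("h3")

# Narrow the search with class_ (trailing underscore: "class" is a Python keyword).
race = soup.find("h3", class_="race-name")
horse = soup.find("a", class_="horse-name")

# Guard against None before reading the text.
race_name = race.get_text(strip=True) if race else None
horse_name = horse.get_text(strip=True) if horse else None
print(first_h3.get_text(strip=True), race_name, horse_name)
```

If the plain `find('h3')` call returns the wrong heading, inspecting the page source for a distinguishing class or id is usually the next step.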

Scraping table with BeautifulSoup

Question: In this first piece of code, I can use BeautifulSoup to get all the information within the table of interest:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    html = urlopen("http://www.pythonscraping.com/pages/page3.html")
    soup = BeautifulSoup(html, "html.parser")
    # Iterate over the direct children of the giftList table
    for child in soup.find("table", {"id": "giftList"}).children:
        print(child)

That prints the product list. I want to print the rows in the tournamentTable here (the desired information is in class=deactivate and class=odd deactivate, with the date in class=center nob-border): from urllib
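One thing to watch with `.children` is that it yields the whitespace strings between tags as well as the tags themselves. A sketch of iterating rows with `find_all('tr')` instead is given here. The inline table is a made-up stand-in for the `giftList` table, and `extract_rows` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up table standing in for the real page.
html = """
<table id="giftList">
  <tr><th>Item</th><th>Cost</th></tr>
  <tr class="gift"><td>Vegetable Basket</td><td>$15.00</td></tr>
  <tr class="gift"><td>Russian Nesting Dolls</td><td>$10,000.52</td></tr>
</table>
"""

def extract_rows(markup, table_id):
    """Return each row of the table with the given id as a list of cell texts."""
    soup = BeautifulSoup(markup, "html.parser")
    table = soup.find("table", {"id": table_id})
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Header and data cells are collected in document order.
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows

rows = extract_rows(html, "giftList")
for row in rows:
    print(row)
```

For rows selected by class, `find_all('tr', class_='deactivate')` matches any row whose class list contains `deactivate`, including `class="odd deactivate"`.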

Webscraping data from an interactive graph from a website

Question: I am trying to access the data from the graph on https://www.prisjakt.nu/produkt.php?pu=5183925 and can already extract the data from the table below the graph, but I am unable to fetch the data from the graph itself, which is loaded dynamically using JavaScript. I know that the BeautifulSoup API alone is not sufficient here. I tried looking around in the page's console to see the contents of the graph, but without success. I also tried to look into view-source

Storing information from td tags with a specific width, in python

Question: I am trying to store all the information from the td tags that have width="82", or perhaps there is a more efficient method.

    <a name="AAKER"> </a>
    <table border="" width="100%" cellpadding="5"><tbody><tr><td bgcolor="#FFFFFF"><b>AAKER</b> <small>(<a href="http://google.com">Soundex A260</a>) — <i>See also</i> <a href="http://google.com">ACKER</a>, <a href="http://google.com">KEAR</a>, <a href="http://google.com">TAAKE</a>. </small> </td></tr></tbody></table><br clear="all">
    <table align="left"
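Selecting cells by attribute value is something `find_all()` supports directly through its `attrs` argument. The sketch below assumes a simplified, hypothetical table, since the markup in the question is cut off; the cell texts are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; only the width="82" cells are wanted.
html = """
<table>
  <tr><td width="82">entry one</td><td width="200">other column</td></tr>
  <tr><td width="82">entry two</td><td>no width attribute</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Attribute filters compare against the attribute's string value.
cells = soup.find_all("td", attrs={"width": "82"})
names = [cell.get_text(strip=True) for cell in cells]
print(names)
```

The keyword form `soup.find_all("td", width="82")` is equivalent for attribute names that are valid Python identifiers.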

Split string from BeautifulSoup output in a list

Question: I have the following output from my code.

Code:

    text = soup.get_text()

Output:

    Article Title Some text: Text blurb. More blurb. Even more blurb. Some more blurb. Second Article Title Some text: Text blurb. More blurb. Even more blurb. Some more blurb.

Next, when I do test = text.splitlines(), the output changes to u'Article Title', u'', u'Some text', u'Text blurb', u'More blurb', u'Even more blurb', u'Some more blurb', u'', u'', u'', u'', u'', u'Second Article Title', u'', u'Some text:', u'Text
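The empty strings come from blank lines in the extracted text. Two ways of dropping them are sketched here: filtering the result of `splitlines()`, or using BeautifulSoup's `stripped_strings` generator. The HTML is a made-up approximation of the page, since the original markup is not shown.

```python
from bs4 import BeautifulSoup

# Made-up markup that produces blank lines in get_text().
html = """
<div><h2>Article Title</h2>
<p>Some text:</p>

<p>Text blurb.</p>
</div>
<div><h2>Second Article Title</h2>
<p>Some text:</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: filter the blank entries out of splitlines().
text = soup.get_text()
lines = [line.strip() for line in text.splitlines() if line.strip()]

# Option 2: let BeautifulSoup yield only the non-empty, stripped strings.
strings = list(soup.stripped_strings)

print(lines)
print(strings)
```

The two results can differ when one tag holds text spanning several lines: `splitlines()` splits it, whereas `stripped_strings` keeps each text node whole.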

Python BeautifulSoup scraping; how to combine two different fields, or pair them based on location in site?

Question: OK guys, so I'm very much a beginner here. I'm trying to scrape a website for company names and their corresponding phone numbers. The end goal is to write these to a CSV file that can be opened with Excel. Currently I can retrieve the company names and the phone numbers separately. I am thinking that I could merge the two lists somehow, but I'm concerned that a single outlier could offset the whole merge and mismatch the numbers and names. What is the
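A common way to avoid the offset problem is to loop over the element that wraps each company and look up the name and the phone number inside it, rather than building two separate lists. The sketch below assumes hypothetical markup (a `div.listing` wrapper with `span.name` and `span.phone`); the real site's structure is not shown in the question.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical listing markup; the second company has no phone number.
html = """
<div class="listing"><span class="name">Acme Ltd</span><span class="phone">555-0100</span></div>
<div class="listing"><span class="name">Globex</span></div>
<div class="listing"><span class="name">Initech</span><span class="phone">555-0199</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

records = []
for listing in soup.find_all("div", class_="listing"):
    # Search inside one listing at a time so a name and phone always belong together.
    name = listing.find("span", class_="name")
    phone = listing.find("span", class_="phone")
    records.append((
        name.get_text(strip=True) if name else "",
        phone.get_text(strip=True) if phone else "",
    ))

# Written to an in-memory buffer here; open("companies.csv", "w", newline="")
# would write a file that Excel can open.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["company", "phone"])
writer.writerows(records)
print(buffer.getvalue())
```

A missing phone number then shows up as an empty cell in that company's own row instead of shifting every later row.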

BeautifulSoup - AttributeError: 'NavigableString' object has no attribute 'find_all'

Question: I am trying to get this script to iterate through the HTML file and print out the desired results, but it keeps giving me this error. It works fine with only one "game" in the table, but it breaks if there is more than one. I am trying to fix it so that it can iterate over more than one game/parking ticket, but I can't continue because of this.

    Traceback (most recent call last):
      File "C:/Users/desktop/Desktop/tabletest.py", line 11, in <module>
        for rows in table.find_all('tr'):
      File "C:\Program Files\Python36\lib\site
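This error typically appears when the object being searched is a text node rather than a tag, for example when looping over `.children` or `.contents`, which include the whitespace between tags. The traceback in the question is cut off, so the following is only a sketch of that common cause using a made-up document; it may not match the original script.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up document with two "game" tables inside one container.
html = """
<div id="games">
  <table><tr><td>Game 1</td></tr></table>
  <table><tr><td>Game 2</td></tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", id="games")

# .children mixes Tag objects with NavigableString whitespace nodes.
kinds = [type(child).__name__ for child in container.children]

games = []
for child in container.children:
    if not isinstance(child, Tag):
        continue  # skip text nodes, which have no find_all()
    for row in child.find_all("tr"):
        games.append(row.get_text(strip=True))

print(kinds)
print(games)
```

Calling `container.find_all("table")` instead of iterating over `.children` sidesteps the problem, because `find_all()` only ever returns tags.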

Reading CDATA from XML file with BeautifulSoup

Question: I have tweets saved in an XML file as:

    <tweet>
      <tweetid>142389495503925248</tweetid>
      <user>ccifuentes</user>
      <content><![CDATA[Salgo de #VeoTV , que día más largoooooo...]]></content>
      <date>2011-12-02T00:47:55</date>
      <lang>es</lang>
      <sentiments>
        <polarity><value>NONE</value><type>AGREEMENT</type></polarity>
      </sentiments>
      <topics>
        <topic>otros</topic>
      </topics>
    </tweet>

To parse these, I created a BeautifulSoup instance via soup = BeautifulSoup(xml, "lxml"), where xml is the raw XML file. To