beautifulsoup

What is the correct soup.find() command?

假装没事ソ submitted on 2021-01-28 12:02:06
Question: I am trying to scrape the race name ('The Valley R2') and the horse name ('Ronniejay') from the following website: https://www.punters.com.au/form-guide/form-finder/e2a0f7e13bf0057b4c156aea23019b18. What is the correct soup.find() call to do this? My code to get the race name:
    from bs4 import BeautifulSoup
    import requests
    source = requests.get('https://www.punters.com.au/form-guide/form-finder/e2a0f7e13bf0057b4c156aea23019b18').text
    soup = BeautifulSoup(source,'lxml')
    race = soup.find('h3')

Scraping table with BeautifulSoup

岁酱吖の submitted on 2021-01-28 12:01:03
Question: In this first snippet, I can use BeautifulSoup to get all the info within the table of interest:
    from urllib import urlopen
    from bs4 import BeautifulSoup
    html = urlopen("http://www.pythonscraping.com/pages/page3.html")
    soup = BeautifulSoup(html)
    for child in soup.find("table",{"id":"giftList"}).children:
        print child
That prints the product list. I want to print the rows in the tournamentTable here (the desired info is in class=deactivate, class=odd deactivate, and the date in class=center nob-border): from urllib
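
The snippet above is Python 2 (`from urllib import urlopen`, `print child`). A minimal Python 3 sketch of the same row-by-row idea follows. The inline `HTML` string and the `table_rows` helper are hypothetical stand-ins, not the real page or the asker's code, and the built-in `html.parser` backend is an assumption chosen to avoid extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's HTML is not reproduced here.
HTML = """
<table id="giftList">
  <tr><th>Item</th><th>Cost</th></tr>
  <tr class="gift"><td>Vegetable Basket</td><td>$15.00</td></tr>
  <tr class="gift"><td>Russian Nesting Dolls</td><td>$10,000.52</td></tr>
</table>
"""


def table_rows(html, table_id):
    """Return the text of every cell, grouped by row, for the table with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    if table is None:
        # find() returns None when nothing matches, so guard before using it.
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        rows.append(cells)
    return rows


rows = table_rows(HTML, "giftList")
for row in rows:
    print(row)
```

Using `find_all("tr")` instead of `.children` skips the whitespace strings that sit between tags. If a table is filled in by JavaScript after the page loads, it will not be present in the downloaded HTML and this approach returns an empty list.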

Webscraping data from an interactive graph from a website

柔情痞子 submitted on 2021-01-28 11:17:22
Question: I am trying to access the data behind the graph on the following website: https://www.prisjakt.nu/produkt.php?pu=5183925. I am able to access and extract data from the table below the graph, but I am unable to fetch data from the graph itself, which is loaded dynamically using JavaScript. I know that the BeautifulSoup API alone is not sufficient here. I tried looking around in the browser console to see the contents of the graph, but without success. I also tried to look into view-source

Storing information from td tags with a specific width, in python

喜你入骨 submitted on 2021-01-28 10:25:45
Question: I am trying to store all the information from the td tags that have width="82", or maybe there is a more efficient method. <a name="AAKER"> </a> <table border="" width="100%" cellpadding="5"><tbody><tr><td bgcolor="#FFFFFF"><b>AAKER</b> <small>(<a href="http://google.com">Soundex A260</a>) — <i>See also</i> <a href="http://google.com">ACKER</a>, <a href="http://google.com">KEAR</a>, <a href="http://google.com">TAAKE</a>. </small> </td></tr></tbody></table><br clear="all"> <table align="left"
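
One possible way to select cells by their `width` attribute is to pass an attribute filter to `find_all()`. The sketch below assumes a small hypothetical table (the question's own markup is truncated above), and `cells_with_width` is an illustrative helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the truncated HTML in the question.
HTML = """
<table>
  <tr><td width="82">Born</td><td width="200">1850</td></tr>
  <tr><td width="82">Died</td><td width="200">1920</td></tr>
</table>
"""


def cells_with_width(html, width):
    """Collect the text of every <td> whose width attribute equals `width`."""
    soup = BeautifulSoup(html, "html.parser")
    # Attribute values are strings in the parsed tree, so pass "82" rather than 82.
    matches = soup.find_all("td", attrs={"width": width})
    return [td.get_text(strip=True) for td in matches]


labels = cells_with_width(HTML, "82")
print(labels)
```

The same filter can also be written as a keyword argument, `soup.find_all("td", width="82")`.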

Storing information from td tags with a specific width, in python

时光总嘲笑我的痴心妄想 submitted on 2021-01-28 10:24:15
Question: I am trying to store all the information from the td tags that have width="82", or maybe there is a more efficient method. <a name="AAKER"> </a> <table border="" width="100%" cellpadding="5"><tbody><tr><td bgcolor="#FFFFFF"><b>AAKER</b> <small>(<a href="http://google.com">Soundex A260</a>) — <i>See also</i> <a href="http://google.com">ACKER</a>, <a href="http://google.com">KEAR</a>, <a href="http://google.com">TAAKE</a>. </small> </td></tr></tbody></table><br clear="all"> <table align="left"

Split string from BeautifulSoup output in a list

隐身守侯 submitted on 2021-01-28 09:35:28
Question: I have the following output from my code. Code: text = soup.get_text() Output: Article Title Some text: Text blurb. More blurb. Even more blurb. Some more blurb. Second Article Title Some text: Text blurb. More blurb. Even more blurb. Some more blurb. Next, when I do test = text.splitlines() , the output changes to u'Article Title', u'', u'Some text',u'Text blurb',u'More blurb',u'Even more blurb',u'Some more blurb', u'', u'', u'', u'', u'',u'Second Article Title', u'', u'Some text:',u'Text
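
The empty `u''` entries come from blank lines in the text returned by `get_text()`. Since the question is cut off before stating the desired result, the following is only a sketch of one likely cleanup step: dropping blank lines after `splitlines()`. The `sample` string and the `non_empty_lines` helper are hypothetical.

```python
def non_empty_lines(text):
    """Split text into lines, strip surrounding whitespace, and drop blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Hypothetical text shaped like the get_text() output described above.
sample = "Article Title\n\nSome text:\nText blurb.\n\n\nSecond Article Title\n"

lines = non_empty_lines(sample)
print(lines)
```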

Python BeautifulSoup scraping; how to combine two different fields, or pair them based on location in site?

怎甘沉沦 submitted on 2021-01-28 09:07:52
Question: OK guys, so I'm very much a beginner here. What I'm trying to do is scrape a website for company names and their corresponding phone numbers. The end goal is to write these to a CSV that can be opened with Excel. Currently I'm able to retrieve the company names and the phone numbers separately. I am thinking that I could merge the two lists somehow, but I'm concerned about a single outlier offsetting the whole merge and mismatching the numbers to names. What is the
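
A common way to avoid the mismatch described above is to iterate over each listing's container element and look up the name and phone number inside it, rather than building two independent lists. The sketch below assumes hypothetical markup: the `listing`, `name`, and `phone` class names are invented for illustration and will differ on a real site.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup; the second listing deliberately has no phone number.
HTML = """
<div class="listing"><span class="name">Acme Ltd</span><span class="phone">555-0100</span></div>
<div class="listing"><span class="name">No Phone Co</span></div>
<div class="listing"><span class="name">Zeta Inc</span><span class="phone">555-0199</span></div>
"""


def extract_pairs(html):
    """Return (company, phone) tuples, one per listing, with "" for missing fields."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for listing in soup.find_all("div", class_="listing"):
        # Searching inside each listing keeps a name tied to its own phone number.
        name = listing.find("span", class_="name")
        phone = listing.find("span", class_="phone")
        pairs.append((
            name.get_text(strip=True) if name else "",
            phone.get_text(strip=True) if phone else "",
        ))
    return pairs


def to_csv(pairs):
    """Serialize the pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["company", "phone"])
    writer.writerows(pairs)
    return buffer.getvalue()


pairs = extract_pairs(HTML)
print(to_csv(pairs))
```

Because a missing phone number becomes an empty cell in its own row, an outlier no longer shifts every later row. To write a file instead of a string, the same writer can be pointed at `open("out.csv", "w", newline="")`.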

BeautifulSoup - AttributeError: 'NavigableString' object has no attribute 'find_all'

寵の児 submitted on 2021-01-28 09:01:42
Question: I am trying to get this script to iterate through the HTML file and print out the desired results, but it keeps giving me this error. It works fine with only one "game" in the table, but breaks if there is more than one. I am trying to fix it so it can iterate over more than one game/parking ticket, but can't continue because of this. Traceback (most recent call last): File "C:/Users/desktop/Desktop/tabletest.py", line 11, in <module> for rows in table.find_all('tr'): File "C:\Program Files\Python36\lib\site
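
The asker's script is not shown, so the cause can only be guessed. A common source of this error is looping over an element's `.children` (or `.contents`), which yields the whitespace between tags as `NavigableString` objects in addition to `Tag` objects. The sketch below uses hypothetical markup and a hypothetical `rows_per_table` helper to show one way to skip those strings.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup with two "game" tables separated by whitespace.
HTML = """
<div id="games">
  <table><tr><td>Game 1</td></tr></table>
  <table><tr><td>Game 2</td></tr></table>
</div>
"""


def rows_per_table(html):
    """Return the cell text of each child table, skipping non-tag children."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id="games")
    result = []
    for child in container.children:
        # Whitespace between tags is a NavigableString, which has no find_all().
        if not isinstance(child, Tag):
            continue
        cells = [
            td.get_text(strip=True)
            for tr in child.find_all("tr")
            for td in tr.find_all("td")
        ]
        result.append(cells)
    return result


print(rows_per_table(HTML))
```

Calling `container.find_all("table")` instead of iterating `.children` avoids the strings altogether, since `find_all()` only returns tags.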

BeautifulSoup - AttributeError: 'NavigableString' object has no attribute 'find_all'

六眼飞鱼酱① submitted on 2021-01-28 08:42:01
Question: I am trying to get this script to iterate through the HTML file and print out the desired results, but it keeps giving me this error. It works fine with only one "game" in the table, but breaks if there is more than one. I am trying to fix it so it can iterate over more than one game/parking ticket, but can't continue because of this. Traceback (most recent call last): File "C:/Users/desktop/Desktop/tabletest.py", line 11, in <module> for rows in table.find_all('tr'): File "C:\Program Files\Python36\lib\site

Reading CDATA from XML file with BeautifulSoup

大兔子大兔子 submitted on 2021-01-28 08:12:31
Question: I have tweets saved in an XML file as: <tweet> <tweetid>142389495503925248</tweetid> <user>ccifuentes</user> <content><![CDATA[Salgo de #VeoTV , que día más largoooooo...]]></content> <date>2011-12-02T00:47:55</date> <lang>es</lang> <sentiments> <polarity><value>NONE</value><type>AGREEMENT</type></polarity> </sentiments> <topics> <topic>otros</topic> </topics> </tweet> To parse these, I created a BeautifulSoup instance via soup = BeautifulSoup(xml, "lxml") where xml is the raw XML file. To