beautifulsoup

Using findAll in BS4 to create list

Submitted by 天大地大妈咪最大 on 2019-12-24 05:48:32
Question: I'll start by saying I'm fairly new to Python. I've been working on a Slack bot recently, and here's where I'm at so far:

    source = requests.get(url).content
    soup = BeautifulSoup(source, 'html.parser')
    price = soup.findAll("a", {"class":"pricing"})["quantity"]

Here is the HTML I am trying to scrape:

    <a class="pricing" saleprice="240.00" quantity="1" added="2017-01-01"> S </a>
    <a class="pricing" saleprice="21.00" quantity="5" added="2017-03-14"> M </a>
    <a class="pricing" saleprice="139
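The immediate problem is that findAll returns a ResultSet (a list of tags), and a list can't be indexed with the string "quantity"; each tag has to be indexed individually. A minimal sketch of collecting every quantity attribute, assuming the markup shown above (the URL is a placeholder):

    import requests
    from bs4 import BeautifulSoup

    url = "https://example.com/pricing"  # placeholder; substitute the real page
    source = requests.get(url).content
    soup = BeautifulSoup(source, "html.parser")

    # find_all returns a ResultSet (a list of Tag objects); attribute lookups
    # like ["quantity"] must be done on each tag, not on the list itself.
    quantities = [tag["quantity"] for tag in soup.find_all("a", {"class": "pricing"})]
    print(quantities)  # e.g. ['1', '5', ...] -- attribute values come back as strings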

empty result set beautiful soup

Submitted by 限于喜欢 on 2019-12-24 05:46:16
Question: I'm scraping an article listing from the New York Times site and getting an empty result set. My aim is to get the URLs and the text of the h3 items, but when I run this I get an empty set. Printing the section scrape shows I'm on the right path...

Target URL: http://query.nytimes.com/search/sitesearch/?action=click&contentCollection&region=TopBar&WT.nav=searchWidget&module=SearchSubmit&pgtype=sectionfront#/san+diego/24hours

    url = "http://query.nytimes.com/search/sitesearch/?action=click&contentCollection&region
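For what it's worth, the hash fragment in that URL (#/san+diego/24hours) suggests the result list is rendered client-side by JavaScript, so the h3 items may simply not exist in the HTML that requests downloads. A minimal sketch of the extraction step itself, assuming the headlines and links are present in the fetched markup (the URL below is shortened as a placeholder):

    import requests
    from bs4 import BeautifulSoup

    url = "http://query.nytimes.com/search/sitesearch/"  # placeholder; substitute the full search URL
    html = requests.get(url).text
    soup = BeautifulSoup(html, "html.parser")

    # Collect each headline's text and link; this stays empty if the results
    # are injected by JavaScript after page load.
    results = []
    for h3 in soup.find_all("h3"):
        link = h3.find("a")
        if link and link.get("href"):
            results.append((h3.get_text(strip=True), link["href"]))
    print(results)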

How to click on elements using selenium driver?

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-24 05:25:41
Question: I have been trying to scrape a page of the BookMyShow site using Selenium. When the page loads, two popups appear, and the required buttons have to be clicked to close them. When I try to find those elements, I get an error. I made the driver wait for the page to load completely by using sleep(), but I am still unable to do so. The code is:

    from bs4 import BeautifulSoup
    import requests
    from selenium import webdriver
    from time import sleep

    s = requests.session()
    driver = webdriver
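A fixed sleep() often fires before the popup has been attached to the DOM. A minimal sketch using Selenium's explicit waits instead; the URL and the CSS selectors for the close buttons are hypothetical placeholders, since the real BookMyShow markup isn't shown in the question:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    driver.get("https://in.bookmyshow.com")  # placeholder URL

    wait = WebDriverWait(driver, 15)
    # Wait until each popup's close button is clickable, then dismiss it.
    # ".popup-close" and ".modal-close" are placeholder selectors.
    for selector in (".popup-close", ".modal-close"):
        try:
            wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, selector))).click()
        except Exception:
            pass  # this popup did not appear; continue scraping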

How do you find all list items between two tags with BeautifulSoup?

Submitted by 大憨熊 on 2019-12-24 04:21:17
Question: For example, I'd like to pull out only Child1, Child2, and Child3 from the list below, where the list comes after the first instance of h3 and before the next h3 tag:

    <h3>HeaderName1<h3>
    <ul class="prodoplist">
    <li>Parent</li>
    <li class="lev1">Child1</li>
    <li class="lev1">Child2</li>
    <li class="lev1">Child3</li>
    </ul>
    <h3>HeaderName2<h3>
    <ul class="prodoplist">
    <li>Parent2</li>
    <li class="lev1">Child4</li>
    <li class="lev1">Child5</li>
    <li class="lev1">Child6</li>
    </ul>

Answer 1: using findChildren like:
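The answer above is cut off after mentioning findChildren, but the idea of walking from the first h3 to its following list can be sketched independently (an assumption-level sketch, not the answer's exact code): locate the first h3, take the ul that follows it, and read the li children with class lev1. Note that the sketch closes the h3 tags, which appear unclosed in the snippet above.

    from bs4 import BeautifulSoup

    html = """
    <h3>HeaderName1</h3>
    <ul class="prodoplist">
      <li>Parent</li>
      <li class="lev1">Child1</li>
      <li class="lev1">Child2</li>
      <li class="lev1">Child3</li>
    </ul>
    <h3>HeaderName2</h3>
    <ul class="prodoplist">
      <li>Parent2</li>
      <li class="lev1">Child4</li>
    </ul>
    """

    soup = BeautifulSoup(html, "html.parser")
    first_h3 = soup.find("h3")
    # The <ul> that follows the first header, before the next <h3>
    first_list = first_h3.find_next("ul", class_="prodoplist")
    children = [li.get_text(strip=True) for li in first_list.find_all("li", class_="lev1")]
    print(children)  # ['Child1', 'Child2', 'Child3']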

Python mechanize login to Facebook, using BeautifulSoup to get profile picture failed

Submitted by 不羁岁月 on 2019-12-24 03:53:32
Question: I'm trying to stay logged in to Facebook using the Python mechanize library and get Zuck's profile picture URL using BeautifulSoup. Here is my code:

    import cookielib
    import mechanize
    from BeautifulSoup import BeautifulSoup

    # Browser
    br = mechanize.Browser()

    # Enable cookie support for urllib2
    cookiejar = cookielib.LWPCookieJar()
    br.set_cookiejar(cookiejar)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots
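The code is cut off before the scraping step, but the BeautifulSoup side of the task can be sketched on its own: once mechanize has fetched the profile page, parse the response and read the src of the profile image tag. The markup and the "profilePic" class below are guessed placeholders, since Facebook's real page structure (and its login flow, which actively resists automation) isn't shown in the question.

    from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, as in the question

    # html would normally be the body returned by br.open(profile_url).read()
    html = '<img class="profilePic" src="https://example.com/zuck.jpg" />'  # placeholder markup

    soup = BeautifulSoup(html)
    img = soup.find('img', {'class': 'profilePic'})  # "profilePic" is a guessed class name
    if img is not None:
        print(img['src'])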

Unable to get actual Markup from a page with BeautifulSoup

Submitted by 爷,独闯天下 on 2019-12-24 03:34:15
Question: I am trying to scrape this URL with a combination of BeautifulSoup and Selenium:

http://starwood.ugc.bazaarvoice.com/3523si-en_us/115/reviews.djs?format=embeddedhtml&page=2&scrollToTop=true

I have tried this code:

    active_review_page_html = browser.page_source
    active_review_page_html = active_review_page_html.replace('\\', "")
    hotel_page_soup = BeautifulSoup(active_review_page_html)
    print(hotel_page_soup)

But what it does is return data like:

    ;<span class="BVRRReviewText">Hotel
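That .djs endpoint appears to return a JavaScript file with the review HTML embedded as escaped string literals, which is why the parsed output still looks like raw JavaScript. One possible approach (a sketch based on that assumption, not a verified Bazaarvoice API) is to fetch the file with requests, pull the quoted fragments out of the JavaScript, unescape them, and only then hand them to BeautifulSoup:

    import re
    import requests
    from bs4 import BeautifulSoup

    url = ("http://starwood.ugc.bazaarvoice.com/3523si-en_us/115/reviews.djs"
           "?format=embeddedhtml&page=2&scrollToTop=true")
    js_text = requests.get(url).text

    # Match JavaScript string literals (handling \" escapes), keep the ones
    # that contain review markup, and undo the escaping.
    literals = re.findall(r'"((?:\\.|[^"\\])*)"', js_text)
    fragments = [s for s in literals if "BVRR" in s]
    html = "".join(f.replace('\\"', '"').replace("\\/", "/").replace("\\n", "\n")
                   for f in fragments)

    soup = BeautifulSoup(html, "html.parser")
    for span in soup.find_all("span", class_="BVRRReviewText"):
        print(span.get_text(strip=True))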

Matching specific table within HTML, BeautifulSoup

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-24 03:31:20
Question: I have this problem: there are several similar tables on the page I'm trying to scrape.

    <h2 class="tabellen_ueberschrift al">Points</h2>
    <div class="fl" style="width:49%;">
    <table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">

The only difference between them is the text within the h2 tags, here "Points". How can I specify which table I need to search in? I have this code and need to take the h2 tag into account:

    my_tab = soup.find('table', {'class':'tabelle_grafik lh'})

Need some help, guys.
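One way to anchor the search on the heading text is to find the h2 whose text is "Points" and then take the first matching table that follows it in document order. A minimal sketch, assuming markup shaped like the snippet above:

    from bs4 import BeautifulSoup

    html = """
    <h2 class="tabellen_ueberschrift al">Goals</h2>
    <div class="fl"><table class="tabelle_grafik lh"><tr><td>other</td></tr></table></div>
    <h2 class="tabellen_ueberschrift al">Points</h2>
    <div class="fl"><table class="tabelle_grafik lh"><tr><td>42</td></tr></table></div>
    """

    soup = BeautifulSoup(html, "html.parser")

    # Locate the heading by its text, then the first matching table after it.
    heading = soup.find("h2", class_="tabellen_ueberschrift al",
                        string=lambda t: t and t.strip() == "Points")
    points_table = heading.find_next("table", class_="tabelle_grafik lh")
    print(points_table.td.get_text())  # 42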

TypeError: 'NoneType' object is not callable when using split in Python with BeautifulSoup

Submitted by 守給你的承諾、 on 2019-12-24 03:16:14
Question: I was playing around with the BeautifulSoup and Requests APIs today, so I thought I would write a simple scraper that follows links to a depth of 2 (if that makes sense). All the links in the webpage I am scraping are relative, for example:

    <a href="/free-man-aman-sethi/books/9788184001341.htm" title="A Free Man">

So to make them absolute, I thought I would join the page URL with the relative links using urljoin. To do this I had to first extract the href value from the <a> tags and
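The question is cut off before the traceback, but the extraction-and-join step it describes can be sketched safely: read the href with tag.get('href'), which returns None rather than raising when the attribute is missing, and pass it to urljoin. The base URL below is a placeholder.

    from urllib.parse import urljoin  # urlparse.urljoin on Python 2

    import requests
    from bs4 import BeautifulSoup

    base_url = "http://example.com/"  # placeholder for the page being scraped
    soup = BeautifulSoup(requests.get(base_url).text, "html.parser")

    absolute_links = []
    for a in soup.find_all("a"):
        href = a.get("href")  # None if this <a> has no href attribute
        if href:
            absolute_links.append(urljoin(base_url, href))

    print(absolute_links)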

Using BeautifulSoup on very large HTML file - memory error?

Submitted by 丶灬走出姿态 on 2019-12-24 03:15:53
Question: I'm learning Python by working on a project: a Facebook message analyzer. I downloaded my data, which includes a messages.htm file of all my messages. I'm trying to write a program that parses this file and outputs statistics (number of messages, most common words, etc.). However, my messages.htm file is 270 MB. When creating a BeautifulSoup object in the shell for testing, any other file (all under 1 MB) works just fine, but I can't create a BeautifulSoup object from messages.htm. Here's the error:

    >>> mf = open('messages
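For a file that large, one way to keep memory down is to tell BeautifulSoup to build a tree only for the parts of the document you care about, using a SoupStrainer. The tag and class below are assumed placeholders for whatever the Facebook export actually uses to wrap individual messages; inspect a small slice of messages.htm to find the real ones.

    from bs4 import BeautifulSoup, SoupStrainer

    # Parse only the message elements instead of building a tree for the
    # whole 270 MB document. "div" / "message" are placeholder values.
    only_messages = SoupStrainer("div", class_="message")

    with open("messages.htm", encoding="utf-8") as mf:
        soup = BeautifulSoup(mf, "html.parser", parse_only=only_messages)

    print(len(soup.find_all("div", class_="message")))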