beautifulsoup

BeautifulSoup: Scraping answers from a form

Submitted by 泪湿孤枕 on 2019-12-23 03:45:10
Question: I need to scrape the answers to the questions from the following link, including the check boxes. Here's what I have so far:

from bs4 import BeautifulSoup
import selenium.webdriver as webdriver

url = 'https://www.adviserinfo.sec.gov/IAPD/content/viewform/adv/Sections/iapd_AdvPrivateFundReportingSection.aspx?ORG_PK=161227&FLNG_PK=05C43A1A0008018C026407B10062D49D056C8CC0'
driver = webdriver.Firefox()
driver.get(url)
soup = BeautifulSoup(driver.page_source)

The following gives me all the written
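
A possible approach, sketched under the assumption that the rendered page exposes the answers as ordinary input elements (the checkbox markup on the IAPD form has not been verified here):

from bs4 import BeautifulSoup
import selenium.webdriver as webdriver

url = ('https://www.adviserinfo.sec.gov/IAPD/content/viewform/adv/Sections/'
       'iapd_AdvPrivateFundReportingSection.aspx'
       '?ORG_PK=161227&FLNG_PK=05C43A1A0008018C026407B10062D49D056C8CC0')

driver = webdriver.Firefox()
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

# Collect checkbox answers: True if the box carries a 'checked' attribute.
answers = {}
for box in soup.find_all('input', type='checkbox'):
    key = box.get('name') or box.get('id') or 'unnamed'
    answers[key] = box.has_attr('checked')
print(answers)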

How to get text that has no HTML tag | Add multiple delimiters in split

Submitted by 旧城冷巷雨未停 on 2019-12-23 03:36:10
Question: The following selector picks the div element with class ajaxcourseindentfix and splits its text at "Prerequisite: ", giving me all the content after the prerequisite:

div = soup.select("div.ajaxcourseindentfix")[0]
" ".join([word for word in div.stripped_strings]).split("Prerequisite: ")[-1]

My div can contain not only Prerequisite but also the following splitting points: Prerequisites, Corerequisite, Corerequisites. Whenever I have Prerequisite, the code above works fine, but whenever anything from the above three comes
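
One way to handle all four headings at once is a regular-expression split; a minimal sketch with illustrative HTML (the heading list mirrors the ones named in the question):

import re
from bs4 import BeautifulSoup

html = '<div class="ajaxcourseindentfix">Course text. Prerequisites: MATH 101 and CS 100</div>'
div = BeautifulSoup(html, "html.parser").select("div.ajaxcourseindentfix")[0]

text = " ".join(div.stripped_strings)
# Split once on the first heading that appears, whichever variant it is.
parts = re.split(r"(?:Prerequisites?|Corerequisites?):\s*", text, maxsplit=1)
print(parts[-1] if len(parts) > 1 else "")   # -> MATH 101 and CS 100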

Python Scraper - Socket Error breaks script if target is 404'd

Submitted by 扶醉桌前 on 2019-12-23 03:22:40
Question: I encountered an error while building a web scraper to compile data and output it in XLS format; when testing against a list of domains I wish to scrape, the program fails when it receives a socket error. I'm hoping to find an 'if' statement that would skip parsing a broken website and continue through my while loop. Any ideas?

workingList = xlrd.open_workbook(listSelection)
workingSheet = workingList.sheet_by_index(0)
destinationList = xlwt.Workbook()
destinationSheet =
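
Rather than an 'if', a try/except around the fetch is the usual way to keep the loop alive; a minimal sketch (the urls list is hypothetical and stands in for the domains read from the spreadsheet):

import socket
from urllib.request import urlopen
from urllib.error import URLError

urls = ["http://example.com", "http://nonexistent.invalid"]
for url in urls:
    try:
        html = urlopen(url, timeout=10).read()
    except (URLError, socket.error) as err:
        print("skipping %s: %s" % (url, err))
        continue          # move on to the next domain
    # ... parse html and write the row with xlwt here ...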

Downloading images with BeautifulSoup without an HTML 'img' tag

Submitted by 好久不见. on 2019-12-23 03:22:23
Question: I'm using BeautifulSoup to find and download images from a given website; however, the website contains images that aren't in the usual <img src="icon.gif"/> format. The ones that are causing me problems, for example, look like this:

<form action="example.jpg">
<!-- <img src="big.jpg" /> -->
background-image:url("xine.png");

My code to find the images is:

webpage = "https://example.com/images/"
soup = BeautifulSoup(urlopen(webpage), "html.parser")
for img in soup.find_all('img'):
    img_url =
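
Since these image references live in attributes, comments and inline CSS rather than <img> tags, one option is to run a regular expression over the raw HTML; a rough sketch (the extension list and page URL are assumptions):

import re
from urllib.request import urlopen
from urllib.parse import urljoin

webpage = "https://example.com/images/"
html = urlopen(webpage).read().decode("utf-8", errors="replace")

# Anything that ends in a common image extension, wherever it appears.
pattern = re.compile(r'[\w./%-]+\.(?:png|jpe?g|gif)', re.IGNORECASE)
image_urls = {urljoin(webpage, match) for match in pattern.findall(html)}
print(image_urls)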

Scrape a table by looping over specific dates using Beautiful Soup

Submitted by 半腔热情 on 2019-12-23 02:53:11
Question: I have been driving myself up the wall trying to scrape the historical coffee prices I need from the table found here using BeautifulSoup: http://www.investing.com/commodities/us-coffee-c-historical-data. I am trying to pull a market week's worth of prices, from 04-04-2016 to 04-08-2016. My ultimate goal is to scrape the entire table for those dates, pulling all columns from Date to Change %. My first step was to create a dictionary of the dates I want, using the date format used in the
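
A sketch of that approach: build the set of wanted date strings, then keep only the table rows whose first cell matches one of them (the "Apr 04, 2016" date format and the row layout are assumptions about the page, and the HTML is read from a locally saved copy):

from datetime import date, timedelta
from bs4 import BeautifulSoup

start, end = date(2016, 4, 4), date(2016, 4, 8)
wanted = {(start + timedelta(days=i)).strftime("%b %d, %Y")
          for i in range((end - start).days + 1)}

html = open("us-coffee-c-historical-data.html").read()
rows = []
for tr in BeautifulSoup(html, "html.parser").select("table tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells and cells[0] in wanted:
        rows.append(cells)   # Date, Price, Open, High, Low, Vol., Change %
print(rows)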

Displaying contents of web scrape

Submitted by 半城伤御伤魂 on 2019-12-23 02:39:31
Question: The code below displays all the fields on the screen. Is there a way I could get the fields "alongside" each other, as they would appear in a database or a spreadsheet? In the source code, the fields track, date, datetime, grade, distance and prizes are found in the resultsBlockHeader div class, and Fin (finishing position), Greyhound, Trap, SP, timeSec and Time Distance are found in the resultsBlock div. I am trying to get them displayed like this: track,date,datetime,grade,distance,prizes,fin
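
One way to line the fields up is to pair each results block with its header and write one CSV row per runner; a sketch that assumes the class names from the question and an otherwise ordinary table layout inside the blocks:

import csv
from bs4 import BeautifulSoup

soup = BeautifulSoup(open("results.html").read(), "html.parser")

with open("results.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for block in soup.find_all("div", class_="resultsBlock"):
        header = block.find_previous("div", class_="resultsBlockHeader")
        head_fields = list(header.stripped_strings) if header else []
        for row in block.find_all("tr"):
            row_fields = [td.get_text(strip=True) for td in row.find_all("td")]
            if row_fields:
                writer.writerow(head_fields + row_fields)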

How can I use BeautifulSoup to get deeply nested div values?

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-12-23 02:19:27
Question: I need to get the values of deeply nested <span> elements in a DOM structure that looks like this:

<div class="panda">
  <div class="that">
    <ul class="foo">
      <li class="bar">
        <div class="hi">
          <p class="bye">
            <span class="cheese">Cheddar</span>

The problem with soup.findAll("span", {"class": "cheese"}) is that there are hundreds of span elements on the page with class "cheese", so I need to restrict the search to the ones inside the "panda" div. I need to get a list of values like ["Cheddar", "Parmesan", "Swiss"].

Answer 1: Use
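
A CSS descendant selector does exactly that; a small self-contained example whose HTML mirrors the structure from the question:

from bs4 import BeautifulSoup

html = '''
<div class="panda"><div class="that"><ul class="foo"><li class="bar">
<div class="hi"><p class="bye"><span class="cheese">Cheddar</span></p></div>
</li></ul></div></div>
<span class="cheese">Not wanted</span>
'''
soup = BeautifulSoup(html, "html.parser")
values = [span.get_text() for span in soup.select("div.panda span.cheese")]
print(values)   # -> ['Cheddar']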

BeautifulSoup installed but not recognized when dev_appserver runs

Submitted by 百般思念 on 2019-12-23 02:09:10
Question: Update: by adding BeautifulSoup.py to my app source, this error was gone :) Thanks @Ned Deily, that took a long time but was fruitful. Ignore from here on: I have just one instance of Python 2.5 installed, with BeautifulSoup, and still no luck! What am I doing wrong? Please help.

bash-3.2$ ls -ltr /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages
total 1096
-rw-r--r-- 1 Harit admin 66866 May 28 2006 BeautifulSoup.py
-rw-r--r-- 1 Harit admin 26413 May 28 2006
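
For context, the fix described in the update amounts to vendoring the module next to the app so dev_appserver can import it without consulting site-packages; a rough sketch of the layout and the legacy BeautifulSoup 3 import (paths are illustrative):

# myapp/
# ├── app.yaml
# ├── main.py
# └── BeautifulSoup.py   <- copied from site-packages
from BeautifulSoup import BeautifulSoup   # BeautifulSoup 3 import style

soup = BeautifulSoup("<p>hello</p>")
print(soup.p.string)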

Python: BeautifulSoup extract all the span classes from a div section

Submitted by 帅比萌擦擦* on 2019-12-23 01:58:10
Question:

from requests import get
from bs4 import BeautifulSoup

url = 'https://www.ceda.com.au/Events/Upcoming-events'
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')   # missing from the excerpt; needed for the calls below

events_container = html_soup.find_all('div', class_='list-bx')
event1name = events_container[0]
print(event1name.a.text)

Eventdate = html_soup.find('div', class_=' col-md-4 col-sm-4 side-box well side-boxTop')
x = Eventdate.div.text
print(x)

I'm trying to print the second span class inside the class " col-md-4 col-sm-4 side-box well side-boxTop", but I couldn't
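
A sketch of getting at that second span: match on a single stable class and index into the spans (the assumption that the wanted text sits in the second span comes from the question, not from checking the page):

from requests import get
from bs4 import BeautifulSoup

url = 'https://www.ceda.com.au/Events/Upcoming-events'
html_soup = BeautifulSoup(get(url).text, 'html.parser')

side_box = html_soup.find('div', class_='side-boxTop')   # matches the multi-class div
spans = side_box.find_all('span') if side_box else []
if len(spans) >= 2:
    print(spans[1].get_text(strip=True))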

Fetch a complete list of items using BeautifulSoup, Python 3.6

Submitted by ≡放荡痞女 on 2019-12-23 01:51:24
Question: I am learning BeautifulSoup and I have chosen the link https://www.bundesbank.de/dynamic/action/en/statistics/time-series-databases/time-series-databases/743796/743796?treeAnchor=BANKEN&statisticType=BBK_ITS to scrape the list of items for the topic "Banks and other financial corporations". I need the items below, with their child items, in hierarchical format, as shown in the attached image:

Banks
Investment companies
Insurance corporations and pension funds up to Q2 2016
Insurance corporations as of Q3 2016
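
A generic way to recover that kind of hierarchy is to walk the nested <ul>/<li> tree recursively and indent by depth; a sketch with illustrative HTML (the Bundesbank page's real markup, and the child item names shown, are assumptions):

from bs4 import BeautifulSoup

html = """
<ul>
  <li>Banks
    <ul><li>Balance sheet items</li><li>Interest rates</li></ul>
  </li>
  <li>Investment companies</li>
</ul>
"""

def walk(ul, depth=0):
    for li in ul.find_all("li", recursive=False):
        label = li.find(string=True, recursive=False)
        print("  " * depth + (label.strip() if label else ""))
        for child in li.find_all("ul", recursive=False):
            walk(child, depth + 1)

walk(BeautifulSoup(html, "html.parser").ul)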