beautifulsoup

BeautifulSoup: Scraping answers from a form

Submitted by 泪湿孤枕 on 2019-12-23 03:45:10
Question: I need to scrape the answers to the questions from the following link, including the check boxes. Here's what I have so far:

from bs4 import BeautifulSoup
import selenium.webdriver as webdriver

url = 'https://www.adviserinfo.sec.gov/IAPD/content/viewform/adv/Sections/iapd_AdvPrivateFundReportingSection.aspx?ORG_PK=161227&FLNG_PK=05C43A1A0008018C026407B10062D49D056C8CC0'
driver = webdriver.Firefox()
driver.get(url)
soup = BeautifulSoup(driver.page_source)

The following gives me all the written
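
A possible approach, sketched under the assumption that the rendered page exposes the answers as ordinary input elements (the checkbox markup on the IAPD form has not been verified here):

from bs4 import BeautifulSoup
import selenium.webdriver as webdriver

url = ('https://www.adviserinfo.sec.gov/IAPD/content/viewform/adv/Sections/'
       'iapd_AdvPrivateFundReportingSection.aspx'
       '?ORG_PK=161227&FLNG_PK=05C43A1A0008018C026407B10062D49D056C8CC0')

driver = webdriver.Firefox()
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

# Collect checkbox answers: True if the box carries a 'checked' attribute.
answers = {}
for box in soup.find_all('input', type='checkbox'):
    key = box.get('name') or box.get('id') or 'unnamed'
    answers[key] = box.has_attr('checked')
print(answers)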

How to get text that has no HTML tag | Add multiple delimiters in split

Submitted by 旧城冷巷雨未停 on 2019-12-23 03:36:10
Question: The following selector picks the div element with class ajaxcourseindentfix and splits its text at "Prerequisite: ", giving me all the content after the prerequisite:

div = soup.select("div.ajaxcourseindentfix")[0]
" ".join([word for word in div.stripped_strings]).split("Prerequisite: ")[-1]

My div can contain not only Prerequisite but also the following splitting points: Prerequisites, Corerequisite, Corerequisites. Whenever I have Prerequisite, the code above works fine, but whenever anything from the above three comes
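
One way to handle all four headings at once is a regular-expression split; a minimal sketch with illustrative HTML (the heading list mirrors the ones named in the question):

import re
from bs4 import BeautifulSoup

html = '<div class="ajaxcourseindentfix">Course text. Prerequisites: MATH 101 and CS 100</div>'
div = BeautifulSoup(html, "html.parser").select("div.ajaxcourseindentfix")[0]

text = " ".join(div.stripped_strings)
# Split once on the first heading that appears, whichever variant it is.
parts = re.split(r"(?:Prerequisites?|Corerequisites?):\s*", text, maxsplit=1)
print(parts[-1] if len(parts) > 1 else "")   # -> MATH 101 and CS 100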

Python Scraper - Socket Error breaks script if target is 404'd

Submitted by 扶醉桌前 on 2019-12-23 03:22:40
Question: I encountered an error while building a web scraper to compile data and output it in XLS format; when testing against a list of domains I wish to scrape, the program fails when it receives a socket error. I'm hoping to find an 'if' statement that would skip parsing a broken website and continue through my while loop. Any ideas?

workingList = xlrd.open_workbook(listSelection)
workingSheet = workingList.sheet_by_index(0)
destinationList = xlwt.Workbook()
destinationSheet =
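
Rather than an 'if', a try/except around the fetch is the usual way to keep the loop alive; a minimal sketch (the urls list is hypothetical and stands in for the domains read from the spreadsheet):

import socket
from urllib.request import urlopen
from urllib.error import URLError

urls = ["http://example.com", "http://nonexistent.invalid"]
for url in urls:
    try:
        html = urlopen(url, timeout=10).read()
    except (URLError, socket.error) as err:
        print("skipping %s: %s" % (url, err))
        continue          # move on to the next domain
    # ... parse html and write the row with xlwt here ...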

Downloading images with BeautifulSoup without an HTML 'img' tag

Submitted by 好久不见. on 2019-12-23 03:22:23
Question: I'm using BeautifulSoup to find and download images from a given website; however, the website contains images that aren't in the usual <img src="icon.gif"/> format. The ones that are causing me problems, for example, look like this:

<form action="example.jpg">
<!-- <img src="big.jpg" /> -->
background-image:url("xine.png");

My code to find the images is:

webpage = "https://example.com/images/"
soup = BeautifulSoup(urlopen(webpage), "html.parser")
for img in soup.find_all('img'):
    img_url =
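
Since these image references live in attributes, comments and inline CSS rather than <img> tags, one option is to run a regular expression over the raw HTML; a rough sketch (the extension list and page URL are assumptions):

import re
from urllib.request import urlopen
from urllib.parse import urljoin

webpage = "https://example.com/images/"
html = urlopen(webpage).read().decode("utf-8", errors="replace")

# Anything that ends in a common image extension, wherever it appears.
pattern = re.compile(r'[\w./%-]+\.(?:png|jpe?g|gif)', re.IGNORECASE)
image_urls = {urljoin(webpage, match) for match in pattern.findall(html)}
print(image_urls)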

Scrape a table by looping over specific dates using Beautiful Soup

Submitted by 半腔热情 on 2019-12-23 02:53:11
Question: I have been driving myself up the wall trying to scrape the historical coffee prices I need from the table found here using BeautifulSoup: http://www.investing.com/commodities/us-coffee-c-historical-data. I am trying to pull a market week's worth of prices, from 04-04-2016 to 04-08-2016. My ultimate goal is to scrape the entire table for those dates, pulling all columns from Date to Change %. My first step was to create a dictionary of the dates I want, using the date format used in the
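
A sketch of that approach: build the set of wanted date strings, then keep only the table rows whose first cell matches one of them (the "Apr 04, 2016" date format and the row layout are assumptions about the page, and the HTML is read from a locally saved copy):

from datetime import date, timedelta
from bs4 import BeautifulSoup

start, end = date(2016, 4, 4), date(2016, 4, 8)
wanted = {(start + timedelta(days=i)).strftime("%b %d, %Y")
          for i in range((end - start).days + 1)}

html = open("us-coffee-c-historical-data.html").read()
rows = []
for tr in BeautifulSoup(html, "html.parser").select("table tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells and cells[0] in wanted:
        rows.append(cells)   # Date, Price, Open, High, Low, Vol., Change %
print(rows)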

Displaying contents of web scrape

Submitted by 半城伤御伤魂 on 2019-12-23 02:39:31
Question: The code below displays all the fields on the screen. Is there a way I could get the fields "alongside" each other, as they would appear in a database or a spreadsheet? In the source code, the fields track, date, datetime, grade, distance and prizes are found in the resultsBlockHeader div class, and Fin (finishing position), Greyhound, Trap, SP, timeSec and Time Distance are found in the resultsBlock div. I am trying to get them displayed like this: track,date,datetime,grade,distance,prizes,fin
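
One way to line the fields up is to pair each results block with its header and write one CSV row per runner; a sketch that assumes the class names from the question and an otherwise ordinary table layout inside the blocks:

import csv
from bs4 import BeautifulSoup

soup = BeautifulSoup(open("results.html").read(), "html.parser")

with open("results.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for block in soup.find_all("div", class_="resultsBlock"):
        header = block.find_previous("div", class_="resultsBlockHeader")
        head_fields = list(header.stripped_strings) if header else []
        for row in block.find_all("tr"):
            row_fields = [td.get_text(strip=True) for td in row.find_all("td")]
            if row_fields:
                writer.writerow(head_fields + row_fields)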

How can I use BeautifulSoup to get deeply nested div values?

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-12-23 02:19:27
Question: I need to get the values of deeply nested <span> elements in a DOM structure that looks like this:

<div class="panda">
  <div class="that">
    <ul class="foo">
      <li class="bar">
        <div class="hi">
          <p class="bye">
            <span class="cheese">Cheddar</span>

The problem with soup.findAll("span", {"class": "cheese"}) is that there are hundreds of span elements on the page with class "cheese", so I need to restrict the search to the ones inside the "panda" div. I need to get a list of values like ["Cheddar", "Parmesan", "Swiss"].

Answer 1: Use
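
A CSS descendant selector does exactly that; a small self-contained example whose HTML mirrors the structure from the question:

from bs4 import BeautifulSoup

html = '''
<div class="panda"><div class="that"><ul class="foo"><li class="bar">
<div class="hi"><p class="bye"><span class="cheese">Cheddar</span></p></div>
</li></ul></div></div>
<span class="cheese">Not wanted</span>
'''
soup = BeautifulSoup(html, "html.parser")
values = [span.get_text() for span in soup.select("div.panda span.cheese")]
print(values)   # -> ['Cheddar']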

BeautifulSoup installed but not recognized when dev_appserver runs

Submitted by 百般思念 on 2019-12-23 02:09:10
Question: Update: by adding BeautifulSoup.py to my app source, this error was gone :) Thanks @Ned Deily, that took a long time but was fruitful. Ignore from here on: I have just one instance of Python 2.5 installed, with BeautifulSoup, and still no luck! What am I doing wrong? Please help.

bash-3.2$ ls -ltr /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages
total 1096
-rw-r--r-- 1 Harit admin 66866 May 28 2006 BeautifulSoup.py
-rw-r--r-- 1 Harit admin 26413 May 28 2006
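
For context, the fix described in the update amounts to vendoring the module next to the app so dev_appserver can import it without consulting site-packages; a rough sketch of the layout and the legacy BeautifulSoup 3 import (paths are illustrative):

# myapp/
# ├── app.yaml
# ├── main.py
# └── BeautifulSoup.py   <- copied from site-packages
from BeautifulSoup import BeautifulSoup   # BeautifulSoup 3 import style

soup = BeautifulSoup("<p>hello</p>")
print(soup.p.string)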

Python: BeautifulSoup extract all the span classes from a div section

Submitted by 帅比萌擦擦* on 2019-12-23 01:58:10
Question:

from requests import get
from bs4 import BeautifulSoup

url = 'https://www.ceda.com.au/Events/Upcoming-events'
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')   # missing from the excerpt; needed for the calls below

events_container = html_soup.find_all('div', class_='list-bx')
event1name = events_container[0]
print(event1name.a.text)

Eventdate = html_soup.find('div', class_=' col-md-4 col-sm-4 side-box well side-boxTop')
x = Eventdate.div.text
print(x)

I'm trying to print the second span class inside the class " col-md-4 col-sm-4 side-box well side-boxTop", but I couldn't
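
A sketch of getting at that second span: match on a single stable class and index into the spans (the assumption that the wanted text sits in the second span comes from the question, not from checking the page):

from requests import get
from bs4 import BeautifulSoup

url = 'https://www.ceda.com.au/Events/Upcoming-events'
html_soup = BeautifulSoup(get(url).text, 'html.parser')

side_box = html_soup.find('div', class_='side-boxTop')   # matches the multi-class div
spans = side_box.find_all('span') if side_box else []
if len(spans) >= 2:
    print(spans[1].get_text(strip=True))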

Fetch a complete list of items using BeautifulSoup, Python 3.6

Submitted by ≡放荡痞女 on 2019-12-23 01:51:24
Question: I am learning BeautifulSoup and I have chosen the link https://www.bundesbank.de/dynamic/action/en/statistics/time-series-databases/time-series-databases/743796/743796?treeAnchor=BANKEN&statisticType=BBK_ITS to scrape the list of items for the topic "Banks and other financial corporations". I need the items below, with their child items, in hierarchical format, as shown in the attached image:

Banks
Investment companies
Insurance corporations and pension funds up to Q2 2016
Insurance corporations as of Q3 2016
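
A generic way to recover that kind of hierarchy is to walk the nested <ul>/<li> tree recursively and indent by depth; a sketch with illustrative HTML (the Bundesbank page's real markup, and the child item names shown, are assumptions):

from bs4 import BeautifulSoup

html = """
<ul>
  <li>Banks
    <ul><li>Balance sheet items</li><li>Interest rates</li></ul>
  </li>
  <li>Investment companies</li>
</ul>
"""

def walk(ul, depth=0):
    for li in ul.find_all("li", recursive=False):
        label = li.find(string=True, recursive=False)
        print("  " * depth + (label.strip() if label else ""))
        for child in li.find_all("ul", recursive=False):
            walk(child, depth + 1)

walk(BeautifulSoup(html, "html.parser").ul)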