beautifulsoup | 易学教程

How to replace HTML comments with custom <comment> elements

Question: I'm working on mass-converting a number of HTML files to XML using BeautifulSoup in Python. A sample HTML file looks something like this: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">   <html xmlns="http://www.w3.org/1999/xhtml"> <head> ...  </head> <body> ...  <!-- Another
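One possible approach to the conversion described in this question, given as a minimal sketch and not as the asker's or an accepted answer's method: find every `Comment` node and swap it for a new `<comment>` tag carrying the same text. The function name `comments_to_elements` and the sample markup are made up for illustration, and the built-in `html.parser` backend is an assumption.

```python
from bs4 import BeautifulSoup, Comment


def comments_to_elements(html: str) -> str:
    """Replace each HTML comment with a <comment> element (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Collect the comment nodes first so the tree is not modified mid-search.
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    for comment in comments:
        tag = soup.new_tag("comment")
        tag.string = comment.strip()  # keep the comment text, minus padding
        comment.replace_with(tag)
    return str(soup)


# Made-up sample input, not the file from the question.
sample = "<body><p>Hi</p><!-- a note --></body>"
result = comments_to_elements(sample)
print(result)
```

The doctype is not affected, because Beautiful Soup represents it with a separate `Doctype` class and not with `Comment`.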

Deep parse with beautifulsoup

Question: I am trying to parse https://www.drugbank.ca/drugs. The idea is to extract all the drug names and some additional information for each drug. As you can see, each page shows a table of drug names, and clicking a drug name opens that drug's information. Let's say I will keep the following code to handle the pagination: import requests from bs4 import BeautifulSoup def drug_data(): url = 'https://www.drugbank.ca/drugs/' while url: print(url) r = requests.get(url) soup =
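The two-level crawl described here (listing pages, then one page per drug) can be sketched by separating parsing from fetching. This is only a sketch under stated assumptions: the selectors `td.drug-name a` and `a.next-page`, the function name `parse_listing`, and the sample markup are all hypothetical, since the excerpt does not show the real page structure.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.drugbank.ca/drugs/"


def parse_listing(html, base_url=BASE_URL):
    """Return ([(name, detail_url), ...], next_page_url_or_None)."""
    soup = BeautifulSoup(html, "html.parser")
    # Hypothetical selectors: adjust them to the real page structure.
    drugs = [
        (a.get_text(strip=True), urljoin(base_url, a["href"]))
        for a in soup.select("td.drug-name a[href]")
    ]
    next_link = soup.select_one("a.next-page[href]")
    next_url = urljoin(base_url, next_link["href"]) if next_link else None
    return drugs, next_url


# Made-up listing page for demonstration only.
sample_page = """
<table>
  <tr><td class="drug-name"><a href="/drugs/EX001">Example A</a></td></tr>
  <tr><td class="drug-name"><a href="/drugs/EX002">Example B</a></td></tr>
</table>
<a class="next-page" href="/drugs?page=2">Next</a>
"""
drugs, next_url = parse_listing(sample_page)
print(drugs)
print(next_url)
```

In a real crawl, the `while url:` loop from the question would fetch each page with `requests.get`, pass `r.text` to `parse_listing`, request each detail URL in turn, and stop once `next_url` is `None`.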

Python Scraper Unable to scrape img src

Question: I'm unable to scrape images from the website www.kissmanga.com. I'm using Python 3 and the Requests and BeautifulSoup libraries. The scraped image tags have a blank "src". SRC: from bs4 import BeautifulSoup import requests scraper = cfscrape.create_scraper() url = "http://kissmanga.com/Manga/Bleach/Bleach-634--Friend-004?id=235206" response = requests.get(url) soup2 = BeautifulSoup(response.text, 'html.parser') divImage = soup2.find('div',{"id": "divImage"}) for img in divImage.findAll('img'):
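The excerpt does not show why the `src` values come back blank, so the following is only a diagnostic sketch, not a fix. It reads `src` defensively and reports which images lack one. If the site fills in image URLs with JavaScript after the page loads (an assumption, not something the excerpt confirms), the static HTML returned by `requests` would not contain them. Note also that the posted snippet calls `cfscrape.create_scraper()` without importing `cfscrape` and then fetches the page with `requests.get`, so the scraper object is never used. The helper name `image_sources` and the sample markup are made up.

```python
from bs4 import BeautifulSoup


def image_sources(html, container_id="divImage"):
    """Return the src of every <img> inside the given div ('' when absent)."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id=container_id)
    if container is None:
        # find() returns None when the div is not in the served HTML.
        return []
    return [img.get("src", "") for img in container.find_all("img")]


# Made-up markup: one usable image, one empty src, one missing src.
sample = '<div id="divImage"><img src="page1.jpg"><img src=""><img></div>'
sources = image_sources(sample)
missing = [i for i, src in enumerate(sources) if not src]
print(sources)
print(missing)
```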

BeautifulSoup does not see element, even though it is present on a page

Question: I am trying to scrape listings from Airbnb. Every listing has its own ID. However, the output of the code below is None: import requests, bs4 response = requests.get('https://www.airbnb.pl/s/Girona--Hiszpania/homes?refinement_paths%5B%5D=%2Fhomes&query=Girona%2C%20Hiszpania&checkin=2018-07-04&checkout=2018-07-25&allow_override%5B%5D=&ne_lat=42.40450221314142&ne_lng=3.3245690859736214&sw_lat=41.97668610374056&sw_lng=1.7960961855829964&zoom=10&search_by_map=true&s_tag=nrGiXgWC') soup = bs4

Using BeautifulSoup to select div blocks within HTML

Question: I am trying to parse several div blocks with Beautiful Soup from a website's HTML. However, I cannot work out which function should be used to select these div blocks. I have tried the following: import urllib2 from bs4 import BeautifulSoup def getData(): html = urllib2.urlopen("http://www.racingpost.com/horses2/results/home.sd?r_date=2013-09-22", timeout=10).read().decode('UTF-8') soup = BeautifulSoup(html) print(soup.title) print(soup.find_all('<div class="crBlock ">')) getData()
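The attempt above passes raw markup to `find_all`, which expects a tag name plus optional attribute filters. A minimal sketch of selecting the blocks by tag name and CSS class follows. The helper name `find_blocks` and the sample markup are made up, and `html.parser` is an assumed parser choice.

```python
from bs4 import BeautifulSoup


def find_blocks(html, css_class="crBlock"):
    """Return every <div> whose class list contains css_class."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches individual class tokens, so the trailing space in
    # class="crBlock " does not matter.
    return soup.find_all("div", class_=css_class)


# Made-up markup standing in for the real results page.
sample = """
<div class="crBlock "><h3>Race 1</h3></div>
<div class="other"><h3>Skip me</h3></div>
<div class="crBlock "><h3>Race 2</h3></div>
"""
blocks = find_blocks(sample)
titles = [block.h3.get_text(strip=True) for block in blocks]
print(titles)
```

The CSS-selector form `soup.select("div.crBlock")` should select the same elements.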

How to get multiple class in one query using Beautiful Soup

Question: I want to find td elements with class="s" or class="sb" in the following HTML: <tr bgcolor="#e5e5f3"><td class="sb" width="200" align="left">test1</td><td class="sb" align="right">5,774.0</td><td class="sb" align="right">4,481.0</td><td class="sb" align="right">5,444.0</td><td class="sb" align="right">6,615.0</td><td class="sb" align="right">6,858.0</td></tr> <tr bgcolor="#f0f0E7"><td class="s" width="200" align="left">test2</td><td class="s" align="right">5,774.0</td><td class="s" align="right">4,481
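One way to match either class in a single query, shown as a sketch and not as the accepted answer: pass a list to `class_`, which matches a tag if any of its classes appears in the list. The helper name `cells_with_classes` is made up, and the sample is a shortened rewrite of the rows in the question with one extra non-matching cell added.

```python
from bs4 import BeautifulSoup


def cells_with_classes(html, classes=("s", "sb")):
    """Return the text of every <td> carrying any of the given classes."""
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.find_all("td", class_=list(classes))
    return [cell.get_text(strip=True) for cell in cells]


# Shortened sample based on the question; the class="x" cell is an addition.
sample = """
<tr><td class="sb" align="left">test1</td><td class="sb" align="right">5,774.0</td></tr>
<tr><td class="s" align="left">test2</td><td class="x" align="right">ignored</td></tr>
"""
texts = cells_with_classes(sample)
print(texts)
```

The selector form `soup.select("td.s, td.sb")` expresses the same query as a CSS selector group.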

malformed start tag error - Python, BeautifulSoup, and Sipie - Ubuntu 10.04

Question: I just installed python, mplayer, beautifulsoup and sipie to run Sirius on my Ubuntu 10.04 machine. I followed some docs that seem straightforward, but am encountering some issues. I'm not that familiar with Python, so this may be out of my league. I was able to get everything installed, but then running sipie gives this: /usr/bin/Sipie/Sipie/Config.py:12: DeprecationWarning: the md5 module is deprecated; use hashlib instead import md5 Traceback (most recent call last): File "/usr/bin/Sipie