beautifulsoup

How to replace HTML comments with custom <comment> elements

魔方 西西 submitted on 2020-01-13 10:27:08
Question: I'm working on mass-converting a number of HTML files to XML using BeautifulSoup in Python. A sample HTML file looks something like this:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <!-- this is an HTML comment -->
    <!-- this is another HTML comment -->
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    ...
    <!-- here is a comment inside the head tag -->
    </head>
    <body>
    ...
    <!-- Comment inside body tag -->
    <!-- Another
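One possible way to approach the question above, as a minimal sketch rather than the accepted answer: find every `Comment` node and swap it for a new `<comment>` tag. The helper name `comments_to_elements` and the sample markup are assumptions made for illustration; the parser choice (`html.parser`) is also an assumption.

```python
from bs4 import BeautifulSoup, Comment


def comments_to_elements(html):
    """Return a soup in which each HTML comment is replaced by a <comment> tag."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so replacing nodes while looping is safe.
    for node in soup.find_all(string=lambda s: isinstance(s, Comment)):
        tag = soup.new_tag("comment")
        tag.string = node.strip()  # comment text without surrounding spaces
        node.replace_with(tag)
    return soup


# Hypothetical sample input, not the asker's file.
converted = str(comments_to_elements("<p>a<!-- hi --></p>"))
print(converted)
```

The doctype is left alone because it is a `Doctype` node, not a `Comment`.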

Deep parse with beautifulsoup

本秂侑毒 submitted on 2020-01-13 07:22:51
Question: I am trying to parse https://www.drugbank.ca/drugs. The idea is to extract all the drug names and some additional information for each drug. As you can see, each page shows a table of drug names, and clicking a drug name opens that drug's information. Let's say I keep the following code to handle the pagination:

    import requests
    from bs4 import BeautifulSoup

    def drug_data():
        url = 'https://www.drugbank.ca/drugs/'
        while url:
            print(url)
            r = requests.get(url)
            soup =
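The "list page, then detail page" pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the site's real markup: the CSS classes `drug-link` and `next`, the `crawl` helper, the `fetch` parameter, and the `example.test` pages are all hypothetical. Fetching is injected as a function so the sketch runs without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def crawl(start_url, fetch):
    """Yield (link text, detail-page soup) for every item on every list page.

    fetch is any callable that maps a URL to its HTML text.
    """
    url = start_url
    while url:
        soup = BeautifulSoup(fetch(url), "html.parser")
        # Assumed markup: each item links to its detail page.
        for link in soup.find_all("a", class_="drug-link"):
            detail_url = urljoin(url, link["href"])
            detail = BeautifulSoup(fetch(detail_url), "html.parser")
            yield link.get_text(strip=True), detail
        # Assumed markup: a "next" link exists on every page but the last.
        next_link = soup.find("a", class_="next")
        url = urljoin(url, next_link["href"]) if next_link else None


# In-memory stand-in for the website (hypothetical pages).
PAGES = {
    "https://example.test/drugs":
        '<a class="drug-link" href="/drugs/1">Drug A</a>'
        '<a class="next" href="/drugs?page=2">Next</a>',
    "https://example.test/drugs?page=2":
        '<a class="drug-link" href="/drugs/2">Drug B</a>',
    "https://example.test/drugs/1": "<h1>Drug A details</h1>",
    "https://example.test/drugs/2": "<h1>Drug B details</h1>",
}

results = [
    (name, detail.h1.get_text())
    for name, detail in crawl("https://example.test/drugs", PAGES.__getitem__)
]
print(results)
```

Against a live site, `fetch` could be something like `lambda u: requests.get(u).text`, with the two selectors adjusted to whatever the real pages use.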

Deep parse with beautifulsoup

一世执手 submitted on 2020-01-13 07:21:10
Question: I am trying to parse https://www.drugbank.ca/drugs. The idea is to extract all the drug names and some additional information for each drug. As you can see, each page shows a table of drug names, and clicking a drug name opens that drug's information. Let's say I keep the following code to handle the pagination:

    import requests
    from bs4 import BeautifulSoup

    def drug_data():
        url = 'https://www.drugbank.ca/drugs/'
        while url:
            print(url)
            r = requests.get(url)
            soup =

Python Scraper Unable to scrape img src

こ雲淡風輕ζ submitted on 2020-01-13 06:52:14
Question: I'm unable to scrape images from the website www.kissmanga.com. I'm using Python 3 and the Requests and BeautifulSoup libraries. The scraped image tags have a blank "src". Source:

    from bs4 import BeautifulSoup
    import requests
    scraper = cfscrape.create_scraper()
    url = "http://kissmanga.com/Manga/Bleach/Bleach-634--Friend-004?id=235206"
    response = requests.get(url)
    soup2 = BeautifulSoup(response.text, 'html.parser')
    divImage = soup2.find('div',{"id": "divImage"})
    for img in divImage.findAll('img'):

BeautifulSoup does not see element, even though it is present on a page

两盒软妹~` submitted on 2020-01-13 06:25:07
Question: I am trying to scrape listings from Airbnb. Every listing has its own ID. However, the output of the code below is None:

    import requests, bs4
    response = requests.get('https://www.airbnb.pl/s/Girona--Hiszpania/homes?refinement_paths%5B%5D=%2Fhomes&query=Girona%2C%20Hiszpania&checkin=2018-07-04&checkout=2018-07-25&allow_override%5B%5D=&ne_lat=42.40450221314142&ne_lng=3.3245690859736214&sw_lat=41.97668610374056&sw_lng=1.7960961855829964&zoom=10&search_by_map=true&s_tag=nrGiXgWC')
    soup = bs4

Using BeautifulSoup to select div blocks within HTML

扶醉桌前 submitted on 2020-01-13 03:04:56
Question: I am trying to parse several div blocks from a website's HTML using Beautiful Soup. However, I cannot work out which function should be used to select these div blocks. I have tried the following:

    import urllib2
    from bs4 import BeautifulSoup

    def getData():
        html = urllib2.urlopen("http://www.racingpost.com/horses2/results/home.sd?r_date=2013-09-22", timeout=10).read().decode('UTF-8')
        soup = BeautifulSoup(html)
        print(soup.title)
        print(soup.find_all('<div class="crBlock ">'))

    getData()
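`find_all` takes a tag name and attribute filters, not a literal string of markup, which is why the call in the question finds nothing. A minimal sketch of the usual form, assuming the blocks really carry the class `crBlock`; the sample HTML and the `find_blocks` helper are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page.
SAMPLE_HTML = """
<html><head><title>Results</title></head><body>
<div class="crBlock "><h3>Race 1</h3></div>
<div class="other"><h3>Not a race</h3></div>
<div class="crBlock "><h3>Race 2</h3></div>
</body></html>
"""


def find_blocks(html, css_class="crBlock"):
    """Return all <div> tags whose class list contains css_class."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ is matched against each whitespace-separated class value,
    # so the trailing space in class="crBlock " does not matter.
    return soup.find_all("div", class_=css_class)


blocks = find_blocks(SAMPLE_HTML)
titles = [block.h3.get_text() for block in blocks]
print(titles)
```

The CSS-selector form `soup.select("div.crBlock")` should select the same elements.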

Using BeautifulSoup to select div blocks within HTML

隐身守侯 submitted on 2020-01-13 03:04:16
Question: I am trying to parse several div blocks from a website's HTML using Beautiful Soup. However, I cannot work out which function should be used to select these div blocks. I have tried the following:

    import urllib2
    from bs4 import BeautifulSoup

    def getData():
        html = urllib2.urlopen("http://www.racingpost.com/horses2/results/home.sd?r_date=2013-09-22", timeout=10).read().decode('UTF-8')
        soup = BeautifulSoup(html)
        print(soup.title)
        print(soup.find_all('<div class="crBlock ">'))

    getData()

How to get multiple class in one query using Beautiful Soup

老子叫甜甜 submitted on 2020-01-12 10:18:10
Question: I want to find td elements with class="s" or class="sb" in the following HTML:

    <tr bgcolor="#e5e5f3">
      <td class="sb" width="200" align="left">test1</td>
      <td class="sb" align="right">5,774.0</td>
      <td class="sb" align="right">4,481.0</td>
      <td class="sb" align="right">5,444.0</td>
      <td class="sb" align="right">6,615.0</td>
      <td class="sb" align="right">6,858.0</td>
    </tr>
    <tr bgcolor="#f0f0E7">
      <td class="s" width="200" align="left">test2</td>
      <td class="s" align="right">5,774.0</td>
      <td class="s" align="right">4,481
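Beautiful Soup accepts a list as an attribute filter and matches tags whose attribute equals any item in it, so both classes can be requested in one call. A minimal sketch; the markup below is a shortened, hypothetical version of the rows in the question, with one extra row added to show what is skipped:

```python
from bs4 import BeautifulSoup

# Shortened sample modelled on the question's table rows.
SAMPLE_ROWS = """<table>
<tr><td class="sb" align="left">test1</td><td class="sb" align="right">5,774.0</td></tr>
<tr><td class="s" align="left">test2</td><td class="s" align="right">4,481.0</td></tr>
<tr><td class="x">ignored</td></tr>
</table>"""

soup = BeautifulSoup(SAMPLE_ROWS, "html.parser")

# A list filter matches a <td> whose class is "s" OR "sb".
cells = soup.find_all("td", class_=["s", "sb"])
texts = [cell.get_text() for cell in cells]
print(texts)
```

Results come back in document order, so cells from the two kinds of row stay interleaved as they appear in the table.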

How to get multiple class in one query using Beautiful Soup

折月煮酒 submitted on 2020-01-12 10:18:05
Question: I want to find td elements with class="s" or class="sb" in the following HTML:

    <tr bgcolor="#e5e5f3">
      <td class="sb" width="200" align="left">test1</td>
      <td class="sb" align="right">5,774.0</td>
      <td class="sb" align="right">4,481.0</td>
      <td class="sb" align="right">5,444.0</td>
      <td class="sb" align="right">6,615.0</td>
      <td class="sb" align="right">6,858.0</td>
    </tr>
    <tr bgcolor="#f0f0E7">
      <td class="s" width="200" align="left">test2</td>
      <td class="s" align="right">5,774.0</td>
      <td class="s" align="right">4,481

malformed start tag error - Python, BeautifulSoup, and Sipie - Ubuntu 10.04

一曲冷凌霜 submitted on 2020-01-12 07:26:53
Question: I just installed Python, mplayer, BeautifulSoup and Sipie to run Sirius on my Ubuntu 10.04 machine. I followed some docs that seemed straightforward, but I am running into issues. I'm not that familiar with Python, so this may be out of my league. I was able to get everything installed, but running sipie gives this:

    /usr/bin/Sipie/Sipie/Config.py:12: DeprecationWarning: the md5 module is deprecated; use hashlib instead
      import md5
    Traceback (most recent call last):
      File "/usr/bin/Sipie
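The `DeprecationWarning` in the output above names its own remedy: the old `md5` module was superseded by `hashlib`. As a minimal sketch of the replacement call (the `digest_hex` helper and the sample input are hypothetical, not Sipie's code):

```python
import hashlib


def digest_hex(data):
    """Return the MD5 hex digest of a bytes value using hashlib."""
    return hashlib.md5(data).hexdigest()


print(digest_hex(b"hello"))
```

This only concerns the warning; the traceback that follows it is cut off in the excerpt and appears to be a separate problem.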