beautifulsoup

Tag and string mixed find-and-replace using BeautifulSoup in python

别等时光非礼了梦想. Submitted on 2020-01-15 10:59:07
Question: How do you use BeautifulSoup's .replace_with() without angle brackets being converted to entities such as &gt; after a str() find-and-replace?

Python code:

    from bs4 import BeautifulSoup

    with open("../dicttest.txt", "r", encoding="utf-8") as f:
        full_text = f.read()

    parse_1 = BeautifulSoup(full_text, "html.parser")
    for line in parse_1.find_all("grace", "AllExamples"):
        match = str(line).replace(";</i> <b>", ";</i><br> <b>")
        line.replace_with(match)
    print(parse
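A minimal sketch of one way around the escaping, assuming the cause is that .replace_with() receives a plain str and inserts it as text rather than markup. The sample HTML, the div tag, and the variable names below are made up for illustration; the idea is to re-parse the edited string and insert the resulting tag instead of the string.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the contents of the file.
html = '<div class="AllExamples"><i>one;</i> <b>two</b></div>'
soup = BeautifulSoup(html, "html.parser")

for tag in soup.find_all("div", "AllExamples"):
    # Do the textual find-and-replace on the serialized tag.
    fixed_markup = str(tag).replace(";</i> <b>", ";</i><br> <b>")
    # Re-parse the edited markup so it becomes tags again, not text.
    fragment = BeautifulSoup(fixed_markup, "html.parser")
    # The fragment holds exactly one top-level element: the rebuilt tag.
    tag.replace_with(fragment.contents[0])

result = str(soup)
print(result)
```

Because the replacement is a parsed tag rather than a string, the serializer has no text containing angle brackets to escape.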

BeautifulSoup returning different html than view source

我只是一个虾纸丫 Submitted on 2020-01-15 08:50:14
Question: I'm brand new to BeautifulSoup, so forgive me if my question is stupid. I've been googling and trying suggestions from every Stack Overflow thread I could find since 6am, but to no avail. My problem is that I have a .csv file with gene names. Some of them are in Ensembl format, which means I MUST use the Ensembl database to look up the info I need. For the rest I can use the NCBI database. Now, my code is just fine. I know this because every query sent to NCBI returns the info I need,

scraping multiple pages in python with BeautifulSoup

╄→尐↘猪︶ㄣ Submitted on 2020-01-15 08:13:04
Question: I have managed to write code to scrape data from the first page, and now I am stuck writing a loop in this code to scrape the next 'n' pages. Below is the code. I would appreciate it if someone could guide or help me to write the code that would scrape the data from the remaining pages. Thanks!

    from bs4 import BeautifulSoup
    import requests
    import csv

    url = requests.get('https://wsc.nmbe.ch/search?sFamily=Salticidae&fMt=begin&sGenus=&gMt=begin&sSpecies=&sMt=begin&multiPurpose=slsid&sMulti=&mMt
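A rough sketch of the kind of loop being asked for. It assumes the site exposes the page number as a query parameter; the parameter name `page`, the example.org URL, the `scrape_pages` helper, and the `fake_fetch` stand-in are all hypothetical and not taken from the question. The fetch step is passed in as a function so the loop can be tried without network access.

```python
from bs4 import BeautifulSoup


def scrape_pages(fetch, base_url, n_pages):
    """Collect table rows (as lists of cell text) from pages 1..n_pages."""
    rows = []
    for page in range(1, n_pages + 1):
        # ASSUMPTION: the page number is selected with a "page" parameter.
        html = fetch(f"{base_url}&page={page}")
        soup = BeautifulSoup(html, "html.parser")
        for tr in soup.find_all("tr"):
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if cells:  # skip header rows that contain no <td> cells
                rows.append(cells)
    return rows


def fake_fetch(url):
    """Stand-in for a real download; returns a tiny one-row table."""
    page = url.rsplit("=", 1)[1]
    return f"<table><tr><td>species {page}</td><td>genus {page}</td></tr></table>"


rows = scrape_pages(fake_fetch, "https://example.org/search?q=x", 3)
print(rows)
```

With a real site, `fetch` could be something like `lambda url: requests.get(url).text`, and the collected rows could then be written out with `csv.writer`.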

scraping multiple pages in python with BeautifulSoup

时光怂恿深爱的人放手 Submitted on 2020-01-15 08:11:46
Question: I have managed to write code to scrape data from the first page, and now I am stuck writing a loop in this code to scrape the next 'n' pages. Below is the code. I would appreciate it if someone could guide or help me to write the code that would scrape the data from the remaining pages. Thanks!

    from bs4 import BeautifulSoup
    import requests
    import csv

    url = requests.get('https://wsc.nmbe.ch/search?sFamily=Salticidae&fMt=begin&sGenus=&gMt=begin&sSpecies=&sMt=begin&multiPurpose=slsid&sMulti=&mMt

scrape data from website that turned next page when scrolled to bottom using Python and BeautifulSoup

寵の児 Submitted on 2020-01-15 03:49:26
Question: If I need to scrape data from a website that loads the next page automatically when one scrolls to the bottom of the page (i.e. an endlessly extending page) using Python and BeautifulSoup, how can I do that? Is there a general approach, or does it need to be tailored for each website? Example of website: http://statigr.am/tag/cat/#/list

Answer 1: If there is dynamic behavior like loading additional content via an AJAX call (as there is here on statigr.am), you should either use a real browser with the help of

Why does BeautifulSoup .children contain nameless elements as well as the expected tag(s)

折月煮酒 Submitted on 2020-01-15 03:16:29
Question:

Code

    #!/usr/bin/env python3
    from bs4 import BeautifulSoup

    test = """<!DOCTYPE html>
    <html>
    <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
    <title>Test</title>
    </head>
    <body>
    <table>
    <tbody>
    <tr>
    <td>
    <div>
    <b> Icon </b>
    </div>
    </td>
    </tr>
    </tbody>
    </table>
    </body>
    </html>"""

    soup = BeautifulSoup(test)
    rows = soup.findAll('tr')
    for r in rows:
        print(r.name)
        for c in r.children:
            print('>', c.name)

Output

    tr
    > None
    > td
    > None

Why are there nameless elements in the list
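A short sketch of what is likely going on, on a cut-down version of the markup: the line breaks between tags are parsed as text nodes, so they appear in .children alongside the td tag. Filtering on the Tag class is one way to keep only real tags. The variable names are illustrative.

```python
from bs4 import BeautifulSoup, Tag

# Cut-down markup: note the newlines before and after <td>.
html = "<table><tr>\n<td>Icon</td>\n</tr></table>"
soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

# Every child, including the whitespace-only text nodes.
kinds = [type(child).__name__ for child in row.children]

# Only the children that are actual tags.
tag_names = [child.name for child in row.children if isinstance(child, Tag)]

print(kinds)
print(tag_names)
```

Calling `row.find_all(True, recursive=False)` is another way to get only the direct child tags.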

Can beautiful soup output be sent to browser?

依然范特西╮ Submitted on 2020-01-14 19:07:56
Question: I'm pretty new to Python, having been introduced to it recently; most of my experience is with PHP. One thing that PHP has going for it when working with HTML (not surprisingly) is that the echo statement outputs HTML to the browser. This lets you use built-in browser dev tools such as Firebug. Is there a way to reroute Python/Django output from the command line to the browser when using tools such as Beautiful Soup? Ideally each run of the code would open a new browser tab.

Answer 1: If it
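One possible approach, sketched here under the assumption that a local file is good enough: write the soup to a temporary .html file and ask the standard-library webbrowser module to open it in a new tab. The helper names `write_soup_to_html` and `show_in_browser` are made up for this example and are not part of any library.

```python
import tempfile
import webbrowser
from pathlib import Path

from bs4 import BeautifulSoup


def write_soup_to_html(soup):
    """Write the prettified soup to a temporary .html file and return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(soup.prettify())
    return Path(handle.name)


def show_in_browser(soup):
    """Save the soup to disk and open it in a new browser tab."""
    path = write_soup_to_html(soup)
    webbrowser.open_new_tab(path.as_uri())
    return path


soup = BeautifulSoup("<h1>Hello</h1><p>From BeautifulSoup</p>", "html.parser")
saved_path = write_soup_to_html(soup)
print(saved_path)
```

Calling `show_in_browser(soup)` at the end of a script would open a fresh tab on each run, after which the browser's dev tools can be used on the page as usual.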

Using Beautiful Soup to get data from non-class section

主宰稳场 Submitted on 2020-01-14 18:40:41
Question: I am still very much a novice, learning Python and Beautiful Soup. I have gotten hung up on how to get text from a piece of HTML that has no class. This is the snippet of HTML I'm working with:

    <section class="userbody">
    <script type="text/javascript"></script>
    <figure class="iw">
    <div id="ci">
    <img id="iwi" title="image 2" alt="" src="http://images.craigslist.org/00C0C_daJm4U9yU5B_600x450.jpg" style="min-width: inherit; min-height: 450px;"></img>
    </div>
    <div id="thumbs"></div>
    </figure>
    <div class=

Beautifulsoup split text in tag by <br/>

心不动则不痛 Submitted on 2020-01-14 08:55:08
Question: Is it possible to split the text in a tag by br tags? I have these tag contents:

    [u'+420 777 593 531', <br/>, u'+420 776 593 531', <br/>, u'+420 775 593 531']

And I want to get only the numbers. Any advice?

EDIT:

    [x for x in dt.find_next_sibling('dd').contents if x!=' <br/>']

does not work at all.

Answer 1: You need to test for tags, which are modelled as Tag instances. Tag objects have a name attribute, while text elements (which are NavigableString instances) don't:

    [x for x in dt.find_next
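A minimal sketch of the idea in the answer, using an isinstance check rather than the name attribute. The dl/dt/dd markup is a guess at the page structure, reconstructed from the dt.find_next_sibling('dd') call in the question.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup matching the contents list shown in the question.
html = (
    "<dl><dt>Phone</dt>"
    "<dd>+420 777 593 531<br/>+420 776 593 531<br/>+420 775 593 531</dd></dl>"
)
soup = BeautifulSoup(html, "html.parser")

dt = soup.find("dt")
dd = dt.find_next_sibling("dd")

# Keep only the text nodes; the <br/> entries are Tag objects and are skipped.
numbers = [
    str(node).strip() for node in dd.contents if isinstance(node, NavigableString)
]
print(numbers)
```

`list(dd.stripped_strings)` gives the same three strings here and also walks into nested tags, which may or may not be wanted.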

BeautifulSoup installation or alternative without easy_install

岁酱吖の Submitted on 2020-01-14 06:18:05
Question: I wanted to write a program to scrape a website with Python. Since there is no built-in way to do so, I decided to give the BeautifulSoup module a try. Unfortunately I ran into problems using pip and ez_install, since I use 64-bit Windows 7 and Python 3.3. Is there a way to get the BeautifulSoup module onto my Python 3.3 installation on 64-bit Windows 7 without ez_install or easy_install, since I have too much trouble with these, or is there an alternative module which can be
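For what it is worth, the standard library does ship a basic HTML parser, html.parser, which needs no installation at all. The sketch below is only an illustration of that alternative, not a replacement for BeautifulSoup's search API; the `LinkCollector` class and the sample markup are invented for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


collector = LinkCollector()
collector.feed('<p><a href="/one">One</a> and <a href="/two">Two</a></p>')
links = collector.links
print(links)
```

The page source itself can be downloaded with urllib.request from the standard library, so a simple scraper is possible with no third-party packages.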