beautifulsoup | 易学教程

How should I show results of BeautifulSoup parsing in Django?

Question: I'm trying to scrape a web page using BeautifulSoup and Django. Here's my views.py, which does this task:

    def detail(request, article_id):
        article = get_object_or_404(Article, pk=article_id)
        html = urllib2.urlopen("...url...")
        soup = BeautifulSoup(html)
        title = soup.title
        return render(request, 'detail.html', {'article': article, 'title': title})

But when I use {{ title }} in Django template files, it doesn't show anything. I've tested it and it works in the shell. I've added a line to this function:
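The excerpt ends before any answer, so the following is only a sketch of one common approach, not the confirmed fix. soup.title is a bs4 Tag object rather than a string, and Django templates call any callable value they are given; since Tag objects are callable, the template may not render what the shell shows. Passing plain text sidesteps that. The helper name extract_title, the bs4 import, and the "html.parser" backend are assumptions that do not appear in the question.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the text of the <title> tag as a plain str ('' if absent)."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return ""
    # get_text() yields a plain string instead of a bs4 Tag object.
    return soup.title.get_text(strip=True)
```

In the view, the result of extract_title(html) could then be placed in the context under 'title' in place of soup.title.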

Anyway to scrape a link that redirects?

Question: Is there any way that I can make Python click a link, such as a bit.ly link, and then scrape the resulting link? When I am scraping a certain page, the only link I can scrape is one that redirects, and the page it redirects to is where the information I need is located.

Answer 1: There are three types of redirection:

- HTTP: as information in the response headers (with status code 301, 302, or another 3xx)
- HTML: as a <meta> tag in the HTML (Wikipedia: Meta refresh)
- JavaScript: as code like window.location = new_url

requests execute
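The first two redirect types can be followed without a browser. The sketch below uses only the standard library rather than the requests package the answer mentions: urlopen follows HTTP 3xx redirects on its own, and a small parser looks for a meta refresh target. The names MetaRefreshParser, find_meta_refresh, resolve, and the max_hops limit are hypothetical. JavaScript redirects are not handled, since they would need a JavaScript engine.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class MetaRefreshParser(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/".
        match = re.search(r"url\s*=\s*['\"]?([^'\";]+)",
                          attrs.get("content") or "", re.I)
        if match:
            self.target = match.group(1).strip()


def find_meta_refresh(html, base_url=""):
    """Return the absolute meta refresh target, or None if there is none."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if parser.target is None:
        return None
    return urljoin(base_url, parser.target)


def resolve(url, max_hops=5):
    """Follow HTTP redirects (done by urlopen) and then meta refreshes."""
    for _ in range(max_hops):
        with urlopen(url) as response:
            final_url = response.geturl()  # URL after any HTTP redirects
            html = response.read().decode("utf-8", errors="replace")
        next_url = find_meta_refresh(html, final_url)
        if next_url is None:
            return final_url
        url = next_url
    return url
```

Calling resolve on a shortened link would return the address of the page that finally gets served, which can then be scraped as usual.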

Getting BeautifulSoup to catch tags in a non-case-sensitive way

Question: I want to catch some tags with BeautifulSoup: some <p> tags, the <title> tag, and some <meta> tags. But I want to catch them regardless of their case; I know that some sites write meta like this: <META>, and I want to be able to catch that. I noticed that BeautifulSoup is case-sensitive by default. How do I catch these tags in a non-case-sensitive way?

Answer 1: You can use soup.findAll, which should match case-insensitively:

    import BeautifulSoup
    html = '''<html> <head> <meta name="description" content=
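As an illustration with the newer bs4 package (an assumption; the answer above imports the older BeautifulSoup module), the "html.parser" backend lowercases tag and attribute names while parsing, so a lowercase search finds <META> and <P> as well. Attribute values keep their case, so a case-insensitive regular expression is used for them. The sample markup is made up for this sketch.

```python
import re

from bs4 import BeautifulSoup

html = """<HTML><HEAD><TITLE>Demo</TITLE>
<META NAME="Description" CONTENT="An example page">
</HEAD><BODY><P>first</P><p>second</p></BODY></HTML>"""

soup = BeautifulSoup(html, "html.parser")

# Tag names were lowercased by the parser, so "p" matches <P> and <p>.
paragraphs = [p.get_text() for p in soup.find_all("p")]
title = soup.title.get_text()

# Attribute values are not normalised, hence the re.I flag.
description = soup.find(
    "meta", attrs={"name": re.compile(r"^description$", re.I)}
)
```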

Perform Download via download button in Python

Question: I am somewhat new to getting data from a website. I have, for example, the website http://www.ariva.de/adidas-aktie/historische_kurse, and there is a hidden download button, as shown in the picture below in red. The main question is: how can I download that in Python? I tried some things found on the web (e.g. Beautiful Soup, scraperwiki, etc.) but failed. The data download link is structured as follows:

    > Kurse als CSV-Datei </h3> <div class="clearfloat"></div> </div> >
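The markup in the excerpt is cut off, so the real structure of the download control is unknown; it may well be a form rather than a plain link. Purely as a sketch for the plain-link case, the code below finds an anchor by its visible text and saves the file it points to. The function names, the sample markup, and the example.com addresses are all hypothetical.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def find_download_url(html, base_url, link_text):
    """Return the absolute URL of the first <a> whose text contains link_text."""
    soup = BeautifulSoup(html, "html.parser")
    for anchor in soup.find_all("a", href=True):
        if link_text.lower() in anchor.get_text().lower():
            # Relative hrefs are resolved against the page address.
            return urljoin(base_url, anchor["href"])
    return None


def download(url, path):
    """Save the response body of url to the file at path."""
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())


# Hypothetical markup, not the real page's structure.
sample = '<h3><a href="/export/prices.csv">Kurse als CSV-Datei</a></h3>'
csv_url = find_download_url(sample, "http://www.example.com/page", "csv-datei")
```

If the button turns out to submit a form, the form's action URL and field values would have to be sent instead, which this sketch does not attempt.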

Find Most Common Words from a Website in Python 3 [closed]

Question: I need to find and copy the words that appear more than 5 times on a given website using Python 3 code, and I'm not sure how to do it. I've looked through the archives here on Stack Overflow, but the other solutions rely on Python 2 code. Here's the measly code I
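A minimal Python 3 sketch of the counting step, assuming the page's HTML has already been fetched as a string (for example with urllib.request.urlopen). The function name frequent_words, the more_than parameter, and the simple definition of a word as a run of letters and apostrophes are choices made for this illustration, not part of the question.

```python
import re
from collections import Counter

from bs4 import BeautifulSoup


def frequent_words(html, more_than=5):
    """Return {word: count} for words occurring more than `more_than` times."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop script and style contents so code is not counted as words.
    for tag in soup(["script", "style"]):
        tag.decompose()
    words = re.findall(r"[a-z']+", soup.get_text(" ").lower())
    counts = Counter(words)
    return {word: n for word, n in counts.items() if n > more_than}
```

Counter(words).most_common(10) is a related standard-library call for a fixed-size ranking instead of a threshold.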

How to select a class of div inside of a div with beautiful soup?

Question: I have a bunch of div tags within div tags:

    <div class="foo">
      <div class="bar">I want this</div>
      <div class="unwanted">Not this</div>
    </div>
    <div class="bar">Don't want this either
    </div>

So I'm using Python and Beautiful Soup to separate stuff out. I need all the "bar" class divs only when they are wrapped inside a "foo" class div. Here's my code:

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(open(r'C:\test.htm'))
    tag = soup.div
    for each_div in soup.findAll('div',{'class':'foo'}):
        print(tag[
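One way to express "bar inside foo", sketched here against the markup from the question, is to search for "bar" divs only within each "foo" div instead of across the whole document. The inline html string stands in for the file the question reads, and the "html.parser" backend is an assumption.

```python
from bs4 import BeautifulSoup

html = """
<div class="foo">
  <div class="bar">I want this</div>
  <div class="unwanted">Not this</div>
</div>
<div class="bar">Don't want this either</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Search inside each "foo" div, so the top-level "bar" div is never visited.
wanted = [
    bar.get_text(strip=True)
    for foo in soup.find_all("div", class_="foo")
    for bar in foo.find_all("div", class_="bar")
]
print(wanted)  # ['I want this']
```

A CSS selector such as soup.select("div.foo div.bar") is another way to write the same query.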

Using Beautiful Soup to strip html tags from a string

Question: Does anyone have some sample code that illustrates how to use Python's Beautiful Soup to strip all HTML tags, except some, from a string of text? I want to strip all JavaScript and HTML tags, everything except:

    <a></a> <b></b> <i></i>

And also things like:

    <a onclick=""></a>

Thanks for helping. I couldn't find much on the internet for this purpose.

Answer 1:

    import BeautifulSoup
    doc = '''<html><head><title>Page title</title></head><body><p id="firstpara" align="center">This is <i>paragraph</i> <a
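The answer's code is cut off, so the following is an independent sketch using the bs4 package rather than a reconstruction of it. It removes script and style elements entirely, unwraps every tag outside a whitelist while keeping its text, and deletes inline event handlers such as onclick from the tags that remain. Reading the question as asking for onclick to be removed is an assumption, as are the names ALLOWED_TAGS and strip_tags. This is not a substitute for a vetted HTML sanitizer when the input is untrusted.

```python
from bs4 import BeautifulSoup

ALLOWED_TAGS = {"a", "b", "i"}


def strip_tags(html, allowed=ALLOWED_TAGS):
    """Return html with every tag outside `allowed` removed, text kept."""
    soup = BeautifulSoup(html, "html.parser")
    # First pass: remove scripts and styles together with their contents.
    for tag in soup.find_all(["script", "style"]):
        tag.decompose()
    # Second pass: unwrap disallowed tags, clean up allowed ones.
    for tag in soup.find_all(True):
        if tag.name not in allowed:
            # Drop the tag itself but keep its children and text.
            tag.unwrap()
        else:
            # Assumption: inline event handlers (onclick, ...) should go too.
            for attr in [name for name in tag.attrs if name.lower().startswith("on")]:
                del tag[attr]
    return str(soup)
```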

Python, remove all html tags from string

Question: I am trying to access the article content from a website, using BeautifulSoup with the code below:

    site= 'www.example.com'
    page = urllib2.urlopen(req)
    soup = BeautifulSoup(page)
    content = soup.find_all('p')
    content=str(content)

The content object contains all of the main text from the page that is within the 'p' tags; however, other tags are still present in the output, as can be seen in the image below. I would like to remove all characters that are enclosed in matching pairs of < >
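Calling str() on the result of find_all serialises the elements back to HTML, which is why the tags survive. A sketch of an alternative is to ask each paragraph for its text with get_text(), which drops the nested tags without any pattern matching on < and >. The function name paragraph_texts and the "html.parser" backend are assumptions for this illustration.

```python
from bs4 import BeautifulSoup


def paragraph_texts(html):
    """Return the text of each <p> element with all nested tags removed."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() concatenates only the text nodes inside each paragraph.
    return [p.get_text() for p in soup.find_all("p")]
```

Joining the returned list with "\n".join(...) gives the article body as a single string.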