beautifulsoup

How should I show results of BeautifulSoup parsing in Django?

一曲冷凌霜 submitted on 2021-02-07 10:16:33
Question: I'm trying to scrape a web page using BeautifulSoup and Django. Here's my views.py, which does this task:

    def detail(request, article_id):
        article = get_object_or_404(Article, pk=article_id)
        html = urllib2.urlopen("...url...")
        soup = BeautifulSoup(html)
        title = soup.title
        return render(request, 'detail.html', {'article': article, 'title': title})

But when I use {{ title }} in Django template files, it doesn't show anything. I've tested it and it works in the shell. I've added a line to this function:
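
A minimal sketch, assuming Python 3 and bs4 with the built-in "html.parser" (the original answer is not included here). soup.title is a bs4 Tag, and Tag objects are callable (calling one runs find_all), while Django's template engine calls any callable it finds in the context, so {{ title }} may not render the title text you expect. Passing a plain string avoids that. The helper name extract_title is hypothetical.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the page <title> as plain text, or '' if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is a Tag; Tags are callable and Django calls callables
    # during template rendering, so hand the template a str instead.
    if soup.title is None or soup.title.string is None:
        return ""
    return soup.title.string.strip()


# Inside the view (sketch; Article and detail.html as in the question,
# urllib2.urlopen becomes urllib.request.urlopen in Python 3):
# html = urllib.request.urlopen(url).read()
# return render(request, "detail.html",
#               {"article": article, "title": extract_title(html)})
```
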

Any way to scrape a link that redirects?

≯℡__Kan透↙ submitted on 2021-02-07 09:47:58
Question: Is there any way I can make Python click a link such as a bit.ly link and then scrape the resulting link? When I scrape a certain page, the only link I can scrape is one that redirects, and the page it redirects to is where the information I need is located.

Answer 1: There are 3 types of redirection:

- HTTP: as information in the response headers (status code 301, 302, or another 3xx)
- HTML: as a <meta> tag in the HTML (Wikipedia: Meta refresh)
- JavaScript: as code like window.location = new_url

requests execute
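
A minimal sketch covering the first two redirect types, assuming only the Python 3 standard library. urllib.request.urlopen follows HTTP 3xx redirects on its own and geturl() reports the final URL; a meta refresh has to be read out of the HTML. The helper names (fetch_following_redirects, meta_refresh_target, resolve) are hypothetical, and JavaScript redirects are not handled, since they need something that runs the script, such as a browser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class _MetaRefreshFinder(HTMLParser):
    """Records the URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("http-equiv", "").lower() != "refresh":
            return
        # content typically looks like "5; url=https://example.com/next"
        _, _, rest = attrs.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.target = rest[4:].strip().strip("'\"")


def meta_refresh_target(html, base_url):
    """Return the absolute meta-refresh URL in html, or None."""
    finder = _MetaRefreshFinder()
    finder.feed(html)
    return urljoin(base_url, finder.target) if finder.target else None


def fetch_following_redirects(url):
    # urlopen follows HTTP 3xx redirects; geturl() is where it ended up.
    with urlopen(url) as resp:
        return resp.geturl(), resp.read().decode("utf-8", errors="replace")


def resolve(url):
    """Follow HTTP redirects, then one level of meta refresh, if present."""
    landed, html = fetch_following_redirects(url)
    return meta_refresh_target(html, landed) or landed
```

Only meta_refresh_target is exercised offline; resolve needs network access.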

Getting BeautifulSoup to catch tags in a non-case-sensitive way

人盡茶涼 submitted on 2021-02-07 08:36:30
Question: I want to catch some tags with BeautifulSoup: some <p> tags, the <title> tag, and some <meta> tags. But I want to catch them regardless of their case; I know that some sites write meta like this: <META>, and I want to be able to catch that. I noticed that BeautifulSoup is case-sensitive by default. How do I catch these tags in a case-insensitive way?

Answer 1: You can use soup.findAll, which should match case-insensitively:

    import BeautifulSoup
    html = '''<html> <head> <meta name="description" content=
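
A minimal sketch with bs4 (the answer above uses the older BeautifulSoup 3 import). With an HTML parser such as "html.parser", tag and attribute names are stored in lowercase, so lowercase queries also match <META> and <P>. For a parser that preserves case (for example an XML parser), passing a compiled regular expression with re.IGNORECASE as the tag name is one option; the sample HTML is made up for illustration.

```python
import re

from bs4 import BeautifulSoup

html = """<HTML><HEAD><TITLE>Example</TITLE>
<META NAME="description" CONTENT="A page"></HEAD>
<BODY><P>First</P><p>Second</p></BODY></HTML>"""

# html.parser lowercases tag and attribute names while parsing.
soup = BeautifulSoup(html, "html.parser")

paragraphs = [p.get_text() for p in soup.find_all("p")]
description = soup.find("meta", attrs={"name": "description"})["content"]
title = soup.title.string

# Case-insensitive match on the tag name, useful with case-preserving parsers.
meta_tags = soup.find_all(re.compile(r"^meta$", re.IGNORECASE))
```
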

Perform Download via download button in Python

无人久伴 submitted on 2021-02-07 08:32:32
Question: I am somewhat new to getting data from websites. I have, for example, the website http://www.ariva.de/adidas-aktie/historische_kurse, and there is a hidden download button, as shown in red in the picture below: The main question is: how can I download that in Python? I tried some things I found on the web (e.g. Beautiful Soup, ScraperWiki, etc.) but somehow failed. The data download link is structured as follows:

    > Kurse als CSV-Datei </h3> <div class="clearfloat"></div> </div> >
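
A minimal sketch, assuming the button is an ordinary <a href="..."> link rather than a form or a JavaScript action (the markup in the question is cut off, so this is not confirmed). It collects links whose href or text mentions "csv", resolves them against the page URL, and saves one with urllib.request.urlretrieve. The heuristic and the helper names find_csv_links and download are hypothetical.

```python
from urllib.parse import urljoin
from urllib.request import urlretrieve

from bs4 import BeautifulSoup


def find_csv_links(html, base_url):
    """Return absolute URLs of <a> tags that look like CSV downloads.

    Heuristic (an assumption): the href or the link text mentions "csv".
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        text = a.get_text(" ", strip=True)
        if "csv" in a["href"].lower() or "csv" in text.lower():
            links.append(urljoin(base_url, a["href"]))
    return links


def download(url, filename):
    # Saves the response body to a local file.
    urlretrieve(url, filename)
```

If the page instead submits a form to produce the CSV, the request the browser sends (visible in its developer tools) would have to be reproduced instead.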

Find Most Common Words from a Website in Python 3 [closed]

旧时模样 submitted on 2021-02-07 08:15:55
Question: Closed. This question needs to be more focused. It is not currently accepting answers. Closed 6 years ago.

I need to find and copy the words that appear more than 5 times on a given website using Python 3 code, and I'm not sure how to do it. I've looked through the archives here on Stack Overflow, but other solutions rely on Python 2 code. Here's the measly code I
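
A minimal Python 3 sketch, assuming bs4 for text extraction and a simple definition of a "word" (runs of letters and apostrophes, lowercased), which is a choice made here, not something the question specifies. collections.Counter does the counting; script and style contents are dropped so code is not counted. The helper name frequent_words is hypothetical.

```python
import re
from collections import Counter

from bs4 import BeautifulSoup


def frequent_words(html, more_than=5):
    """Return {word: count} for words appearing more than `more_than` times."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove <script>/<style> elements so their contents are not counted.
    for tag in soup(["script", "style"]):
        tag.decompose()
    words = re.findall(r"[a-z']+", soup.get_text(" ").lower())
    counts = Counter(words)
    return {word: n for word, n in counts.most_common() if n > more_than}


# Fetching the page in Python 3 (urllib2's urlopen lives in urllib.request):
# html = urllib.request.urlopen(url).read().decode("utf-8")
```
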

How to select a class of div inside of a div with beautiful soup?

筅森魡賤 submitted on 2021-02-07 05:17:12
Question: I have a bunch of div tags within div tags:

    <div class="foo">
      <div class="bar">I want this</div>
      <div class="unwanted">Not this</div>
    </div>
    <div class="bar">Don't want this either </div>

So I'm using Python and Beautiful Soup to separate things out. I need all the "bar" class divs, but only when they are wrapped inside a "foo" class div. Here's my code:

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(open(r'C:\test.htm'))
    tag = soup.div
    for each_div in soup.findAll('div',{'class':'foo'}):
        print(tag[
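
A minimal sketch with bs4, using the HTML from the question inline instead of C:\test.htm. A CSS descendant selector restricts the match to div.bar elements inside a div.foo; the loop version does the same by searching within each each_div (the loop variable) rather than a separately fetched tag. Note that find_all searches all descendants; use "div.foo > div.bar" and recursive=False if only direct children should count.

```python
from bs4 import BeautifulSoup

html = """
<div class="foo">
  <div class="bar">I want this</div>
  <div class="unwanted">Not this</div>
</div>
<div class="bar">Don't want this either</div>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector: any div.bar that has a div.foo ancestor.
wanted = [div.get_text(strip=True) for div in soup.select("div.foo div.bar")]

# Equivalent loop: search inside each "foo" div, not the whole document.
wanted_loop = []
for each_div in soup.find_all("div", class_="foo"):
    for bar in each_div.find_all("div", class_="bar"):
        wanted_loop.append(bar.get_text(strip=True))
```
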

Using Beautiful Soup to strip html tags from a string

女生的网名这么多〃 submitted on 2021-02-07 03:24:36
Question: Does anyone have some sample code that illustrates how to use Python's Beautiful Soup to strip all HTML tags, except some, from a string of text? I want to strip all JavaScript and all HTML tags except:

    <a></a> <b></b> <i></i>

And also things like:

    <a onclick=""></a>

Thanks for helping. I couldn't find much on the internet for this purpose.

Answer 1:

    import BeautifulSoup
    doc = '''<html><head><title>Page title</title></head><body><p id="firstpara" align="center">This is <i>paragraph</i> <a
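
A minimal bs4 sketch, assuming the goal is: drop <script> and <style> elements with their contents, unwrap every other tag except <a>, <b> and <i> (keeping its text), and remove event-handler attributes such as onclick from the tags that remain. Treating every attribute starting with "on" as an event handler is an assumption made here; the helper name strip_tags_except is hypothetical.

```python
from bs4 import BeautifulSoup

ALLOWED = {"a", "b", "i"}


def strip_tags_except(html, allowed=ALLOWED):
    """Keep only the allowed tags; keep the text of everything else."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove JavaScript and CSS entirely, including their contents.
    for tag in soup(["script", "style"]):
        tag.decompose()
    for tag in soup.find_all(True):
        if tag.name not in allowed:
            # Replace the tag with its children, so its text survives.
            tag.unwrap()
        else:
            # Drop event-handler attributes such as onclick="...".
            for attr in list(tag.attrs):
                if attr.startswith("on"):
                    del tag[attr]
    return str(soup)
```
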

Python, remove all html tags from string

拟墨画扇 submitted on 2021-02-06 13:53:40
Question: I am trying to access the article content from a website, using BeautifulSoup with the code below:

    site = 'www.example.com'
    page = urllib2.urlopen(req)
    soup = BeautifulSoup(page)
    content = soup.find_all('p')
    content = str(content)

The content object contains all of the main text from the page that is within the 'p' tags; however, other tags are still present in the output, as can be seen in the image below. I would like to remove all characters that are enclosed in matching pairs of < >
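
A minimal sketch, assuming Python 3 and bs4. Calling str() on the list returned by find_all('p') serializes the Tag objects themselves, markup included, plus the list's brackets and commas. Asking each paragraph for get_text() returns only its text, with nested tags such as <a> or <b> removed. The helper name paragraph_text is hypothetical.

```python
from bs4 import BeautifulSoup


def paragraph_text(html):
    """Return the text of every <p>, one paragraph per line, without tags."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text(" ", strip=True) joins the text pieces inside each <p> with
    # single spaces, dropping nested markup; "\n".join avoids str(list).
    return "\n".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
```
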