Extract attribute value in Beautiful Soup

Submitted by 泪湿孤枕 on 2019-12-24 13:25:33

Question


The following is part of a website that I am trying to extract the video titles from:

</div>
<div class="yt-lockup-content">
        <h3 class="yt-lockup-title">
<a class="yt-uix-sessionlink yt-uix-tile-link yt-uix-contextlink 
      yt-ui-ellipsis yt-ui-ellipsis-2"
    dir="ltr"
      title="Harder Polynomials"
    data-sessionlink="ei=fYsHUvSLA8uzigLq74CABQ&amp;ved=CB8Qvxs&amp;feature=c4-videos-u"
    href="/watch?v=LHvQeBRLFn8"
  >
    Harder Polynomials
</a>

I wish to extract the video title (Harder Polynomials) from this. I have tried the following code:

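import requests
from bs4 import BeautifulSoup

resp = requests.get('http://www.youtube.com/user/sachinabey/videos')
soup = BeautifulSoup(resp.text, 'html.parser')

# Search for the link by its full class string (copied as I typed it)
a = soup.findAll('a', attrs={'class': 'yt-uix-sessionlink yt-uix-tile-link yt-uix-  contextlink yt-ui-ellipsis yt-ui-ellipsis-2'})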

The result a is empty. What am I doing wrong? And from here, how do I extract the title?


Answer 1:


Here's a working solution that prints all video titles from the page:

import requests
from bs4 import BeautifulSoup

resp = requests.get('http://www.youtube.com/user/sachinabey/videos')

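# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(resp.text, 'html.parser')
# Each video title sits in an <a> inside <h3 class="yt-lockup-title">
for title in soup.findAll('h3', attrs={'class': 'yt-lockup-title'}):
    print(title.find('a').text.strip())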

It prints:

Harder Polynomials
Summing tan inverse
iGraph tutorial
Integrate e^(-x^2)
Chord of Contact to Ellipse
Equation of Tangents of an Ellipse or Hyperbola
Motion and Air Resistance
Projectile Motion
Regression in R
EM Algorithm Derivation
Cosine Rule proof
R writing functions
Proof of square root 2 being irrational
R for loops and while loops
Chi Squared Hypothesis Testing
Integration of Trignometric Functions
Sequences and Series Examples
ARCH GARCH Model Motivation
Integration by Parts
Differentiate Inverse Trigonometry
Simple Harmonic Motion Examples Part II
Simple Harmonic Motion Examples Part I
Simple Harmonic Motion -  Introduction
HSC Solutions 2009 3 Unit Q4
HSC 3 Unit Solutions 2009 Q2
HSC 3 Unit Maths 2009 Solutions
Parallel For Loops
Change of Base for Logarithms - Examples
Divisibility by 3 or 9
Multiplying by 11
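
Since the question asks for an attribute value, note that the same title is also stored in the title attribute of each link, so it can be read without stripping whitespace from the text. A minimal sketch, assuming the markup looks like the excerpt in the question (the html string is a trimmed offline copy of that excerpt, and the variable names are only illustrative):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup quoted in the question, used as an offline
# stand-in for the downloaded page so that no network request is needed.
html = '''
<div class="yt-lockup-content">
  <h3 class="yt-lockup-title">
    <a class="yt-uix-sessionlink yt-uix-tile-link yt-uix-contextlink
          yt-ui-ellipsis yt-ui-ellipsis-2"
       dir="ltr"
       title="Harder Polynomials"
       href="/watch?v=LHvQeBRLFn8">
      Harder Polynomials
    </a>
  </h3>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

titles = []
for h3 in soup.find_all('h3', attrs={'class': 'yt-lockup-title'}):
    link = h3.find('a')
    if link is None:
        continue  # heading without a link: nothing to extract
    # Prefer the title attribute; fall back to the visible link text.
    titles.append(link.get('title') or link.get_text(strip=True))

print(titles)
```

Using link.get('title') rather than link['title'] returns None instead of raising KeyError when a link has no title attribute, which is why the fallback to the link text works.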



Answer 2:


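I think the error lies in "yt-uix-  contextlink" in your class string. The whitespace in the middle looks like a typo, since the class in the page is "yt-uix-contextlink". Once that is corrected, the search works.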

Demo:

>>> s
'<div class="yt-lockup-content">\n        <h3 class="yt-lockup-title">\n<a class="yt-uix-sessionlink yt-uix-tile-link yt-uix-contextlink \n      yt-ui-ellipsis yt-ui-ellipsis-2"\n    dir="ltr"\n      title="Harder Polynomials"\n    data-sessionlink="ei=fYsHUvSLA8uzigLq74CABQ&amp;ved=CB8Qvxs&amp;feature=c4-videos-u"\n    href="/watch?v=LHvQeBRLFn8"\n  >\n    Harder Polynomials\n</a>'
>>> soup=BeautifulSoup(s)
>>> soup.findAll('a', attrs={'class': 'yt-uix-sessionlink yt-uix-tile-link yt-uix-contextlink yt-ui-ellipsis yt-ui-ellipsis-2'})
[<a class="yt-uix-sessionlink yt-uix-tile-link yt-uix-contextlink yt-ui-ellipsis yt-ui-ellipsis-2" data-sessionlink="ei=fYsHUvSLA8uzigLq74CABQ&amp;ved=CB8Qvxs&amp;feature=c4-videos-u" dir="ltr" href="/watch?v=LHvQeBRLFn8" title="Harder Polynomials">
    Harder Polynomials
</a>]

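Alternatively, pass a list of class names. Note that a list matches any tag carrying at least one of the listed classes, not only tags that carry all of them.

>>> soup.findAll('a', attrs={'class': ['yt-uix-sessionlink', 'yt-uix-tile-link', 'yt-uix-contextlink', 'yt-ui-ellipsis', 'yt-ui-ellipsis-2']})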
[<a class="yt-uix-sessionlink yt-uix-tile-link yt-uix-contextlink yt-ui-ellipsis yt-ui-ellipsis-2" data-sessionlink="ei=fYsHUvSLA8uzigLq74CABQ&amp;ved=CB8Qvxs&amp;feature=c4-videos-u" dir="ltr" href="/watch?v=LHvQeBRLFn8" title="Harder Polynomials">
    Harder Polynomials
</a>]
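
Matching on the full class string is brittle, because a single mistyped class name makes the whole search fail. A more forgiving approach is to match on one distinctive class, or to use a CSS selector when several classes must all be present. The following is only a sketch on a trimmed copy of the markup from the question; which classes are distinctive enough on the real page is an assumption:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the <a> tag quoted in the question.
html = '''<a class="yt-uix-sessionlink yt-uix-tile-link yt-uix-contextlink
      yt-ui-ellipsis yt-ui-ellipsis-2" title="Harder Polynomials"
      href="/watch?v=LHvQeBRLFn8">Harder Polynomials</a>'''
soup = BeautifulSoup(html, 'html.parser')

# The class string from the question, typo included: nothing matches.
typo = ('yt-uix-sessionlink yt-uix-tile-link yt-uix-  contextlink '
        'yt-ui-ellipsis yt-ui-ellipsis-2')
with_typo = soup.find_all('a', attrs={'class': typo})

# A single class name matches regardless of the tag's other classes.
by_one_class = soup.find_all('a', class_='yt-uix-tile-link')

# A CSS selector requires every listed class, in any order.
by_selector = soup.select('a.yt-uix-tile-link.yt-uix-contextlink')

print(len(with_typo), len(by_one_class), len(by_selector))
print([a['title'] for a in by_selector])
```

The selector form is the one to reach for when a tag must carry all of several classes, since the list form shown above matches on any one of them.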


Source: https://stackoverflow.com/questions/18172644/extract-attribute-value-in-beautiful-soup
