Using BeautifulSoup to extract the title of a link

小蘑菇 2020-12-18 01:18

I'm trying to extract the title of a link using BeautifulSoup. The code that I'm working with is as follows:

url = "http://www.example.com"
source_code =         


        
2 Answers
  •  旧巷少年郎
    2020-12-18 02:09

    By passing several classes in one string, you are searching for the exact string value of the class attribute. In that case the string has to match exactly: the same classes, in the same order, separated by single spaces.

    See the Searching by CSS class section in the documentation:

    You can also search for the exact string value of the class attribute:

    css_soup.find_all("p", class_="body strikeout")
    # [<p class="body strikeout"></p>]

    But searching for variants of the string value won’t work:

    css_soup.find_all("p", class_="strikeout body")
    # []
    

    You'd have a better time searching for individual classes:

    soup.find_all('a', class_='a-link-normal')
    

    If you must match more than one class, use a CSS selector:

    soup.select('a.a-link-normal.s-access-detail-page.a-text-normal')
    

    and it won't matter in what order you list the classes.

    Demo:

    >>> from bs4 import BeautifulSoup
    >>> # Assumed minimal markup: one link carrying the three classes
    >>> plain_text = u'<a class="a-link-normal s-access-detail-page a-text-normal">Introduction To Computation And Programming Using Python</a>'
    >>> soup = BeautifulSoup(plain_text, 'html.parser')
    >>> for link in soup.find_all('a', class_='a-link-normal'):
    ...     print(link.text)
    ...
    Introduction To Computation And Programming Using Python
    >>> for link in soup.select('a.a-link-normal.s-access-detail-page.a-text-normal'):
    ...     print(link.text)
    ...
    Introduction To Computation And Programming Using Python
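
The difference between the three lookups described in the answer above can be checked with a small self-contained script. This is only a minimal sketch, assuming Python 3 with the bs4 package installed; the markup and the link texts are made up for illustration and are not taken from the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the page from the question):
# one link with three classes, one link with a single class.
html = (
    '<a class="a-link-normal s-access-detail-page a-text-normal" href="#">'
    "First book</a>"
    '<a class="a-link-normal" href="#">Other link</a>'
)
soup = BeautifulSoup(html, "html.parser")

# Exact string value of the class attribute, classes in document order.
exact = soup.find_all(
    "a", class_="a-link-normal s-access-detail-page a-text-normal"
)

# Same classes in a different order: treated as a different string.
reordered = soup.find_all(
    "a", class_="a-text-normal a-link-normal s-access-detail-page"
)

# A single class matches every tag that carries it.
single = soup.find_all("a", class_="a-link-normal")

# CSS selector: all three classes required, order irrelevant.
selected = soup.select("a.a-text-normal.a-link-normal.s-access-detail-page")

print([a.get_text() for a in exact])      # ['First book']
print([a.get_text() for a in reordered])  # []
print([a.get_text() for a in single])     # ['First book', 'Other link']
print([a.get_text() for a in selected])   # ['First book']
```

The exact-string lookup succeeds only when the classes are written in document order, while the selector finds the link no matter how the classes are listed.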
