Retrieving a subset of hrefs from find_all() in BeautifulSoup

Submitted by 一曲冷凌霜 on 2019-12-06 15:43:48

Use Python's re (regular expression) module to match only the links you want.

import re

import requests
from bs4 import BeautifulSoup

# Build the search URL from the artist name (spaces become "+").
user_input = input("Enter Artist Name = ").replace(" ", "+")
base_url = "https://genius.com/search?q=" + user_input

header = {'User-Agent': ''}
response = requests.get(base_url, headers=header)

# The "lxml" parser requires the lxml package to be installed.
soup = BeautifulSoup(response.content, "lxml")

# Match hrefs that contain no whitespace and end in "-lyrics".
pattern = re.compile(r"\S+-lyrics$")

# href=True skips <a> tags that have no href attribute.
for link in soup.find_all('a', href=True):
    if pattern.match(link['href']):
        print(link['href'])

Output:

https://genius.com/Drake-hotline-bling-lyrics
https://genius.com/Drake-one-dance-lyrics
https://genius.com/Drake-hold-on-were-going-home-lyrics
https://genius.com/Drake-know-yourself-lyrics
https://genius.com/Drake-back-to-back-lyrics
https://genius.com/Drake-all-me-lyrics
https://genius.com/Drake-0-to-100-the-catch-up-lyrics
https://genius.com/Drake-started-from-the-bottom-lyrics
https://genius.com/Drake-from-time-lyrics
https://genius.com/Drake-the-motto-lyrics

This simply checks whether each link's href matches the pattern, that is, whether it ends in -lyrics. You can use similar logic to filter on the user_input variable as well.
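
As a rough sketch of that last suggestion, the filter could also require the artist name at the start of the URL path. The helper name filter_lyric_links and the sample list are hypothetical, and the sketch assumes (based only on the output shown above) that the song URLs have the form https://genius.com/<Artist>-<song-title>-lyrics, with spaces in the artist name written as hyphens. It takes the raw artist name rather than the "+"-joined user_input.

```python
import re

def filter_lyric_links(hrefs, artist):
    """Hypothetical helper: keep hrefs that start with the artist name
    and end in "-lyrics" (assumed URL shape, see the note above)."""
    # Turn "Some Artist" into "Some-Artist" and escape any regex metacharacters.
    slug = re.escape(artist.strip().replace(" ", "-"))
    pattern = re.compile(
        r"https://genius\.com/" + slug + r"-\S+-lyrics$",
        re.IGNORECASE,  # artist names typed in lower case still match
    )
    return [href for href in hrefs if pattern.match(href)]

# Sample hrefs standing in for [a['href'] for a in soup.find_all('a', href=True)].
sample = [
    "https://genius.com/Drake-hotline-bling-lyrics",
    "https://genius.com/Drake-one-dance-lyrics",
    "https://genius.com/artists/Drake",
    "https://genius.com/Rihanna-work-lyrics",
]

for href in filter_lyric_links(sample, "Drake"):
    print(href)
```

With the sample list, only the two Drake song links are printed; the artist page and the other artist's song are dropped. If the real URLs do not follow the assumed shape, the pattern would need to be adjusted.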

Hope this helps.
