How to extract onClick url using beautifulsoup

Submitted by *爱你&永不变心* on 2020-05-28 03:07:01

Question


Below is the HTML I need to extract the URL from:

<div class="one_block" style="display:block;" onClick="location.href='/games/box.html?&game_type=01&game_id=13&game_date=2020-04-19&pbyear=2020';" style="cursor:pointer;">
<!-- 對戰球隊及場地 start -->
<table width="100%" border="0" cellspacing="0" cellpadding="0" class="schedule_team">
<tr>

How do I get the location.href value?

What I tried:

soup.findAll("div", {"onClick": "location.href"})

It returns an empty list.
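The attempt most likely finds nothing for two reasons. First, the HTML parsers BeautifulSoup uses (html.parser, lxml) lowercase attribute names, so the attribute is stored as onclick and the key "onClick" never matches. Second, a plain string in the attrs filter must equal the whole attribute value, whereas a compiled regular expression is matched with search(), so it can match just part of the value. A minimal sketch, assuming BeautifulSoup 4 with Python's built-in "html.parser" and a trimmed copy of the markup (the <td> content is made up):

```python
import re
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup; the <td> text is placeholder content.
html = """<div class="one_block" style="display:block;" onClick="location.href='/games/box.html?&game_type=01&game_id=13&game_date=2020-04-19&pbyear=2020';" style="cursor:pointer;">
<table class="schedule_team"><tr><td>team</td></tr></table>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Original attempt: the parser stored the attribute as "onclick", and the
# string "location.href" is compared against the entire value, so no match.
original = soup.find_all("div", {"onClick": "location.href"})
print(original)

# Lowercase attribute name plus a regex, which BeautifulSoup applies with search().
divs = soup.find_all("div", onclick=re.compile(r"location\.href"))
print(len(divs))
```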

Desired Output:

/games/box.html?&game_type=01&game_id=13&game_date=2020-04-19&pbyear=2020

PS: the page contains many of these location.href attributes.
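Because the page holds many such attributes, one way to collect every URL at once is to find all tags that carry an onclick attribute and pull the quoted value out with a regular expression. This is a sketch under assumptions: BeautifulSoup 4 with "html.parser", URLs wrapped in single quotes as in the question, and a hypothetical second block whose game_id is invented sample data.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page with several clickable blocks; the second game_id is made up.
html = """
<div class="one_block" onClick="location.href='/games/box.html?&game_type=01&game_id=13&game_date=2020-04-19&pbyear=2020';"></div>
<div class="one_block" onClick="location.href='/games/box.html?&game_type=01&game_id=14&game_date=2020-04-19&pbyear=2020';"></div>
<div class="other">no link here</div>
"""

# Capture whatever sits between the single quotes after location.href=
HREF_RE = re.compile(r"location\.href\s*=\s*'([^']+)'")

soup = BeautifulSoup(html, "html.parser")
urls = []
for tag in soup.find_all(onclick=True):  # every tag that has an onclick attribute
    match = HREF_RE.search(tag["onclick"])
    if match:  # skip onclick handlers that do not set location.href
        urls.append(match.group(1))

for url in urls:
    print(url)
```

The regex tolerates extra whitespace around the equals sign, and tags whose onclick does something else are simply skipped instead of raising an error.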


Answer 1:


How about using the .select() method, which relies on the SoupSieve package to run a CSS selector?

from bs4 import BeautifulSoup

html = '<div class="one_block" style="display:block;" onClick="location.href=\'/games/box.html?&game_type=01&game_id=13&game_date=2020-04-19&pbyear=2020\';" style="cursor:pointer;">' \
        '<!-- \xe5\xb0\x8d\xe6\x88\xb0\xe7\x90\x83\xe9\x9a\x8a\xe5\x8f\x8a\xe5\xa0\xb4\xe5\x9c\xb0 start -->' \
        '<table width="100%" border="0" cellspacing="0" cellpadding="0" class="schedule_team"><tr>'

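soup = BeautifulSoup(html, features="lxml")  # requires the lxml package to be installed
element = soup.select('div.one_block')[0]  # first <div class="one_block"> in the document
print(element.get('onclick'))  # lowercase key: the parser lowercases attribute names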

To print only the URL, split the value on the single quote character and take the second piece:

print(element.get('onclick').split("'")[1])

Output:

/games/box.html?&game_type=01&game_id=13&game_date=2020-04-19&pbyear=2020
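The same idea can be extended to every block on the page by moving the attribute test into the CSS selector itself. A minimal sketch, assuming BeautifulSoup 4 with its SoupSieve dependency and the built-in "html.parser"; the second block is hypothetical sample data. Note that split("'")[1] assumes the URL is wrapped in single quotes and raises IndexError otherwise.

```python
from bs4 import BeautifulSoup

# Trimmed sample with two clickable blocks; the second game_id is made up.
html = """
<div class="one_block" onClick="location.href='/games/box.html?&game_type=01&game_id=13&game_date=2020-04-19&pbyear=2020';"></div>
<div class="one_block" onClick="location.href='/games/box.html?&game_type=01&game_id=14&game_date=2020-04-19&pbyear=2020';"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# [onclick*="location.href"] is a CSS substring selector: it keeps only the
# one_block divs whose onclick attribute contains "location.href".
urls = [
    div["onclick"].split("'")[1]  # text between the first pair of single quotes
    for div in soup.select('div.one_block[onclick*="location.href"]')
]
print(urls)
```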


Source: https://stackoverflow.com/questions/61465565/how-to-extract-onclick-url-using-beautifulsoup
