How to get the href link from an onclick function in Python

Submitted by 独自空忆成欢 on 2019-12-25 04:23:07

Question


I want to get the href link of a website from an onclick function. Here is the HTML, in which the onclick handler opens a website:

<div class="fl">
  <span class="taLnk" onclick="ta.trackEventOnPage('Eatery_Listing', 'Website', 594024, 1); ta.util.cookie.setPIDCookie(15190); ta.call('ta.util.link.targetBlank', event, this, {'aHref':'LqMWJQzZYUWJQpEcYGII26XombQQoqnQQQQoqnqgoqnQQQQoqnQQQQoqnQQQQoqnqgoqnQQQQoqnQQuuuQQoqnQQQQoqnxioqnQQQQoqnQQ2EisSMVCnVcJQQoqnQQQQoqnxioqnQQQQoqnQQniaQQoqnQQQQoqnqgoqnQQQQoqnQQWJQzhYMJkH3KHVAdJJH3VVdB', 'isAsdf':true})">Website</span> 
</div>

Normally I use this code to get the href link from a span or any other element:

geturl = soup.find_all("span", {"class": "taLnk"})
for link in geturl:
    hreflink = link.get("href")
    print(hreflink)

But in this case there is no href attribute to read directly, because the link exists only inside the onclick handler.

What should I do?


Answer 1:


You cannot read aHref directly as an attribute; you need to get the value of the onclick attribute first and then extract aHref from it.

>>> import re
>>> data = soup.select('.taLnk')[0].get('onclick')
>>> href = re.search(r"(?is)'aHref':'(.*?)'", str(data)).group(1)
>>> href
'LqMWJQzZYUWJQpEcYGII26XombQQoqnQQQQoqnqgoqnQQQQoqnQQQQoqnQQQQoqnqgoqnQQQQoqnQQuuuQQoqnQQQQoqnxioqnQQQQoqnQQ2EisSMVCnVcJQQoqnQQQQoqnxioqnQQQQoqnQQniaQQoqnQQQQoqnqgoqnQQQQoqnQQWJQzhYMJkH3KHVAdJJH3VVdB'
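
The one-liner above raises an AttributeError if the onclick string has no aHref entry, because re.search returns None. A minimal sketch of a more defensive variant follows, using only the standard library. The helper name extract_ahref and the shortened sample value are assumptions made for illustration; they are not part of the original answer.

```python
import re

# Same idea as the pattern in the answer, precompiled for reuse.
# [^']* captures everything up to the closing single quote.
AHREF_RE = re.compile(r"'aHref'\s*:\s*'([^']*)'")


def extract_ahref(onclick):
    """Return the aHref value found in an onclick string, or None if absent.

    extract_ahref is a hypothetical helper name used only for this sketch.
    """
    if not onclick:
        return None
    match = AHREF_RE.search(onclick)
    return match.group(1) if match else None


# Shortened, made-up sample in the same shape as the onclick in the question.
sample = "ta.call('ta.util.link.targetBlank', event, this, {'aHref':'LqMWJQzZ', 'isAsdf':true})"

print(extract_ahref(sample))   # value between the quotes after 'aHref':
print(extract_ahref(None))     # element without an onclick attribute
```

Returning None instead of raising lets the caller decide how to handle elements that have no handler or a handler without aHref.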



Answer 2:


You can combine a regex with bs4: select the span that has the class taLnk and an onclick attribute starting with ta.trackEventOnPage, then extract the aHref value from that attribute:

h = """<div class="fl">
  <span class="taLnk" onclick="ta.trackEventOnPage('Eatery_Listing', 'Website', 594024, 1); ta.util.cookie.setPIDCookie(15190); ta.call('ta.util.link.targetBlank', event, this, {'aHref':'LqMWJQzZYUWJQpEcYGII26XombQQoqnQQQQoqnqgoqnQQQQoqnQQQQoqnQQQQoqnqgoqnQQQQoqnQQuuuQQoqnQQQQoqnxioqnQQQQoqnQQ2EisSMVCnVcJQQoqnQQQQoqnxioqnQQQQoqnQQniaQQoqnQQQQoqnqgoqnQQQQoqnQQWJQzhYMJkH3KHVAdJJH3VVdB', 'isAsdf':true})">Website</span>
</div>"""

from bs4 import BeautifulSoup
import re
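soup = BeautifulSoup(h, "html.parser")  # name the parser explicitly

# Quote the attribute value: it contains dots, so it is not a valid unquoted CSS identifier
data = soup.select_one('span.taLnk[onclick^="ta.trackEventOnPage"]')["onclick"]
print(re.search(r"'aHref':'(.*?)'", data).group(1))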
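
If a page contains several such spans, as the loop in the question suggests, the same approach can be applied to each one. The following is a rough sketch under stated assumptions: the HTML and the values AAA111 and BBB222 are made up for illustration, and spans without an onclick attribute or without an aHref entry are simply skipped.

```python
from bs4 import BeautifulSoup
import re

# Made-up markup in the same shape as the question's HTML; the aHref values are placeholders.
html = """
<div class="fl"><span class="taLnk" onclick="ta.call('x', event, this, {'aHref':'AAA111', 'isAsdf':true})">Website</span></div>
<div class="fl"><span class="taLnk">No handler</span></div>
<div class="fl"><span class="taLnk" onclick="ta.call('x', event, this, {'aHref':'BBB222', 'isAsdf':true})">Website</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r"'aHref':'(.*?)'")

hrefs = []
for span in soup.find_all("span", class_="taLnk"):
    onclick = span.get("onclick")      # None when the attribute is missing
    if onclick is None:
        continue
    match = pattern.search(onclick)
    if match:                          # skip handlers that carry no aHref
        hrefs.append(match.group(1))

print(hrefs)  # ['AAA111', 'BBB222']
```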


Source: https://stackoverflow.com/questions/39289206/how-to-get-href-link-from-onclick-function-in-python
