Scraping data from the tag names in python

Submitted by 孤街浪徒 on 2019-12-20 04:15:55

Question


Hi, I am trying to scrape user data from a website. I need the user ID (UID), which is embedded in the id attribute of a div tag itself. I am using Python with Selenium and Beautiful Soup.

Example:

<div id="UID_60CE07D6DF5C02A987ED7B076F4154F3-SRC_328619641" class="memberOverlayLink" onmouseover="ta.trackEventOnPage('Reviews','show_reviewer_info_window','user_name_photo'); ta.call('ta.overlays.Factory.memberOverlayWOffset', event, this, 's3 dg rgba_gry update2012', 0, (new Element(this)).getElement('.avatar')&amp;&amp;(new Element(this)).getElement('.avatar').getStyle('border-radius')=='100%'?-10:0);">

I have looked through the documentation and several web pages, but I can't find a solution for this. If anyone could tell me whether such a thing is possible, I would be very grateful.


Answer 1:


Assuming the id attribute value is always in the format UID_ followed by one or more alphanumeric characters followed by -SRC_ followed by one or more digits:

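import re
from bs4 import BeautifulSoup

# html is the page source, e.g. driver.page_source when using Selenium
soup = BeautifulSoup(html, "html.parser")

# Capture the part between "UID_" and "-SRC_<digits>"
pattern = re.compile(r"UID_(\w+)-SRC_\d+")
div_id = soup.find("div", id=pattern)["id"]

uid = pattern.match(div_id).group(1)
print(uid)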

Here we use BeautifulSoup to search for a div whose id attribute value matches a regular expression. The pattern contains a capturing group, (\w+), which lets us extract the UID value.
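
A review page usually has more than one such div, so it may help to collect every UID instead of only the first. The following is a minimal sketch under the same format assumption as above. The helper name extract_uids, the constant UID_PATTERN, and the sample markup (including the second, made-up id) are hypothetical and only for illustration.

```python
import re
from bs4 import BeautifulSoup

# Same format assumption as the answer: UID_<word characters>-SRC_<digits>.
UID_PATTERN = re.compile(r"UID_(\w+)-SRC_\d+")


def extract_uids(html):
    """Return the UID part of every matching div id, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    uids = []
    # find_all accepts a compiled regex as an attribute filter.
    for div in soup.find_all("div", id=UID_PATTERN):
        # The filter may match anywhere in the id, so re-check from the start.
        match = UID_PATTERN.match(div["id"])
        if match:
            uids.append(match.group(1))
    return uids


# Made-up sample markup; the second UID is invented for this example.
sample_html = """
<div id="UID_60CE07D6DF5C02A987ED7B076F4154F3-SRC_328619641" class="memberOverlayLink"></div>
<div id="sidebar"></div>
<div id="UID_ABC123-SRC_42" class="memberOverlayLink"></div>
"""

print(extract_uids(sample_html))
```

With Selenium, the string passed in would typically be driver.page_source after the page has loaded. Unlike indexing the result of find directly, this version returns an empty list when no div matches, rather than raising an error.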



Source: https://stackoverflow.com/questions/33973629/scraping-data-from-the-tag-names-in-python
