How do I scrape a ::before element on a website using Selenium in Python?

没有蜡笔的小新 2021-01-05 18:26

I am trying to scrape a phone number from this website using Selenium. I found the class to be "tel ttel", but when I try to scrape the website with find_element_by_xpath, I ge

2 Answers
  •  耶瑟儿~
    2021-01-05 18:53

    You don't need Selenium. The rules that give the ::before pseudo-elements their content are carried in the page's CSS style block:

    Here, the 2/3-letter strings after .icon- (e.g. acb) match the classes of the span elements that house your ::before content. The values after \9d0 are one more than the value actually shown. You can create a dictionary from these pairs of values (with that adjustment) and use it to decode the character at each ::before from the span's class value.

    Example of how 2/3 letter strings map to content:
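As a rough illustration with made-up values, the mapping could be sketched as below. The stylesheet fragment, the class suffixes and the helper names `build_cipher` and `decode` are all hypothetical and are not taken from the site; only the `\9d0` prefix, the minus-one adjustment and the use of 10 for "+" come from the description above.

```python
import re

# Hypothetical stylesheet fragment; the real class names and codes will differ.
sample_css = (
    r'.icon-acb:before{content:"\9d002"}'
    r'.icon-yz:before{content:"\9d010"}'
    r'.icon-dc:before{content:"\9d001"}'
    r'.icon-fe:before{content:"\9d011"}'
)


def build_cipher(css):
    """Map each class suffix to the character its ::before rule displays."""
    keys = re.findall(r'-(\w+):before', css)
    # The number after \9d0 is one more than the value actually shown.
    values = [int(v) - 1 for v in re.findall(r'9d0(\d+)', css)]
    # After the adjustment, 10 stands for the '+' sign.
    return {k: '+' if v == 10 else str(v) for k, v in zip(keys, values)}


def decode(class_suffixes, cipher):
    """Join the decoded characters; unknown suffixes become '?' (my choice)."""
    return ''.join(cipher.get(s, '?') for s in class_suffixes)


cipher = build_cipher(sample_css)
number = decode(['fe', 'yz', 'acb', 'dc'], cipher)
print(number)
```

In the real page the list of suffixes would come from the span classes inside the phone number container rather than from a hard-coded list.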

    My method is perhaps a little verbose, as I am not that familiar with Python, but the logic should be clear.

    import re

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.justdial.com/Bangalore/Spardha-Mithra-IAS-KAS-Coaching-Centre-Opposite-Maruthi-Medicals-Vijayanagar/080PXX80-XX80-140120184741-R6P8_BZDET?xid=QmFuZ2Fsb3JlIEJhbmsgRXhhbSBUdXRvcmlhbHM='
    res = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(res.content, 'lxml')

    # The code assumes the second <style type="text/css"> block holds the .icon-xxx:before rules
    cipherKey = str(soup.select('style[type="text/css"]')[1])
    # Class suffixes (e.g. acb) and the numbers after \9d0, each one more than the value shown
    keys = re.findall(r'-(\w+):before', cipherKey)
    values = [int(item) - 1 for item in re.findall(r'9d0(\d+)', cipherKey)]
    cipherDict = dict(zip(keys, values))

    # The first entry that decodes to 10 stands for the '+' sign
    plusKey = next(k for k, v in cipherDict.items() if v == 10)
    cipherDict[plusKey] = '+'

    # Second class of each icon span, with the 'icon-' prefix stripped, is the lookup key
    decodeElements = [item['class'][1].replace('icon-', '') for item in soup.select('.telCntct span[class*="icon"]')]

    telephoneNumber = ''.join(str(cipherDict.get(i)) for i in decodeElements)
    print(telephoneNumber)
    
