I am trying to scrape a phone number from this website using Selenium. I found the class to be "tel ttel", but when I try to scrape the website with find_element_by_xpath, I ge
You can also get the :before content from the computed style:
chars = driver.execute_script("return [...document.querySelectorAll('.telCntct a.tel span')].map(span => window.getComputedStyle(span,':before').content)")
But in this case you're left with weird unicode content that you then have to map to numbers.
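One possible way to do that mapping is sketched below. It assumes the encoding described further down in this thread: each :before content is a single character whose code point, written in hex, is 9d0 followed by a number that is one more than the digit shown, with a decoded 10 standing for '+'. The function name decode_before_content and the sample list are hypothetical, and I have not checked what every browser returns here, so treat this as a starting point rather than a tested solution.

```python
def decode_before_content(content):
    """Decode one computed :before content string into a digit or '+'.

    Assumes the hex form of the code point is '9d0' + (value + 1);
    returns None for anything that does not fit that pattern.
    """
    ch = content.strip('"\'')  # computed content usually comes back quoted
    if len(ch) != 1:
        return None
    hex_form = format(ord(ch), 'x')  # e.g. '9d002'
    suffix = hex_form[3:]
    if not hex_form.startswith('9d0') or not suffix.isdigit():
        return None
    value = int(suffix) - 1
    return '+' if value == 10 else str(value)


# Made-up stand-in for the list returned by driver.execute_script(...)
sample_chars = ['"' + chr(0x9d011) + '"', '"' + chr(0x9d002) + '"', '"' + chr(0x9d010) + '"']
decoded = ''.join(decode_before_content(c) or '?' for c in sample_chars)
print(decoded)
```

With the real chars list from the Selenium call, the same join expression would give the decoded number, with '?' marking any span that did not match the assumed pattern.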
You don't need Selenium. The rules that give the :before pseudo-elements their content are carried in the page's CSS style instructions:
Here, the two- or three-letter strings after .icon- (e.g. acb) identify the span elements that house the :before content. The number after \9d0 is one more than the value actually shown. You can build a dictionary from these pairs (subtracting 1 from each number) and use it to decode the digit at each :before from the span's class value.
Example of how 2/3 letter strings map to content:
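As a rough illustration, the snippet below builds the lookup from a made-up CSS string. The class names (acb, yz, dc) and the values in sample_css are invented to match the shape described above and are not copied from the site.

```python
import re

# Hypothetical CSS in the shape described above; names and values are invented.
sample_css = (
    '.icon-acb:before{content:"\\9d001"}'
    '.icon-yz:before{content:"\\9d002"}'
    '.icon-dc:before{content:"\\9d011"}'
)

# Class suffixes and the numbers after \9d0 appear in the same order
keys = re.findall(r'-(\w+):before', sample_css)
values = [int(v) - 1 for v in re.findall(r'9d0(\d+)', sample_css)]

# A decoded value of 10 stands for '+'
cipher = {k: '+' if v == 10 else str(v) for k, v in zip(keys, values)}
print(cipher)

# Decoding a made-up sequence of span classes
print(''.join(cipher[c] for c in ['dc', 'yz', 'acb']))
```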
My method is perhaps a little verbose as I am not that familiar with Python but the logic should be clear.
import re

import requests
from bs4 import BeautifulSoup

url = 'https://www.justdial.com/Bangalore/Spardha-Mithra-IAS-KAS-Coaching-Centre-Opposite-Maruthi-Medicals-Vijayanagar/080PXX80-XX80-140120184741-R6P8_BZDET?xid=QmFuZ2Fsb3JlIEJhbmsgRXhhbSBUdXRvcmlhbHM='
# Send a browser-like User-Agent with the request
res = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(res.content, 'lxml')

# The second <style type="text/css"> block is the one read for the .icon-xxx:before rules
cipherKey = str(soup.select('style[type="text/css"]')[1])
# Class suffixes (e.g. acb) and the numbers following \9d0, in matching order
keys = re.findall(r'-(\w+):before', cipherKey)
values = [int(item) - 1 for item in re.findall(r'9d0(\d+)', cipherKey)]
# A decoded value of 10 stands for '+'
cipherDict = {key: '+' if value == 10 else value for key, value in zip(keys, values)}

# Take the icon-xxx class of each span in the phone number and decode it
decodeElements = [item['class'][1].replace('icon-', '') for item in soup.select('.telCntct span[class*="icon"]')]
telephoneNumber = ''.join(str(cipherDict.get(i)) for i in decodeElements)
print(telephoneNumber)