I am trying to scrape a phone number from this website using Selenium. I found the class to be "tel ttel", but when I try to scrape the website with find_element_by_xpath, I ge
You can also get the :before content from the computed style:
chars = driver.execute_script("return [...document.querySelectorAll('.telCntct a.tel span')].map(span => window.getComputedStyle(span,':before').content)")
But in this case you're left with odd Unicode characters that you then have to map to digits yourself.
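A minimal sketch of that mapping step, assuming the computed content values use the same \9d0NN code points as the page's stylesheet, where NN is one more than the digit shown and the entry that decodes to 10 stands for '+'. The function names are hypothetical, and the scheme should be checked against the live page before relying on it.

```python
def decode_before_content(content):
    """Map one computed ':before' content value to a digit or '+'.

    Assumption: the character's code point, written in hex, looks like
    9d0NN, where NN read as a decimal number is one more than the digit.
    """
    ch = content.strip('"\'')  # getComputedStyle returns the string quoted
    if len(ch) != 1:
        return '?'
    code = format(ord(ch), 'x')  # e.g. '9d002'
    if not code.startswith('9d0') or not code[3:].isdigit():
        return '?'
    value = int(code[3:]) - 1
    return '+' if value == 10 else str(value)


def decode_number(chars):
    """Join the decoded characters into one phone number string."""
    return ''.join(decode_before_content(c) for c in chars)
```

With the list returned by execute_script above, decode_number(chars) would give the number as a string; anything that does not fit the assumed scheme comes back as '?'.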
You don't need Selenium. The instructions that give the :before pseudo-elements their content are carried in the page's CSS style rules:
Here, the 2/3-letter strings after .icon- (e.g. acb) map to the span elements which house your :before content. The values after \9d0 are 1 more than the actual value shown. You can create a dictionary from these pairs of values (with the adjustment) to decode the number at each :before from the span class value.
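To make the decoding rule concrete, here is a small offline sketch. The stylesheet fragment is made up for illustration (the class suffixes and values are not taken from the real page), and treating the entry that decodes to 10 as '+' follows the same convention as the full script further down.

```python
import re

# Hypothetical stylesheet fragment; the real page's rules will differ.
sample_css = r'''
.icon-acb:before{content:"\9d001"}
.icon-yz:before{content:"\9d002"}
.icon-wx:before{content:"\9d011"}
'''

# Class suffixes after ".icon-" and the digits that follow "\9d0".
keys = re.findall(r'-(\w+):before', sample_css)
raw_values = re.findall(r'9d0(\d+)', sample_css)

# Each stored value is one more than the character shown; 10 means '+'.
cipher = {}
for key, raw in zip(keys, raw_values):
    value = int(raw) - 1
    cipher[key] = '+' if value == 10 else str(value)


def decode(class_suffixes):
    """Turn a list of span class suffixes into the displayed string."""
    return ''.join(cipher[s] for s in class_suffixes)


print(decode(['wx', 'yz', 'acb']))
```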
Example of how 2/3 letter strings map to content:
My method is perhaps a little verbose as I am not that familiar with Python but the logic should be clear.
import re

import requests
from bs4 import BeautifulSoup

url = 'https://www.justdial.com/Bangalore/Spardha-Mithra-IAS-KAS-Coaching-Centre-Opposite-Maruthi-Medicals-Vijayanagar/080PXX80-XX80-140120184741-R6P8_BZDET?xid=QmFuZ2Fsb3JlIEJhbmsgRXhhbSBUdXRvcmlhbHM='
res = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(res.content, 'lxml')
# This assumes the second <style type="text/css"> block holds the .icon-xxx:before rules
cipherKey = str(soup.select('style[type="text/css"]')[1])
# Class suffixes (e.g. acb), in the same order as their content values
keys = re.findall(r'-(\w+):before', cipherKey)
# Each stored value is one higher than the character actually shown
values = [int(item) - 1 for item in re.findall(r'9d0(\d+)', cipherKey)]
cipherDict = dict(zip(keys, values))
# The key whose adjusted value is 10 is treated as the '+' sign
cipherDict[list(cipherDict.keys())[list(cipherDict.values()).index(10)]] = '+'
# The second class on each span is icon-xxx; keep only the xxx part
decodeElements = [item['class'][1].replace('icon-', '') for item in soup.select('.telCntct span[class*="icon"]')]
telephoneNumber = ''.join([str(cipherDict.get(i)) for i in decodeElements])
print(telephoneNumber)