Could you please help me with this small problem? I want to extract the email, phone and name values from the code below, which sits in a SCRIPT tag (not in the body), using BeautifulSoup (Python).
You can get the script tag contents via BeautifulSoup and then apply a regex to get the desired data.
Working example (based on what you've described in the question):
import re
from bs4 import BeautifulSoup
data = """
My Sample Page
What a wonderful world
"""
soup = BeautifulSoup(data, 'html.parser')  # name the parser explicitly
script = soup.find('script')
pattern = re.compile(r"(\w+): '(.*?)'")  # matches pairs like  email: 'abc@g.com'
fields = dict(pattern.findall(script.string))
print(fields['email'], fields['phone'], fields['name'])
Prints:
abc@g.com 9999999999 XYZ
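For reference, here is a minimal self-contained sketch of the same idea. The page markup and the JavaScript inside the script tag are assumptions (the real page was not provided), and `SAMPLE_HTML`, `FIELD_PATTERN` and `extract_fields` are hypothetical names used only for illustration:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page: the markup and the surrounding JavaScript are assumed,
# only the three key/value pairs come from the question.
SAMPLE_HTML = """
<html>
<head>
<title>My Sample Page</title>
<script>
var contact = {
    email: 'abc@g.com',
    phone: '9999999999',
    name: 'XYZ'
};
</script>
</head>
<body>
<h1>What a wonderful world</h1>
</body>
</html>
"""

# Same pattern as above: a word, a colon, one space, a single-quoted value.
FIELD_PATTERN = re.compile(r"(\w+): '(.*?)'")


def extract_fields(html):
    """Return the key: 'value' pairs found in the first <script> tag."""
    soup = BeautifulSoup(html, "html.parser")
    script = soup.find("script")
    # Guard against pages with no script tag or an empty one.
    if script is None or script.string is None:
        return {}
    return dict(FIELD_PATTERN.findall(script.string))


fields = extract_fields(SAMPLE_HTML)
print(fields["email"], fields["phone"], fields["name"])
```

Returning an empty dict when no script tag is found avoids an `AttributeError` on pages that do not match the expected layout.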
I don't really like this solution, since the regex approach is really fragile: double-quoted values, an escaped quote inside a value, or different spacing around the colon would all break it. I still think there is a better solution and that we are missing the bigger picture here. A link to the specific site would help a lot, but it is what it is.
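To illustrate the fragility, here is a rough sketch of a somewhat more tolerant pattern that accepts either quote style and optional whitespace around the colon. It is still a regex heuristic, not a JavaScript parser, and `TOLERANT_PATTERN`, `parse_fields` and the sample string `js` are hypothetical names and data made up for this example:

```python
import re

STRICT_PATTERN = re.compile(r"(\w+): '(.*?)'")

# key, optional spaces, colon, optional spaces, then a quoted value.
# Group 2 captures the opening quote so the same quote must close the value;
# backslash-escaped characters inside the value are skipped over.
TOLERANT_PATTERN = re.compile(
    r"""(\w+)\s*:\s*(['"])((?:\\.|(?!\2).)*)\2""", re.DOTALL)


def parse_fields(js_text):
    """Map each key to its raw quoted value (escape sequences are left as-is)."""
    return {m.group(1): m.group(3) for m in TOLERANT_PATTERN.finditer(js_text)}


# Made-up input mixing quote styles and spacing.
js = """{ email : "abc@g.com", phone:'9999999999', name: 'XYZ' }"""

print(STRICT_PATTERN.findall(js))  # only the pair written exactly as key: 'value'
print(parse_fields(js))            # all three pairs
```

Even this version would pick up any other `key: 'value'` pair in the script, so narrowing the search to the relevant object first is probably still worthwhile.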
UPD (fixing the code OP provided):
soup = BeautifulSoup(data, 'html.parser')
# The script tag sits after the html element, so look among its following siblings
script = soup.html.find_next_sibling('script', string=re.compile(r"\$\(document\)\.ready"))
pattern = re.compile(r"(\w+): '(.*?)'")
fields = dict(pattern.findall(script.string))
print(fields['email'], fields['phone'], fields['name'])
Prints:
abcd@gmail.com 9999999999 Shamita Shetty