How to extract var (values) from a <script> in HTML using BeautifulSoup

Posted by 社会主义新天地 on 2020-12-15 05:52:31

Question


I am currently using:

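import requests
from bs4 import BeautifulSoup

# The URL needs a scheme (https://), otherwise requests raises MissingSchema
source = requests.get('https://www.randomwebsite.com').text
soup = BeautifulSoup(source, 'lxml')
# find() returns only the first <script> tag in the document
details = soup.find('script')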

This returns the following script:

     <script>
var Url = "https://www.example.com";
                if(Url != ''){code}
 else {code
}
  </script>

I want the output to be: https://www.example.com


Answer 1:


import re

text = """
     <script>
var Url = "https://www.example.com";
                if(Url != ''){code}
 else {code
}
  </script>
"""


match = re.search('Url = "(.*?)"', text)

print(match.group(1))

Output:

https://www.example.com
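
Since the question asks about BeautifulSoup, the same regex can be applied to the text of each <script> tag instead of the whole page. This is only a minimal sketch under a few assumptions: the variable is assigned a double-quoted string in an inline script, the built-in html.parser is used in place of lxml, and the helper name extract_js_var is hypothetical (it is not part of BeautifulSoup).

```python
import re
from bs4 import BeautifulSoup

html = """
<html><head>
<script>
var Url = "https://www.example.com";
if (Url != '') { /* code */ } else { /* code */ }
</script>
</head></html>
"""

def extract_js_var(html_text, name):
    """Hypothetical helper: return the double-quoted value assigned to `name`, or None."""
    soup = BeautifulSoup(html_text, "html.parser")
    # Match e.g.  Url = "..."  and capture the text between the quotes
    pattern = re.compile(r'\b' + re.escape(name) + r'\s*=\s*"(.*?)"')
    for script in soup.find_all("script"):
        # .string is None for empty or external (src=...) scripts
        match = pattern.search(script.string or "")
        if match:
            return match.group(1)
    return None

print(extract_js_var(html, "Url"))
```

Looping over find_all('script') rather than calling find('script') matters when the page has several scripts, because the variable may not be in the first one.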



Answer 2:


To print the cashback_url, you can try this script:

import re
import requests


url = 'https://tracking.earnkaro.com/visitretailer/508?id=103894&shareid=ENKR2020090345700421&dl=https%3A%2F%2Fwww.amazon.in%2Fgp%2Fproduct%2FB08645RXJ6%2Fref%3Dox_sc_act_title_1%3Fsmid%3DAT95IG9ONZD7S%26psc%3D1'
html_data = requests.get(url).text

cashback_url = re.search(r'var cashbackUrl = "(.*?)"', html_data).group(1)

print(cashback_url)

Prints:

https://www.amazon.in/gp/product/B08645RXJ6/ref=ox_sc_act_title_1?smid=AT95IG9ONZD7S&psc=1&ck&tag=EK003221-21
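
Note that calling .group(1) directly on the result of re.search raises AttributeError when the variable is not present in the page. A guarded variant might look like the sketch below. The helper name find_js_string and the sample string are assumptions made for illustration, and the sample stands in for the downloaded HTML so that no network request is needed.

```python
import re

def find_js_string(html_text, var_name):
    """Hypothetical helper: return the double-quoted value of `var var_name`, or None."""
    pattern = r'var\s+' + re.escape(var_name) + r'\s*=\s*"(.*?)"'
    match = re.search(pattern, html_text)
    # re.search returns None when nothing matches, so check before using group()
    return match.group(1) if match else None

# Stand-in for the page source (an assumption, not the real response)
sample = '<script>var cashbackUrl = "https://www.example.com/item?id=1";</script>'

print(find_js_string(sample, "cashbackUrl"))
print(find_js_string(sample, "otherVar"))
```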


Source: https://stackoverflow.com/questions/63753039/how-to-extract-var-values-from-script-of-html-using-beautifulsoup
