Question
I am currently using:
import requests
from bs4 import BeautifulSoup
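# requests needs an explicit scheme; a bare 'www...' raises MissingSchema
source = requests.get('https://www.randomwebsite.com').text
soup = BeautifulSoup(source, 'lxml')
# find() returns only the first <script> tag on the page
details = soup.find('script')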
This returns the following script:
<script>
var Url = "https://www.example.com";
if(Url != ''){code}
else {code
}
</script>
I want the output to be just the URL: https://www.example.com
Answer 1:
import re
text = """
<script>
var Url = "https://www.example.com";
if(Url != ''){code}
else {code
}
</script>
"""
# The non-greedy group (.*?) captures everything up to the next double quote
match = re.search(r'Url = "(.*?)"', text)
print(match.group(1))
Output:
https://www.example.com
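If you want to keep BeautifulSoup in the loop, one possible approach is to let it locate the script tags and run the regex only on their contents. The following is a minimal sketch under some assumptions: the helper name find_js_string_var and the sample HTML are made up for illustration, the variable is assumed to be declared with var and a double-quoted string, and the built-in html.parser is used instead of lxml so that nothing extra needs to be installed.

```python
import re
from bs4 import BeautifulSoup

# Sample page standing in for the downloaded HTML (illustrative only).
HTML = """
<html><head>
<script>
var Url = "https://www.example.com";
</script>
</head><body></body></html>
"""


def find_js_string_var(html, name):
    """Return the double-quoted string assigned to `var <name>` in the
    first <script> tag that defines it, or None if no tag does."""
    soup = BeautifulSoup(html, "html.parser")
    # re.escape guards against regex metacharacters in the variable name.
    pattern = re.compile(r'\bvar\s+' + re.escape(name) + r'\s*=\s*"(.*?)"')
    for script in soup.find_all("script"):
        # .string is None for empty tags or tags that only have a src attribute.
        code = script.string or ""
        match = pattern.search(code)
        if match:
            return match.group(1)
    return None


print(find_js_string_var(HTML, "Url"))
```

Unlike soup.find('script'), this checks every script tag, so it should still work when the variable is not defined in the first one.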
Answer 2:
To print the cashback_url, you can try this script:
import re
import requests
url = 'https://tracking.earnkaro.com/visitretailer/508?id=103894&shareid=ENKR2020090345700421&dl=https%3A%2F%2Fwww.amazon.in%2Fgp%2Fproduct%2FB08645RXJ6%2Fref%3Dox_sc_act_title_1%3Fsmid%3DAT95IG9ONZD7S%26psc%3D1'
html_data = requests.get(url).text
cashback_url = re.search(r'var cashbackUrl = "(.*?)"', html_data).group(1)
print(cashback_url)
Prints:
https://www.amazon.in/gp/product/B08645RXJ6/ref=ox_sc_act_title_1?smid=AT95IG9ONZD7S&psc=1&ck&tag=EK003221-21
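Note that re.search returns None when the pattern is not found, so calling .group(1) directly raises AttributeError if the page changes or the request is blocked. A small sketch of a more defensive version is shown here; the function name extract_cashback_url and the sample HTML string are assumptions for illustration, standing in for the text returned by requests.get(url).text.

```python
import re

# Hypothetical page fragment standing in for requests.get(url).text.
html_data = '<script>var cashbackUrl = "https://www.example.com/item?a=1&b=2";</script>'


def extract_cashback_url(html):
    """Return the value of cashbackUrl, or None if the variable is absent."""
    match = re.search(r'var cashbackUrl = "(.*?)"', html)
    # re.search returns None on no match, so check before calling .group().
    return match.group(1) if match else None


cashback_url = extract_cashback_url(html_data)
if cashback_url is None:
    print("cashbackUrl not found in the page")
else:
    print(cashback_url)
```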
Source: https://stackoverflow.com/questions/63753039/how-to-extract-var-values-from-script-of-html-using-beautifulsoup