I'm writing a Python script that extracts the script locations from a webpage after parsing it. Let's say there are two scenarios:
This should work: find all the script tags, then check whether each one has a 'src' attribute. If it does, the URL of the JavaScript is in the src attribute; otherwise we assume the JavaScript is inline, inside the tag.
#!/usr/bin/python
import requests  # not used below; only needed to fetch a live page
from bs4 import BeautifulSoup

# Test HTML which has both cases
html = '<script src="http://example.com/something.js"></script><script>some JS</script>'
soup = BeautifulSoup(html, 'html.parser')

# Find all script tags
for n in soup.find_all('script'):
    # Check if the src attribute exists, and if it does grab the source URL
    if 'src' in n.attrs:
        javascript = n['src']
    # Otherwise assume that the javascript is contained within the tags
    else:
        javascript = n.text
    print(javascript)
The output of this is:
http://example.com/something.js
some JS
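
If you want to reuse this on real pages, the same check can be wrapped in a function. The following is only a sketch, assuming Python 3 and the built-in 'html.parser' backend. The function name `extract_scripts`, the sample HTML, and the page URL are made up for illustration. It also resolves relative `src` values against the page URL with `urljoin`, which the snippet above does not do, and it reads inline code through `.string`, which is `None` for an empty tag.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_scripts(html, base_url):
    """Return (external_urls, inline_sources) for the <script> tags in html."""
    soup = BeautifulSoup(html, "html.parser")
    external_urls = []
    inline_sources = []
    for tag in soup.find_all("script"):
        src = tag.get("src")
        if src:
            # Resolve relative paths such as "/static/app.js" against the page URL
            external_urls.append(urljoin(base_url, src))
        else:
            # No src attribute: treat the tag body as inline JavaScript
            inline_sources.append(str(tag.string or ""))
    return external_urls, inline_sources


# Hypothetical page content and URL, for illustration only
sample_html = (
    '<script src="/static/app.js"></script>'
    '<script src="http://example.com/something.js"></script>'
    "<script>some JS</script>"
)
urls, inline = extract_scripts(sample_html, "http://example.com/page.html")
print(urls)
print(inline)
```

For a live page you would pass in the response body and the page's own URL, for example `extract_scripts(response.text, response.url)` where `response` comes from `requests.get(...)`.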