Get Info From Script Tag (WebScrap) [duplicate]

Submitted by 烂漫一生 on 2020-01-16 08:56:09

Question


# Python code
from bs4 import BeautifulSoup
import urllib3

url = 'https://www.SomeData.com'
http = urllib3.PoolManager()
response = http.request('GET', url)
soup = BeautifulSoup(response.data, 'html.parser')
scripts = soup.find_all('script')
print(scripts)

Then I got something like this:
[
  <script>
        AAA.trackData.taxonomy = {
              a:"a",
              b:"b",
              c:"c",
              ...} ;
  </script>, <script async="" src="https://someData.com/js/detail.0a6eca28.js"></script>
]

How can I convert this to JSON so that I can work with the data inside the script tag?


Answer 1:


Please check if this helps.

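import json
import re

# Locate the inline <script> that assigns AAA.trackData.taxonomy.
script = soup.find('script', string=re.compile(r'AAA\.trackData\.taxonomy'))
json_text = re.search(r'^\s*AAA\.trackData\.taxonomy\s*=\s*({.*?})\s*;\s*$',
                      script.string, flags=re.DOTALL | re.MULTILINE).group(1)
data = json.loads(json_text)  # requires valid JSON, i.e. double-quoted keys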
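
One caveat: the object shown in the question has unquoted keys (a:"a"), which makes it a JavaScript object literal rather than strict JSON, and json.loads rejects unquoted keys. A minimal sketch of one workaround, assuming the literal is flat, uses simple identifier keys, and has double-quoted string values, is to quote the bare keys with a regex before parsing. The names SAMPLE_SCRIPT, quote_bare_keys and parse_taxonomy are hypothetical, and the sample text is only modeled on the question's output; in the real scraper the input would be script.string.

```python
import json
import re

# Hypothetical script body, modeled on the output shown in the question.
SAMPLE_SCRIPT = """
    AAA.trackData.taxonomy = {
        a:"a",
        b:"b",
        c:"c"
    } ;
"""

# Captures the {...} literal assigned to AAA.trackData.taxonomy.
ASSIGNMENT = re.compile(r'AAA\.trackData\.taxonomy\s*=\s*(\{.*?\})\s*;', re.DOTALL)
# Matches an unquoted key that directly follows '{' or ','.
BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_$][\w$]*)\s*:')


def quote_bare_keys(js_object):
    """Wrap bare keys in double quotes.

    Naive by design: assumes a flat literal whose string values
    contain nothing that looks like ', key:'.
    """
    return BARE_KEY.sub(r'\1"\2":', js_object)


def parse_taxonomy(script_text):
    """Return the taxonomy as a dict, or None if the assignment is absent."""
    match = ASSIGNMENT.search(script_text or '')
    if match is None:
        return None
    return json.loads(quote_bare_keys(match.group(1)))


data = parse_taxonomy(SAMPLE_SCRIPT)
print(data['b'])
```

If the real object is nested, uses single-quoted strings, or has trailing commas, this regex approach will probably break, and a parser that understands JavaScript literals would be a safer choice.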


Source: https://stackoverflow.com/questions/57232015/get-info-from-script-tag-webscrap
