beautifulsoup

How to extract var (values) from <script> of html using beautifulsoup

感情迁移 posted on 2020-12-15 05:52:30
Question: I am currently using:

    import requests
    from bs4 import BeautifulSoup

    source = requests.get('www.randomwebsite.com').text
    soup = BeautifulSoup(source, 'lxml')
    details = soup.find('script')

This returns the following script:

    <script> var Url = "https://www.example.com"; if(Url != ''){code} else {code } </script>

I want the output to be:

    https://www.example.com

Answer 1:

    import re
    text = """ <script> var Url = "https://www.example.com"; if(Url != ''){code} else {code } </script> ""
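The answer is cut off right after its sample text, so its actual pattern is unknown. As a minimal sketch of a regex-based approach, assuming the value is always a double-quoted string literal assigned with `var`, something like the following could work. The helper name `extract_js_string_var` is made up for illustration and is not part of the original answer.

```python
import re

# Sample markup copied from the question.
text = """
<script> var Url = "https://www.example.com"; if(Url != ''){code} else {code } </script>
"""


def extract_js_string_var(source, name):
    """Return the double-quoted string assigned to `var <name>`, or None.

    Hypothetical helper: it only handles the simple `var X = "..."` form.
    """
    pattern = r'var\s+' + re.escape(name) + r'\s*=\s*"([^"]*)"'
    match = re.search(pattern, source)
    return match.group(1) if match else None


print(extract_js_string_var(text, "Url"))
```

With the question's code, the text of the tag returned by `soup.find('script')` (for example `details.string`) could be passed in place of `text`. A regex like this is brittle against minified or multi-statement scripts, so it is best treated as a starting point.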

Beautifulsoup Python Youtube Scrape not working

空扰寡人 posted on 2020-12-13 04:04:11
Question: I'm trying to scrape YouTube URLs and titles from YouTube accounts, whose pages are formatted like https://www.youtube.com/c/%s/videos % accountName, for example Apple. The class given to the clickable text (title) on YouTube is ytd-grid-video-renderer #video-title.yt-simple-endpoint.ytd-grid-video-renderer. When I click on the title object in inspector mode (Firefox) I do not get any results, but the URL ('url', somewhere in webCommandMetadata) and the title ('simpleText') are showing in the

Scraping XML data with BS4 “lxml”

ⅰ亾dé卋堺 posted on 2020-12-13 03:43:53
Question: I am trying to solve a problem very similar to this one: Scraping XML element attributes with beautifulsoup. I have the following code:

    from bs4 import BeautifulSoup
    import requests

    r = requests.get('https://www.usda.gov/oce/commodity/wasde/latest.xml')
    data = r.text
    soup = BeautifulSoup(data, "lxml")
    for ce in soup.find_all("Cell"):
        print(ce["cell_value1"])

The code runs without error but does not print any values to the terminal. I want to extract the "cell_value1" data noted above for the whole
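One likely cause of the empty output, offered as a hedged guess since the excerpt is cut off: Beautiful Soup's HTML parsers (including "lxml") convert tag names to lowercase, so a search for "Cell" matches nothing. The sketch below reproduces that behaviour on a small hypothetical sample; the `Report` wrapper and the attribute values are invented, and "html.parser" stands in for "lxml" only so that the example needs no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the feed; the real file's structure is not
# shown in the question.
data = (
    '<Report>'
    '<Cell cell_value1="10.5"></Cell>'
    '<Cell cell_value1="20.0"></Cell>'
    '</Report>'
)

# An HTML parser lowercases tag names while building the tree.
soup = BeautifulSoup(data, "html.parser")

mixed_case = soup.find_all("Cell")   # searches for the original casing
lower_case = soup.find_all("cell")   # searches for the lowercased name

values = [cell["cell_value1"] for cell in lower_case]
print(len(mixed_case), values)
```

If lxml is installed, passing "xml" as the parser name instead should preserve the original casing, so `find_all("Cell")` would then match.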

AttributeError while scraping

≯℡__Kan透↙ posted on 2020-12-13 03:35:23
Question: I am trying to scrape a website and I get this error:

    AttributeError: 'NoneType' object has no attribute 'text'

at

    ---> 12 for x in soup.select("div.site-content")]

The code used is:

    rq = req.get("https://stopcensura.net/category/cronaca")
    soup = BeautifulSoup(rq.content, 'html.parser')
    scrape_info = [(x.h3.a.text, x.time.text) for x in soup.select("div.site-content")]

I would like to get information on the title (entry-title), the date (class="date"), the author (<div class="by-author vcard
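This error usually means that one of the chained lookups (`x.h3`, `x.h3.a` or `x.time`) returned `None` for at least one matched element. A minimal sketch of guarding against that follows; the markup is hypothetical (the real page is not shown in the excerpt) and `text_or_none` is a helper invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second block has no <time>, so x.time is None.
html = """
<div class="site-content"><h3><a href="#">First title</a></h3><time>2020-12-01</time></div>
<div class="site-content"><h3><a href="#">Second title</a></h3></div>
"""

soup = BeautifulSoup(html, "html.parser")


def text_or_none(tag):
    """Return the stripped text of a tag, or None when the tag is missing."""
    return tag.get_text(strip=True) if tag is not None else None


scrape_info = []
for x in soup.select("div.site-content"):
    link = x.select_one("h3 a")  # None if there is no h3 > a inside x
    scrape_info.append((text_or_none(link), text_or_none(x.time)))

print(scrape_info)
```

Rows with a missing piece come back with `None` in that position instead of raising, which also makes it easier to see which selector does not match the real page.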

Convert data to DataFrame in python

三世轮回 posted on 2020-12-12 05:40:47
Question: With the help of @JaSON, here is code that lets me get the data in the table from a local HTML file. The code uses selenium:

    from selenium import webdriver

    driver = webdriver.Chrome("C:/chromedriver.exe")
    driver.get('file:///C:/Users/Future/Desktop/local.html')
    counter = len(driver.find_elements_by_id("Section3"))
    xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"
    print(counter)
    for

HTML Parsing using bs4

和自甴很熟 posted on 2020-12-11 19:21:02
Question: I am parsing an HTML page and am having a hard time figuring out how to pull a certain 'p' tag that has no class or id. I am trying to reach the 'p' tag with the lat and long. Here is my current code:

    import bs4
    from urllib.request import urlopen as uReq  # opens the URL (urlopen lives in urllib.request on Python 3)
    from bs4 import BeautifulSoup as soup  # parses/cuts the html

    my_url = 'http://www.fortwiki.com/Battery_Adair'
    print(my_url)
    uClient = uReq(my_url)  # opens the HTML and stores it in uClient
    page_html = uClient.read()  # reads the
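When a tag has no class or id, one option is to select it by its text instead. The sketch below assumes the paragraph contains the literal word "Lat"; the markup, the placeholder coordinates and the helper `find_paragraph` are all hypothetical, since the excerpt does not show the page's actual layout.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup with placeholder coordinates; the real page's layout
# and coordinate format are not shown in the excerpt.
page_html = """
<div>
  <p>Some introductory text.</p>
  <p>Lat: 00.00 Long: 00.00</p>
</div>
"""

page_soup = BeautifulSoup(page_html, "html.parser")


def find_paragraph(soup, pattern):
    """Return the first <p> whose text matches the regex, or None."""
    regex = re.compile(pattern)
    for p in soup.find_all("p"):
        if regex.search(p.get_text()):
            return p
    return None


coords = find_paragraph(page_soup, r"Lat")
print(coords.get_text(strip=True) if coords else "no match")
```

Matching on `get_text()` rather than on `.string` keeps the lookup working even if the paragraph contains nested tags such as links.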
