beautifulsoup

Use BeautifulSoup to obtain “View Element” code instead of “View Source” code

本小妞迷上赌 submitted on 2021-02-06 06:30:10
Question: I'm using the following code to obtain all <script>...</script> content from a webpage (see the URL in the code):
    import urllib2
    from bs4 import BeautifulSoup
    import re
    import imp
    url = "http://racing4everyone.eu/2015/10/25/formula-e-201516formula-e-201516-round01-china-race/"
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())
    script = soup.find_all("script")
    print script #just to check the output of script
However, BeautifulSoup searches within the source code (Ctrl+U in Chrome) of the

How to find spans with a specific class containing specific text using beautiful soup and re?

我的梦境 submitted on 2021-02-05 18:50:13
Question: How can I find all spans with a class of 'blue' that contain text in the format 04/18/13 7:29pm? The text could therefore be: 04/18/13 7:29pm or: Posted on 04/18/13 7:29pm. In terms of constructing the logic to do this, this is what I have got so far:
    new_content = original_content.find_all('span', {'class' : 'blue'}) # using beautiful soup's find_all
    pattern = re.compile('<span class=\"blue\">[data in the format 04/18/13 7:29pm]</span>') # using re
    for _ in new_content:
        result = re.findall
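One way to approach the matching step, offered as a minimal sketch rather than the accepted answer: write the pattern against the text of each span instead of against the HTML markup. The function name `texts_with_timestamp` and the sample strings are hypothetical, and the pattern assumes two-digit month, day, and year followed by a 12-hour time with an am/pm suffix.

```python
import re

# Matches timestamps such as "04/18/13 7:29pm":
# MM/DD/YY, a space, H:MM or HH:MM, then "am" or "pm".
TIMESTAMP = re.compile(r"\b\d{2}/\d{2}/\d{2} \d{1,2}:\d{2}(?:am|pm)\b")


def texts_with_timestamp(span_texts):
    """Return only the strings that contain a timestamp in the expected format."""
    return [text for text in span_texts if TIMESTAMP.search(text)]


# Hypothetical sample strings standing in for the text of each 'blue' span.
samples = ["04/18/13 7:29pm", "Posted on 04/18/13 7:29pm", "No date here"]
matches = texts_with_timestamp(samples)
```

With Beautiful Soup, the strings passed in could be the `get_text()` result of each element returned by `find_all('span', {'class': 'blue'})`, so the regular expression never has to deal with tags or attributes.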

WebScraping javascript page in python

穿精又带淫゛_ submitted on 2021-02-05 12:16:21
Question: Hello World. I am new to Python and am trying to scrape a JavaScript page: https://search.gleif.org/#/search/ Please find below the result from my code (using requests):
    <!DOCTYPE html>
    <html>
    <head><meta charset="utf-8"/>
    <meta content="width=device-width,initial-scale=1" name="viewport"/>
    <title>LEI Search 2.0</title>
    <link href="/static/icons/favicon.ico" rel="shortcut icon" type="image/x-icon"/>
    <link href="https://fonts.googleapis.com/css?family=Open+Sans:200,300,400,600,700,900&subset

NoneType in python

ⅰ亾dé卋堺 submitted on 2021-02-05 12:15:24
Question: I was trying to get some rating data from Tripadvisor, but when I tried to fetch the data I got: 'NoneType' object is not subscriptable. Can anybody help me figure out where I am going wrong? Sorry, I am very new to Python. Here is my sample code:
    import requests
    import re
    from bs4 import BeautifulSoup
    r = requests.get('http://www.tripadvisor.in/Hotels-g186338-London_England-Hotels.html')
    data = r.text
    soup = BeautifulSoup(data)
    for rate in soup.find_all('div',{"class":"rating"}):

Weird character not exists in html source python BeautifulSoup

夙愿已清 submitted on 2021-02-05 09:26:10
Question: I have watched a video that teaches how to use BeautifulSoup and requests to scrape a website. Here's the code:
    from bs4 import BeautifulSoup as bs4
    import requests
    import pandas as pd
    pages_to_scrape = 1
    for i in range(1,pages_to_scrape+1):
        url = ('http://books.toscrape.com/catalogue/page-{}.html').format(i)
        pages.append(url)
    for item in pages:
        page = requests.get(item)
        soup = bs4(page.text, 'html.parser')
        #print(soup.prettify())
        for j in soup.findAll('p', class_='price_color'):
            price=j

Beautiful Soup default decode charset?

旧时模样 submitted on 2021-02-05 08:44:07
Question: I have a huge set of web pages with different encodings, and I am trying to parse them using Beautiful Soup. As far as I can tell, BS detects the encoding from meta-charset or xml-encoding tags. But there are documents with no such tags, or with typos in the charset name, and BS fails on all of them. I suppose its default guess is utf-8, which is wrong. Luckily, all such pages (or nearly all of them) have the same encoding. Is there any way to set it as the default? I've also tried to grep charset and use iconv to
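The fallback idea described above can be sketched with the standard library alone, decoding the bytes before they reach the parser. This is an illustration under stated assumptions, not Beautiful Soup's own detection logic: the helper name `decode_html` is hypothetical, and `windows-1251` is only a placeholder for whatever encoding the undeclared pages actually share, which the question does not name.

```python
import codecs
import re

# Finds a declared charset in either <meta charset="..."> or
# <meta http-equiv=... content="text/html; charset=...">.
META_CHARSET = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def decode_html(raw, fallback="windows-1251"):
    """Decode page bytes, using `fallback` when no usable charset is declared.

    `fallback` is an assumed placeholder; substitute the encoding the
    undeclared pages are known to use.
    """
    match = META_CHARSET.search(raw[:2048])  # declarations sit near the top
    if match:
        declared = match.group(1).decode("ascii")
        try:
            codecs.lookup(declared)  # raises LookupError for typos
            return raw.decode(declared, errors="replace")
        except LookupError:
            pass  # misspelled or unknown charset: fall through to the fallback
    return raw.decode(fallback, errors="replace")
```

The resulting string can be handed to `BeautifulSoup` directly, which avoids byte-level detection altogether. Beautiful Soup also accepts a `from_encoding` argument, but that states the encoding up front for every document rather than acting only when the declaration is missing or invalid.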

Getting javascript variable value while scraping with python

。_饼干妹妹 submitted on 2021-02-05 08:19:07
Question: I know this has been asked before, but I am a newbie in scraping and Python. Please help me; it would be very helpful on my learning path. I am scraping a news site using Python with packages such as Beautiful Soup. I am having difficulty getting the value of a JavaScript variable that is declared in a script tag and is also updated there. Here is the part of the HTML page that I am scraping (containing only the script part): <!-- Eliminate render-blocking JavaScript and
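For a variable that is declared and later reassigned inside a script tag, one possible approach is to run a regular expression over the script text and collect every assignment in order. The sketch below is illustrative only: `js_assignments`, the sample script, and the variable names `pageCount` and `title` are hypothetical, and the pattern assumes simple `name = value;` statements whose values contain no semicolons.

```python
import json
import re


def js_assignments(script_text, name):
    """Return every value assigned to `name`, in order of appearance."""
    pattern = re.compile(
        # Optional "var", the variable name, a single "=" (not "=="),
        # then the value up to the next semicolon.
        r"(?:\bvar\s+)?\b" + re.escape(name) + r"\s*=(?!=)\s*(.+?)\s*;"
    )
    values = []
    for raw in pattern.findall(script_text):
        try:
            values.append(json.loads(raw))  # numbers, strings, arrays, objects
        except ValueError:
            values.append(raw)  # not valid JSON: keep the raw source text
    return values


# Hypothetical script content standing in for the text of a <script> tag.
script_text = 'var pageCount = 1; pageCount = 5; var title = "News";'
counts = js_assignments(script_text, "pageCount")
latest_count = counts[-1] if counts else None
```

The last element of the returned list reflects the final assignment in source order. Values that are computed at runtime by the browser are not recoverable this way, because the regular expression only sees the script as text.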
