beautifulsoup

Web Scraping Extract Javascript Table Selenium+Python

╄→гoц情女王★ submitted on 2020-07-07 13:03:59
Question: I've read several articles about web scraping, but I didn't understand how to find the elements in the site. The site whose table I want to scrape is: http://www.bmfbovespa.com.br/pt_br/servicos/market-data/cotacoes/mercado-de-derivativos/?symbol=DI1 I want to scrape the tables "TB01", "TB02", "TB03" and "TB04"; these are the ids of the tables: <tbody> == $0 <tr> <td id="TB01">...</td> <td id="TB02">...</td> <td id="TB03">...</td> <td id="TB04">...</td> <tr> I've tried all the find.element
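
A minimal sketch of the lookup the question describes, assuming the rendered HTML has already been obtained (for example from Selenium's `driver.page_source` once the JavaScript has finished loading). The helper name `extract_cells` and the sample markup are illustrative, not taken from the site; only the four ids come from the question.

```python
from bs4 import BeautifulSoup


def extract_cells(rendered_html, cell_ids=("TB01", "TB02", "TB03", "TB04")):
    """Return {id: text} for each <td> whose id is listed in cell_ids.

    Ids that are not present in the HTML map to None.
    """
    soup = BeautifulSoup(rendered_html, "html.parser")
    result = {}
    for cell_id in cell_ids:
        cell = soup.find("td", id=cell_id)
        result[cell_id] = cell.get_text(" ", strip=True) if cell is not None else None
    return result


# Hypothetical stand-in for the rendered page.
sample = """<table><tbody><tr>
<td id="TB01">A</td><td id="TB02">B</td>
<td id="TB03">C</td><td id="TB04">D</td>
</tr></tbody></table>"""

print(extract_cells(sample))
```

If an id comes back as None, the cell was probably not in the HTML at the moment it was captured, which usually points to a timing or frame issue rather than a selector problem.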

Scrape data from Bloomberg

风格不统一 submitted on 2020-07-06 20:00:05
Question: I want to scrape data from this website. The data under "IBVC:IND Caracas Stock Exchange Stock Market Index" needs to be scraped. I am using Beautiful Soup and requests: import requests from bs4 import BeautifulSoup as bs headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) ' 'Chrome/58.0.3029.110 Safari/537.36 ' } res = requests.get("https://www.bloomberg.com/quote/IBVC:IND", headers=headers) soup = bs(res

BeautifulSoup returns None even though the element exists

喜欢而已 submitted on 2020-07-06 18:47:05
Question: I have gone through most of the solutions for similar issues but haven't found one that works and, more importantly, haven't found an explanation of why this occurs other than when JavaScript or something else is being called on the site being scraped. I am trying to scrape the table of game "Officials" from the site: http://www.pro-football-reference.com/boxscores/201309050den.htm My code is: url = "http://www.pro-football-reference.com/boxscores/201309050den.htm" html = urlopen(url) bsObj =
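
One possible explanation for an element that is visible in the browser but comes back as None from a static fetch is that its markup is shipped inside an HTML comment and only uncommented by a script later. Whether that is the cause on this particular page is an assumption. The sketch below shows how such a case could be handled; the helper name `find_in_comments` and the sample markup (including the `officials` id) are hypothetical.

```python
from bs4 import BeautifulSoup, Comment


def find_in_comments(html, tag, attrs):
    """Look for a tag in the document, then inside any HTML comments."""
    soup = BeautifulSoup(html, "html.parser")
    found = soup.find(tag, attrs=attrs)
    if found is not None:
        return found
    # Comment nodes are strings; re-parse each one as its own document.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        inner = BeautifulSoup(str(comment), "html.parser")
        found = inner.find(tag, attrs=attrs)
        if found is not None:
            return found
    return None


# Hypothetical markup: the table exists only inside a comment.
sample = (
    '<div id="all_officials"><!-- '
    '<table id="officials"><tr><td>Referee</td></tr></table>'
    " --></div>"
)

table = find_in_comments(sample, "table", {"id": "officials"})
print(table.get_text(strip=True) if table is not None else "not found")
```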

Using Python to use a website's search function

柔情痞子 submitted on 2020-07-05 18:17:28
Question: I am trying to use the search function of a website with this code structure: <div class='search'> <div class='inner'> <form accept-charset="UTF-8" action="/gr/el/products" method="get"><div style="margin:0;padding:0;display:inline"><input name="utf8" type="hidden" value="✓" /></div> <label for='query'>Ενδιαφέρομαι για...</label> <fieldset> <input class="search-input" data-search-url="/gr/el/products/autocomplete.json" id="text_search" name="query" placeholder="Αναζητήστε προϊόν" type="text" />
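
Because the quoted form uses `method="get"`, submitting it amounts to requesting the `action` path with the input names as query parameters. The sketch below builds that URL from the fields visible in the snippet (`utf8` and `query`). The host `https://example.com` is a placeholder, since the question does not show the site's address, and the form may contain further fields beyond the truncated excerpt.

```python
from urllib.parse import urlencode, urljoin


def build_search_url(base_url, term):
    """Build the GET URL that submitting the quoted search form would request."""
    params = {"utf8": "✓", "query": term}  # field names taken from the form
    return urljoin(base_url, "/gr/el/products") + "?" + urlencode(params)


# "https://example.com" is a placeholder host, not the real site.
url = build_search_url("https://example.com", "laptop")
print(url)
```

The resulting URL can then be fetched with any HTTP client and the response parsed with BeautifulSoup as usual.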

Python to get onclick values

痴心易碎 submitted on 2020-07-04 20:23:29
Question: I'm using Python and BeautifulSoup to scrape a web page for a small project of mine. The webpage has multiple entries, each separated by a table row in HTML. My code partially works; however, a lot of the output is blank, and it won't fetch all of the results from the web page or even gather them into the same line. <html> <head> <title>Sample Website</title> </head> <body> <table> <td class=channel>Artist</td><td class=channel>Title</td><td class=channel>Date</td><td class=channel>Time</td></tr
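
A rough sketch of one way to keep each entry on a single line: iterate row by row and collect both the cell text and any `onclick` attribute values found inside that row. The helper name `rows_with_onclick` and the sample markup (including the `play('123')` handler) are hypothetical, because the excerpt is cut off before any `onclick` attribute appears.

```python
from bs4 import BeautifulSoup


def rows_with_onclick(html):
    """Return one (cell_texts, onclick_values) pair per non-empty table row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # attrs={"onclick": True} matches any descendant that has the attribute.
        handlers = [tag["onclick"] for tag in tr.find_all(attrs={"onclick": True})]
        if cells:
            rows.append((cells, handlers))
    return rows


# Hypothetical, well-formed stand-in for the page in the question.
sample = """<table>
<tr><td class="channel">Artist</td><td class="channel">Title</td></tr>
<tr><td><a onclick="play('123')">Band A</a></td><td>Song A</td></tr>
</table>"""

for cells, handlers in rows_with_onclick(sample):
    print(cells, handlers)
```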

login to page with Selenium works - parsing with BS4 works - but not the combination of both

橙三吉。 submitted on 2020-07-03 13:01:47
Question: Getting some data from WordPress forums requires login and parsing, two parts. Both work very well standalone: I can log in with Selenium perfectly, and I can parse (scrape) the data with BS4. But when I combine the two parts, I run into session issues that I cannot solve. from selenium import webdriver from selenium.webdriver.chrome.options import Options from selenium.webdriver.common.keys import Keys from bs4 import BeautifulSoup import time #--| Setup options = Options(
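
The quoted code is cut off before the parsing step, so the following is only a guess at the failure mode: if the page is fetched a second time with a separate HTTP client, that client does not carry the browser's login cookies. A minimal sketch of the alternative is to hand BeautifulSoup the HTML the logged-in browser already holds, via Selenium's `page_source`. `FakeDriver` is a stand-in so the example runs without a browser; it is not part of Selenium.

```python
from bs4 import BeautifulSoup


def parse_current_page(driver):
    """Parse whatever the (already logged-in) browser is currently showing."""
    return BeautifulSoup(driver.page_source, "html.parser")


class FakeDriver:
    """Stand-in for a Selenium WebDriver; only page_source is emulated."""

    page_source = "<html><body><h1 class='topic'>Hello</h1></body></html>"


soup = parse_current_page(FakeDriver())
print(soup.find("h1", class_="topic").get_text())
```

With a real driver, the call would come after the login steps and after navigating to the page of interest with `driver.get(...)`.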

login to page with Selenium works - parsing with BS4 works - but not the combination of both

两盒软妹~` submitted on 2020-07-03 13:01:01
Question: Getting some data from WordPress forums requires login and parsing, two parts. Both work very well standalone: I can log in with Selenium perfectly, and I can parse (scrape) the data with BS4. But when I combine the two parts, I run into session issues that I cannot solve. from selenium import webdriver from selenium.webdriver.chrome.options import Options from selenium.webdriver.common.keys import Keys from bs4 import BeautifulSoup import time #--| Setup options = Options(

login to page with Selenium works - parsing with BS4 works - but not the combination of both

三世轮回 submitted on 2020-07-03 13:00:08
Question: Getting some data from WordPress forums requires login and parsing, two parts. Both work very well standalone: I can log in with Selenium perfectly, and I can parse (scrape) the data with BS4. But when I combine the two parts, I run into session issues that I cannot solve. from selenium import webdriver from selenium.webdriver.chrome.options import Options from selenium.webdriver.common.keys import Keys from bs4 import BeautifulSoup import time #--| Setup options = Options(

How to save pictures from a website to a local folder

霸气de小男生 submitted on 2020-07-03 12:37:42
Question: I need to save pictures from this website in a folder: http://www.photobirdireland.com/garden-birds.html I've tried using: import os from lxml import html from urllib.request import urlopen from bs4 import BeautifulSoup as bs class ImageScraper: def __init__(self, url, download_path): self.url = url self.download_path = download_path self.session = requests.Session() def scrape_images(self): html = urlopen(url) bs4 = bs(html, 'html.parser') images = bs4.find_all('img', {}) scraper =
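
Two details in the quoted code stand out: `requests` is used without being imported, and `scrape_images` calls `urlopen(url)` where only `self.url` is defined. Separately from those, the sketch below shows one way to turn `<img>` tags into (absolute URL, local path) pairs, which is the part that usually trips up image downloads. The helper name `image_targets`, the folder name `birds`, and the sample image path are hypothetical.

```python
import os
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def image_targets(html, page_url, download_path):
    """Return (absolute_url, local_file_path) pairs for every <img src=...>."""
    soup = BeautifulSoup(html, "html.parser")
    targets = []
    for img in soup.find_all("img", src=True):  # skip <img> tags without src
        absolute = urljoin(page_url, img["src"])  # resolve relative paths
        filename = os.path.basename(urlparse(absolute).path)
        if filename:
            targets.append((absolute, os.path.join(download_path, filename)))
    return targets


# Hypothetical markup; the real page's image paths are not shown in the question.
sample = '<img src="images/robin.jpg"><img alt="no source">'
page = "http://www.photobirdireland.com/garden-birds.html"

for url, path in image_targets(sample, page, "birds"):
    print(url, "->", path)
```

Each pair can then be passed to `urllib.request.urlretrieve(url, path)` after creating the folder with `os.makedirs(download_path, exist_ok=True)`.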

Find Specific Text Within HTML Tag in Python

ε祈祈猫儿з submitted on 2020-06-28 09:21:14
Question: I've tried a million different ways to parse out the Zestimate, but have yet to be successful. Here's the HTML tag with the Zestimate info: <span> <span tabindex="0" role="button"> <span class="sc-bGbJRg iiEDXU ds-dashed-underline"> Zestimate <sup>®</sup> </span> </span> :  <span>$331,425</span> </span> Honestly, I thought this would get me close, but I get an empty list: link = 'https://www.zillow.com/homedetails/1404-Clearwing-Cir-Georgetown-TX-78626/121721750_zpid/' searched_word = '<span
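
A sketch of locating the value by its label text rather than by the generated class names, using the markup quoted in the question. It assumes the fetched HTML actually contains this fragment; if the page builds it with JavaScript or blocks the request, the search would still return nothing. The helper name `find_zestimate` is illustrative.

```python
import re

from bs4 import BeautifulSoup


def find_zestimate(html):
    """Return the text of the <span> that follows the 'Zestimate' label, or None."""
    soup = BeautifulSoup(html, "html.parser")
    label = soup.find(string=re.compile(r"Zestimate"))
    if label is None:
        return None
    # Climb from the label text to the wrapper that has role="button" ...
    button = label.find_parent("span", attrs={"role": "button"})
    if button is None:
        return None
    # ... and take the next <span> at the same level, which holds the amount.
    value = button.find_next_sibling("span")
    return value.get_text(strip=True) if value is not None else None


# Markup copied from the question.
sample = (
    '<span> <span tabindex="0" role="button"> '
    '<span class="sc-bGbJRg iiEDXU ds-dashed-underline"> Zestimate <sup>®</sup> </span> '
    "</span> : <span>$331,425</span> </span>"
)

print(find_zestimate(sample))
```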