beautifulsoup | 易学教程

Web Scraping Extract Javascript Table Selenium+Python

Question: I've read several articles on web scraping, but I didn't understand how to find the elements on the site. The site whose tables I want to scrape is: http://www.bmfbovespa.com.br/pt_br/servicos/market-data/cotacoes/mercado-de-derivativos/?symbol=DI1 I want to scrape the tables "TB01", "TB02", "TB03" and "TB04"; these are the ids of the tables: <tbody> == $0 <tr> <td id="TB01">...</td> <td id="TB02">...</td> <td id="TB03">...</td> <td id="TB04">...</td> <tr> I've tried all the find.element

Scrape data from Bloomberg

Question: I want to scrape data from this website. The data under "IBVC:IND Caracas Stock Exchange Stock Market Index" needs to be scraped. I am using Beautiful Soup and requests: import requests from bs4 import BeautifulSoup as bs headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) ' 'Chrome/58.0.3029.110 Safari/537.36 ' } res = requests.get("https://www.bloomberg.com/quote/IBVC:IND", headers=headers) soup = bs(res

BeautifulSoup returns None even though the element exists

Question: I have gone through most of the solutions for similar issues but haven't found one that works and, more importantly, haven't found an explanation of why this occurs other than when JavaScript or something else is being called on the site being scraped. I am trying to scrape the table of game "Officials" from the site: http://www.pro-football-reference.com/boxscores/201309050den.htm My code is: url = "http://www.pro-football-reference.com/boxscores/201309050den.htm" html = urlopen(url) bsObj =
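
One possible cause, offered as an assumption rather than a diagnosis of this particular page: some sites ship a table inside an HTML comment and only uncomment it with JavaScript, so a lookup on the raw HTML returns `None` even though the table is visible in the browser. A minimal sketch on made-up markup (the `officials` id, the wrapper `div`, and the cell contents are all hypothetical):

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: the table only exists inside an HTML comment.
html = """
<div id="all_officials">
  <!--
  <table id="officials">
    <tr><th>Referee</th><td>Example Name</td></tr>
  </table>
  -->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A direct lookup misses the table because comments are not parsed as tags.
direct = soup.find("table", id="officials")

# Collect the comment nodes and re-parse each one as HTML.
table = None
for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
    inner = BeautifulSoup(comment, "html.parser")
    table = inner.find("table", id="officials")
    if table is not None:
        break

cells = [cell.get_text(strip=True) for cell in table.find_all(["th", "td"])]
print(direct, cells)
```

If the real page does not use comments this way, the loop simply finds nothing and `table` stays `None`, so a guard before reading the cells would be needed.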

Using Python to use a website's search function

Question: I am trying to use the search function of a website with this code structure: <div class='search'> <div class='inner'> <form accept-charset="UTF-8" action="/gr/el/products" method="get"><div style="margin:0;padding:0;display:inline"><input name="utf8" type="hidden" value="✓" /></div> <label for='query'>Ενδιαφέρομαι για...</label> <fieldset> <input class="search-input" data-search-url="/gr/el/products/autocomplete.json" id="text_search" name="query" placeholder="Αναζητήστε προϊόν" type="text" />
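
Because the form above uses `method="get"`, submitting it amounts to requesting the `action` path with the input names as query parameters. The sketch below builds such a URL with the standard library only. `BASE_URL` and the search term are placeholders (the excerpt does not name the site), and `build_search_url` is a hypothetical helper, not part of any library:

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://example.com"   # placeholder; the real host is not given
FORM_ACTION = "/gr/el/products"    # taken from the form's action attribute


def build_search_url(term: str) -> str:
    """Return the URL a browser would request when the form is submitted."""
    # Field names come from the <input> elements in the form.
    params = {"utf8": "✓", "query": term}
    return urljoin(BASE_URL, FORM_ACTION) + "?" + urlencode(params)


url = build_search_url("laptop")
print(url)
# https://example.com/gr/el/products?utf8=%E2%9C%93&query=laptop
```

The resulting URL can then be fetched with any HTTP client and the response parsed as usual.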

Python to get onclick values

Question: I'm using Python and BeautifulSoup to scrape a web page for a small project of mine. The web page has multiple entries, each in its own HTML table row. My code partially works; however, a lot of the output is blank, and it won't fetch all of the results from the web page or even gather them onto the same line. <html> <head> <title>Sample Website</title> </head> <body> <table> <td class=channel>Artist</td><td class=channel>Title</td><td class=channel>Date</td><td class=channel>Time</td></tr

login to page with Selenium works - parsing with BS4 works - but not the combination of both

Question: Getting some data from WordPress forums requires two parts: login and parsing. Each works very well on its own: I can log in with Selenium perfectly, and I can parse (scrape) the data with BS4. But when I combine the two parts, I run into session issues that I cannot solve. from selenium import webdriver from selenium.webdriver.chrome.options import Options from selenium.webdriver.common.keys import Keys from bs4 import BeautifulSoup import time #--| Setup options = Options(

How to save pictures from a website to a local folder

Question: I need to save pictures from this website to a folder: http://www.photobirdireland.com/garden-birds.html I've tried using: import os from lxml import html from urllib.request import urlopen from bs4 import BeautifulSoup as bs class ImageScraper: def __init__(self, url, download_path): self.url = url self.download_path = download_path self.session = requests.Session() def scrape_images(self): html = urlopen(url) bs4 = bs(html, 'html.parser') images = bs4.find_all('img', {}) scraper =

Find Specific Text Within HTML Tag in Python

Question: I've tried a million different ways to parse out the Zestimate, but have yet to be successful. Here's the HTML tag with the Zestimate info: <span> <span tabindex="0" role="button"> <span class="sc-bGbJRg iiEDXU ds-dashed-underline"> Zestimate <sup>®</sup> </span> </span> : <span>$331,425</span> </span> Honestly, I thought this would get me close, but I get an empty list: link = 'https://www.zillow.com/homedetails/1404-Clearwing-Cir-Georgetown-TX-78626/121721750_zpid/' searched_word = '<span
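
A minimal sketch of one way to pull the value out of the snippet quoted above: locate the text node containing "Zestimate", climb to the enclosing `role="button"` span, and read the next sibling `<span>`. It runs against the pasted markup only; whether a plain HTTP request to the live page returns the same markup is an assumption this sketch does not check.

```python
from bs4 import BeautifulSoup

# The markup quoted in the question, reindented for readability.
html = """
<span>
  <span tabindex="0" role="button">
    <span class="sc-bGbJRg iiEDXU ds-dashed-underline"> Zestimate <sup>®</sup> </span>
  </span>
  : <span>$331,425</span>
</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Match on the visible text instead of the generated class names.
label = soup.find(string=lambda s: s is not None and "Zestimate" in s)

# Climb to the wrapper span that sits next to the value.
button = label.find_parent("span", attrs={"role": "button"})

# The value is the next <span> sibling after the wrapper.
value = button.find_next_sibling("span").get_text(strip=True)
print(value)
```

Searching by text rather than by class avoids depending on class names such as `sc-bGbJRg`, which look auto-generated and may change.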