beautifulsoup

Python scrape website w/BeautifulSoup4 showing attribute error for table with class name

三世轮回 posted on 2021-01-28 17:56:10
Question: I am following this tutorial: https://www.pluralsight.com/guides/extracting-data-html-beautifulsoup to download the table on this page: http://www.knapsackfamily.com/LunchBox/top.php#res Edit: That table appears after I click the "List All" button, which is an input in a form with action=top.php#res. I inspected the table, and it shows the table classes are either sortable dl or sortable d1, so I tried them both in my script: """ get knapsack food table and table at link "more" follow: https:/

Python BeautifulSoup Paragraph Text only

时光总嘲笑我的痴心妄想 posted on 2021-01-28 14:50:31
Question: I am very new to anything related to web scraping, and as I understand it, Requests and BeautifulSoup are the way to go. I want to write a program that emails me only one paragraph from a given link every couple of hours (trying a new way to read blogs through the day). Say this particular link, 'https://fs.blog/mental-models/', has a paragraph each on different models. from bs4 import BeautifulSoup import re import requests url = 'https://fs.blog/mental-models/' r = requests.get(url) soup =
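The "paragraph text only" goal described above can be sketched as follows. This is a minimal illustration, assuming each paragraph lives in its own <p> tag; the sample markup and the `paragraph_texts` name are made up for the example and do not come from the linked page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the blog page (assumption).
SAMPLE_HTML = """
<article>
  <h2>Model one</h2>
  <p>First   paragraph about a <a href="#">model</a>.</p>
  <p>   </p>
  <p>Second paragraph.</p>
</article>
"""

def paragraph_texts(html):
    """Return the visible text of every non-empty <p>, one string per paragraph."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for p in soup.find_all("p"):
        # get_text() drops nested tags such as <a>; split/join collapses whitespace.
        text = " ".join(p.get_text().split())
        if text:  # skip paragraphs that contain only whitespace
            texts.append(text)
    return texts

for number, text in enumerate(paragraph_texts(SAMPLE_HTML), start=1):
    print(number, text)
```

With the paragraphs in a list, picking "one paragraph every couple of hours" reduces to remembering an index between runs.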

AttributeError: 'ResultSet' object has no attribute 'find_all' Beautifulsoup

那年仲夏 posted on 2021-01-28 14:29:56
Question: I don't understand why I get this error. I have a fairly simple function: def scrape_a(url): r = requests.get(url) soup = BeautifulSoup(r.content) news = soup.find_all("div", attrs={"class": "news"}) for links in news: link = news.find_all("href") return link Here is the structure of the webpage I am trying to scrape: <div class="news"> <a href="www.link.com"> <h2 class="heading"> heading </h2> <div class="teaserImg"> <img alt="" border="0" height="124" src="/image"> </div> <p> text </p> </a> <
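The error named in the title is consistent with the loop calling find_all on `news` (the ResultSet, which behaves like a list of tags) rather than on each element of it. A minimal sketch of one way to restructure the loop, assuming markup shaped like the excerpt in the question; the sample HTML and the `extract_links` name are illustrative only.

```python
from bs4 import BeautifulSoup

# Illustrative markup modelled on the excerpt in the question (assumption).
SAMPLE_HTML = """
<div class="news">
  <a href="www.link.com"><h2 class="heading">heading</h2><p>text</p></a>
</div>
<div class="news">
  <a href="www.other.com"><h2 class="heading">other</h2></a>
</div>
"""

def extract_links(html):
    """Return the href of every <a> found inside each <div class="news">."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # find_all returns a ResultSet; iterate over it and search each Tag,
    # because the ResultSet itself has no find_all method.
    for block in soup.find_all("div", attrs={"class": "news"}):
        # "href" is an attribute, not a tag name, so look for <a> tags
        # that carry an href attribute and read the value from each.
        for anchor in block.find_all("a", href=True):
            links.append(anchor["href"])
    return links

print(extract_links(SAMPLE_HTML))
```

Collecting into a list also avoids returning only the value from the last loop iteration.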

(Beautiful Soup) Get data inside a button tag

早过忘川 posted on 2021-01-28 14:11:45
Question: I am trying to scrape an ImageId inside a button tag and want to get the result: "25511e1fd64e99acd991a22d6c2d6b6c". When I try: drawing_url = drawing_url.find_all('button', class_='inspectBut')['onclick'] it doesn't work, giving an error: TypeError: list indices must be integers or slices, not str Input = for article in soup.find_all('div', class_='dojoxGridRow'): drawing_url = article.find('td', class_='dojoxGridCell', idx='3') drawing_url = drawing_url.find_all('button', class_='inspectBut')
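The TypeError quoted above is what indexing the list-like result of find_all with a string produces; find returns a single tag, which does accept ['onclick']. A minimal sketch under stated assumptions: the excerpt does not show the real onclick value, so the `inspect('...')` handler in the sample markup is hypothetical, as are the `image_ids` name and the guess that the id is a 32-character hex string.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the actual onclick handler is not shown in the question.
SAMPLE_HTML = """
<div class="dojoxGridRow">
  <table><tr>
    <td class="dojoxGridCell" idx="3">
      <button class="inspectBut"
              onclick="inspect('25511e1fd64e99acd991a22d6c2d6b6c')">View</button>
    </td>
  </tr></table>
</div>
"""

def image_ids(html):
    """Collect the hex id found in each inspect button's onclick attribute."""
    soup = BeautifulSoup(html, "html.parser")
    ids = []
    for row in soup.find_all("div", class_="dojoxGridRow"):
        cell = row.find("td", class_="dojoxGridCell", idx="3")
        if cell is None:
            continue
        # find() returns one Tag (or None), so attribute access by name works.
        button = cell.find("button", class_="inspectBut")
        if button is None or not button.has_attr("onclick"):
            continue
        # Assumption: the id is the only 32-character hex run in the handler.
        match = re.search(r"[0-9a-f]{32}", button["onclick"])
        if match:
            ids.append(match.group(0))
    return ids

print(image_ids(SAMPLE_HTML))
```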

Web scraping with Python and beautifulsoup: What is saved by the BeautifulSoup function?

时光毁灭记忆、已成空白 posted on 2021-01-28 14:11:37
Question: This question follows this previous question. I want to scrape data from a betting site using Python. I first tried to follow this tutorial, but the problem is that the site tipico is not available from Switzerland. I thus chose another betting site: Winamax. In the tutorial, the tipico webpage is first inspected in order to find where the betting rates are located in the HTML file. In the tipico webpage, they were stored in buttons of class "c_but_base c_but". By writing the following lines

How to web scrape a chart by using Python?

爱⌒轻易说出口 posted on 2021-01-28 13:42:48
Question: I am trying to use Python 3 to scrape a chart from this website into a .csv file: 2016 NBA National TV Schedule The chart starts out like: Tuesday, October 25 8:00 PM Knicks/Cavaliers TNT 10:30 PM Spurs/Warriors TNT Wednesday, October 26 8:00 PM Thunder/Sixers ESPN 10:30 PM Rockets/Lakers ESPN I am using these packages: from bs4 import BeautifulSoup import requests import pandas as pd import numpy as np The output I want in a .csv file looks like this: These are the first six lines

Get response 200 instead of <418 I'm a Teapot>, using DDG

為{幸葍}努か posted on 2021-01-28 13:33:56
Question: I was trying to scrape search results from DDG the other day, but I keep getting response 418. How can I make it respond with 200 or get results from it? This is my code. import requests from bs4 import BeautifulSoup import urllib while True: query = input("Enter Search Text: ") a = query.replace(' ', '+') url = 'https://duckduckgo.com/?q=random' +a headers = {"User-Agent": "Mozilla/5.0 (Linux; Android 6.0.1; SHIELD Tablet K1 Build/MRA58K; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0
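One detail in the code above can be illustrated without touching the network: replacing spaces with '+' by hand leaves characters such as '&' unescaped. The sketch below only shows query encoding with the standard library and makes no claim about what causes or resolves the 418 response. The `build_search_url` name is made up for the example, and the base URL is the host from the question.

```python
from urllib.parse import urlencode

BASE_URL = "https://duckduckgo.com/"  # host taken from the question

def build_search_url(query):
    """Build a search URL with the query string properly escaped."""
    # urlencode turns spaces into '+' and percent-encodes reserved
    # characters, so the query cannot break the URL structure.
    return BASE_URL + "?" + urlencode({"q": query})

print(build_search_url("beautiful soup & requests"))
```

When using Requests, passing `params={"q": query}` to `requests.get` performs the same encoding.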

Get response 200 instead of <418 I'm a Teapot>, using DDG

那年仲夏 posted on 2021-01-28 13:32:15
Question: I was trying to scrape search results from DDG the other day, but I keep getting response 418. How can I make it respond with 200 or get results from it? This is my code. import requests from bs4 import BeautifulSoup import urllib while True: query = input("Enter Search Text: ") a = query.replace(' ', '+') url = 'https://duckduckgo.com/?q=random' +a headers = {"User-Agent": "Mozilla/5.0 (Linux; Android 6.0.1; SHIELD Tablet K1 Build/MRA58K; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0

Selenium scraping JS loaded pages

荒凉一梦 posted on 2021-01-28 12:18:18
Question: I'm trying to scrape some of the JS-loaded data from https://surviv.io/stats/player787, such as the number of total kills. Could someone tell me how I can scrape the JS-loaded data with Selenium? Thanks. EDIT: Here is some of the code: from selenium import webdriver browser = webdriver.Firefox() browser.get('https://surviv.io/stats/player787') b = browser.find_element_by_tag_name('tr') The 'tr' which contains the data that I want is not grabbed by Selenium. Answer 1: To get the count of kills. Induce

Python - How can I scrape JavaScript code with bs4?

会有一股神秘感。 posted on 2021-01-28 12:12:13
Question: I have been trying to scrape a value from an HTML page where the value sits inside JavaScript. There is a lot of JavaScript in the code, but I just want to be able to print out this one: var spConfig=newProduct.Config({ "attributes": { "531": { "id": "531", "options": [ { "id": "18", "hunter": "0", "products": [ "128709" ] }, { "label": "40 1\/2", "hunter": "0", "products": [ "120151" ] }, { "id": "33", "hunter": "0", "products": [ "120152" ] }, { "id": "36", "hunter": "0", "products": [ "128710" ] }, { "id":
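Since the argument passed to Product.Config looks like JSON, one possible approach is to locate the call in the script text and let the standard json module decode the object that follows. A minimal sketch, assuming the script text has already been pulled out of the page (for example from a <script> tag's `.string` in bs4) and that the argument is valid JSON; the shortened sample script and the `extract_sp_config` name are illustrative only.

```python
import json

# Shortened, hypothetical stand-in for the page's script text (assumption).
SAMPLE_SCRIPT = """
var other = 1;
var spConfig = new Product.Config({"attributes": {"531": {"id": "531",
  "options": [{"id": "18", "hunter": "0", "products": ["128709"]}]}}});
"""

def extract_sp_config(script_text, marker="Product.Config("):
    """Decode the JSON object passed to the first Product.Config(...) call."""
    # Position just after the opening parenthesis of the call.
    start = script_text.index(marker) + len(marker)
    # raw_decode does not skip leading whitespace, so find the brace first.
    brace = script_text.index("{", start)
    # raw_decode parses one JSON value starting at `brace` and ignores
    # whatever JavaScript follows it.
    config, _end = json.JSONDecoder().raw_decode(script_text, brace)
    return config

config = extract_sp_config(SAMPLE_SCRIPT)
print(config["attributes"]["531"]["options"][0]["products"])
```

Using raw_decode instead of a regular expression avoids having to match nested braces by hand.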