beautifulsoup

How to scrape every Steam id, review content, and profile_url from a game's reviews on Steam into an Excel file using Python?

Submitted by ≡放荡痞女 on 2021-01-29 18:04:32
Question: The script either prints only the first 11 reviews (when while n<500 is used) or prints nothing at all (when while True: is used). The requirement is to save every Steam id, review content, and profile_url from the game's reviews into an Excel file. from msedge.selenium_tools import Edge, EdgeOptions from selenium.webdriver.common.keys import Keys import re from time import sleep from datetime import datetime from openpyxl import Workbook game_id= 1097150 url = 'https://steamcommunity.com/app/1097150 …
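
One route that sidesteps Selenium entirely is Steam's public appreviews JSON endpoint, which pages with a cursor. A minimal sketch, assuming that endpoint and its documented cursor behaviour (parameters and limits may change):

    import requests
    from openpyxl import Workbook

    game_id = 1097150
    url = f'https://store.steampowered.com/appreviews/{game_id}'

    wb = Workbook()
    ws = wb.active
    ws.append(['steam_id', 'review', 'profile_url'])

    cursor = '*'          # '*' asks for the first page; Steam returns the next cursor
    seen = set()
    while cursor and cursor not in seen:
        seen.add(cursor)  # guard against the endpoint handing back the same cursor forever
        params = {'json': 1, 'filter': 'recent', 'language': 'all',
                  'num_per_page': 100, 'cursor': cursor}
        data = requests.get(url, params=params).json()
        reviews = data.get('reviews', [])
        if not reviews:
            break
        for r in reviews:
            steam_id = r['author']['steamid']
            ws.append([steam_id, r['review'],
                       f'https://steamcommunity.com/profiles/{steam_id}'])
        cursor = data.get('cursor')

    wb.save('steam_reviews.xlsx')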

Why is this Requests and BeautifulSoup login script not working?

Submitted by 喜夏-厌秋 on 2021-01-29 15:52:00
Question: The following is a bit of code I wrote on top of the Requests and BeautifulSoup libraries for Python 3. import requests as rq from bs4 import BeautifulSoup as bs def get_data(): return {'email': str(input('Enter your email.')), 'password': str(input('Enter your password.'))} def obtain_data(): login_data=get_data() form_data={'csrf_token': login_data['email'], 'login': '1', 'redirect': 'account/dashboard', 'query': None, 'required': 'email,password', 'email': login_data['email'], …
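
The excerpt reuses the e-mail address as the csrf_token, which most servers will reject. A hedged sketch of the usual pattern: open the login page inside a Session, read the real token from the hidden form field, then post it back (the field name 'csrf_token' and the URL below are assumptions to adapt to the actual site):

    import requests as rq
    from bs4 import BeautifulSoup as bs

    LOGIN_URL = 'https://example.com/login'   # placeholder for the site's real login URL

    with rq.Session() as session:
        login_page = session.get(LOGIN_URL)
        soup = bs(login_page.text, 'html.parser')
        token_tag = soup.find('input', {'name': 'csrf_token'})   # assumed hidden-field name
        token = token_tag['value'] if token_tag else ''

        form_data = {
            'csrf_token': token,                      # the real token, not the e-mail address
            'email': input('Enter your email: '),
            'password': input('Enter your password: '),
        }
        response = session.post(LOGIN_URL, data=form_data)
        print(response.status_code)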

Python web scraping with BeautifulSoup: avoiding repetition in find_all()

Submitted by 坚强是说给别人听的谎言 on 2021-01-29 15:51:48
Question: I am working on web scraping in Python using BeautifulSoup. I am trying to extract text that is in bold, italics, or both. Consider the following HTML snippet: <div> <b> <i> HelloWorld </i> </b> </div> If I use the command sp.find_all(['i', 'b']), understandably, I get two results, one corresponding to the bold tag and the other to the italics tag, i.e. ['<b><i>HelloWorld</i></b>', '<i>HelloWorld</i>']. My question is: is there a way to extract it only once and still get the tags? My desired output is something …
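
A minimal sketch of one way to avoid the nested duplicate: keep only the outermost matches, i.e. tags that have no <b> or <i> ancestor of their own.

    from bs4 import BeautifulSoup

    html = '<div> <b> <i> HelloWorld </i> </b> </div>'
    sp = BeautifulSoup(html, 'html.parser')

    # Drop any matched tag that sits inside another matched tag.
    outermost = [tag for tag in sp.find_all(['i', 'b'])
                 if not tag.find_parent(['i', 'b'])]

    for tag in outermost:
        print(tag.name, tag.get_text(strip=True))   # -> b HelloWorld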

Getting table values from nowgoal raises an IndexError

Submitted by 廉价感情. on 2021-01-29 15:37:20
Question: I am quite new to scraping. I am getting links from nowgoal; below is how I started navigating to the page above. I do not wish to get the links for all matches; instead I have an input txt file (attached here) and use the selected league and date from it. The following code initializes the input: #Initialisation league_index =[] final_list = [] j = 0 #config load config = RawConfigParser() configFilePath = r'.\config.txt' config.read(configFilePath) date = config.get('database_config','date') …
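
Without the full traceback this is a guess, but an IndexError in a scraper of this shape usually means a list built from the page came back empty for the chosen league or date. A minimal defensive sketch (names purely illustrative):

    rows = []   # e.g. whatever find_all(...) returned for the league table

    if rows:                 # never index an empty result set
        first_row = rows[0]
    else:
        print('No rows matched the configured league/date; re-check the selectors and config values.')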

How to scrape the dynamic table data

Submitted by ぐ巨炮叔叔 on 2021-01-29 14:11:04
Question: I want to scrape the table data from http://5000best.com/websites/ The table content is paginated across several pages and is loaded dynamically. I want to scrape the table data for each category. I can scrape the table manually for each category, but that is not what I want. Please look at it and suggest an approach. I am able to build the links for each category, i.e. http://5000best.com/websites/Movies/, http://5000best.com/websites/Games/ etc., but I am not sure how to go further to …
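
Because the table is built by JavaScript, requests alone will not see its rows; rendering the page first is the general-purpose route. A hedged sketch for one category page with Selenium (the fixed sleep and the bare tr/td selectors are assumptions to refine, and the same loop would be repeated per category and per pagination link):

    import time
    from selenium import webdriver
    from bs4 import BeautifulSoup

    driver = webdriver.Chrome()
    driver.get('http://5000best.com/websites/Movies/')
    time.sleep(3)                                  # crude wait for the table to render

    soup = BeautifulSoup(driver.page_source, 'html.parser')
    for row in soup.find_all('tr'):
        cells = [td.get_text(strip=True) for td in row.find_all('td')]
        if cells:
            print(cells)

    driver.quit()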

Extract the number of results from a Google search

Submitted by 我的未来我决定 on 2021-01-29 13:56:48
Question: I am writing a web scraper to extract the number of results of a Google search, which appears at the top left of the results page. I have written the code below, but I do not understand why phrase_extract is None. I want to extract the phrase "About 12,010,000,000 results". In which part am I making a mistake? Maybe I am parsing the HTML incorrectly? import requests from bs4 import BeautifulSoup def pyGoogleSearch(word): address='http://www.google.com/#q=' newword=address+word …
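
The likely culprit is the URL: everything after '#' is a fragment that never reaches Google, so the fetched page contains no results at all. A hedged sketch using the /search?q= endpoint and a browser-like User-Agent (the 'result-stats' element id is what Google currently serves and can change without notice):

    import requests
    from bs4 import BeautifulSoup

    def result_count(word):
        resp = requests.get('https://www.google.com/search',
                            params={'q': word},
                            headers={'User-Agent': 'Mozilla/5.0'})
        soup = BeautifulSoup(resp.text, 'html.parser')
        stats = soup.find('div', id='result-stats')   # e.g. "About 12,010,000,000 results"
        return stats.get_text() if stats else None

    print(result_count('python web scraping'))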

Beautiful Soup: extracting a table with multiple span elements

Submitted by 别等时光非礼了梦想. on 2021-01-29 12:31:03
Question: I am currently working on my class assignment. I have to extract the data from the Specs table on this webpage: https://www.consumerreports.org/products/drip-coffee-maker/behmor-connected-alexa-enabled-temperature-control-396982/overview/ The data I need is stored as <h2 class="crux-product-title">Specs</h2> </div> </div> <div class="row"> <div class="col-xs-12"> <div class="product-model-features-specs-item"> <div class="row"> <div class='col-lg-6 col-md-6 col-sm-6 col-xs-12 product-model …
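
A hedged sketch built only from the classes visible in the snippet: each spec seems to live in a product-model-features-specs-item div whose spans hold the name/value pair. The inner structure is an assumption, and the page may render these blocks client-side, in which case the raw HTML from requests will not contain them:

    import requests
    from bs4 import BeautifulSoup

    url = ('https://www.consumerreports.org/products/drip-coffee-maker/'
           'behmor-connected-alexa-enabled-temperature-control-396982/overview/')
    res = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(res.text, 'html.parser')

    for item in soup.find_all('div', class_='product-model-features-specs-item'):
        texts = [span.get_text(strip=True) for span in item.find_all('span')]
        if texts:
            print(texts)   # expected to come out roughly as [spec name, spec value]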

Beautiful Soup returns 'None'

Submitted by 怎甘沉沦 on 2021-01-29 11:24:53
Question: I am using the following code to extract data with Beautiful Soup: import requests import bs4 res = requests.get('https://www.jmu.edu/cgi-bin/parking_sign_data.cgi?hash=53616c7465645f5f5c0bbd0eccccb6fe8dd7ed9a0445247e3c7dcb4f91927f7ccc933be780c6e558afb8ebf73620c3e5e3b2c68cd3c138519068eac99d9bf30e1e67ce894deb3a054f95f882da2ea2f0|869835tg89dhkdnbnsv5sg5wg0vmcf4mfcfc2qwm5968unmeh5') soup = bs4.BeautifulSoup(res.text, 'xml') soup.find_all("span", class_="text") I've tried different variations of …
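
Two usual suspects here: the 'xml' parser applied to an HTML document, and content injected by JavaScript after the page loads. A hedged sketch that parses with html.parser and checks whether the spans exist in the raw response at all:

    import requests
    import bs4

    # Use the full hash URL from the snippet above.
    res = requests.get('https://www.jmu.edu/cgi-bin/parking_sign_data.cgi?hash=...')
    soup = bs4.BeautifulSoup(res.text, 'html.parser')   # html.parser instead of 'xml'

    spans = soup.find_all('span', class_='text')
    if spans:
        for span in spans:
            print(span.get_text(strip=True))
    else:
        # If the class never appears in res.text, the data is rendered client-side
        # and requests alone cannot see it (Selenium or the data endpoint is needed).
        print('class="text" present in raw HTML:', 'class="text"' in res.text)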

Missing values while scraping with BeautifulSoup in Python

Submitted by 假如想象 on 2021-01-29 11:18:23
Question: I'm doing web scraping as my first Python project (I'm completely new to programming). I'm almost done, but some values on the web page are missing, so I want to replace each missing value with something like "0" or "Not found". Really I just want to build a csv file from the data, not go any further with the analysis. The web page I'm scraping is: https://www.lamudi.com.mx/nuevo-leon/departamento/for-rent/?page=1 I have a loop that collects all of the links on the page, …
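
A minimal sketch of the usual fix: wrap each find() in a helper that falls back to a default string when the tag is missing, so every row keeps the same columns. The tag and class names below are placeholders, not Lamudi's real ones:

    from bs4 import BeautifulSoup

    def text_or_default(parent, name, class_=None, default='Not found'):
        """Return the tag's text, or `default` when find() comes back None."""
        tag = parent.find(name, class_=class_)
        return tag.get_text(strip=True) if tag else default

    html = '<div class="card"><span class="price">$100</span></div>'
    card = BeautifulSoup(html, 'html.parser').find('div', class_='card')
    print(text_or_default(card, 'span', 'price'))      # -> $100
    print(text_or_default(card, 'span', 'bedrooms'))   # -> Not found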

Beautiful Soup find() returns None?

Submitted by 我的未来我决定 on 2021-01-29 11:00:40
Question: I am trying to parse the HTML of this website. I would like to get the text from all the span elements with class="post-subject". Examples: <span class="post-subject">Set of 20 moving boxes (20009 or 20011)</span> <span class="post-subject">Firestick/Old xbox games</span> When I run my code below, soup.find() returns None, and I'm not sure what's going on. import requests from bs4 import BeautifulSoup page = requests.get('https://trashnothing.com/washington-dc-freecycle?page=1') soup = …
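
A hedged first step is to check whether the post-subject spans are present in the HTML that requests actually receives; if they are not, the listings are rendered by JavaScript and a browser driver (or whatever API the site offers) is needed instead of requests alone:

    import requests
    from bs4 import BeautifulSoup

    page = requests.get('https://trashnothing.com/washington-dc-freecycle?page=1',
                        headers={'User-Agent': 'Mozilla/5.0'})
    print('post-subject in raw HTML:', 'post-subject' in page.text)

    soup = BeautifulSoup(page.text, 'html.parser')
    for span in soup.find_all('span', class_='post-subject'):
        print(span.get_text(strip=True))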