beautifulsoup

How to scrape every Steam id, review content, and profile_url from a game's reviews on Steam into an Excel file using Python?

Submitted by ≡放荡痞女 on 2021-01-29 18:04:32
Question: The script either prints only the first 11 reviews (when while n<500 is used) or prints nothing at all (when while True: is used). The requirement is to save every Steam id, review content, and profile_url from the game's reviews into an Excel file. from msedge.selenium_tools import Edge, EdgeOptions from selenium.webdriver.common.keys import Keys import re from time import sleep from datetime import datetime from openpyxl import Workbook game_id= 1097150 url = 'https://steamcommunity.com/app/1097150 …
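
One route that sidesteps Selenium entirely is Steam's public appreviews JSON endpoint, which pages with a cursor. A minimal sketch, assuming that endpoint and its documented cursor behaviour (parameters and limits may change):

    import requests
    from openpyxl import Workbook

    game_id = 1097150
    url = f'https://store.steampowered.com/appreviews/{game_id}'

    wb = Workbook()
    ws = wb.active
    ws.append(['steam_id', 'review', 'profile_url'])

    cursor = '*'          # '*' asks for the first page; Steam returns the next cursor
    seen = set()
    while cursor and cursor not in seen:
        seen.add(cursor)  # guard against the endpoint handing back the same cursor forever
        params = {'json': 1, 'filter': 'recent', 'language': 'all',
                  'num_per_page': 100, 'cursor': cursor}
        data = requests.get(url, params=params).json()
        reviews = data.get('reviews', [])
        if not reviews:
            break
        for r in reviews:
            steam_id = r['author']['steamid']
            ws.append([steam_id, r['review'],
                       f'https://steamcommunity.com/profiles/{steam_id}'])
        cursor = data.get('cursor')

    wb.save('steam_reviews.xlsx')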

Why is this Requests and BeautifulSoup login script not working?

Submitted by 喜夏-厌秋 on 2021-01-29 15:52:00
Question: The following is a bit of code I wrote on top of the Requests and BeautifulSoup libraries for Python 3. import requests as rq from bs4 import BeautifulSoup as bs def get_data(): return {'email': str(input('Enter your email.')), 'password': str(input('Enter your password.'))} def obtain_data(): login_data=get_data() form_data={'csrf_token': login_data['email'], 'login': '1', 'redirect': 'account/dashboard', 'query': None, 'required': 'email,password', 'email': login_data['email'], …
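
The excerpt reuses the e-mail address as the csrf_token, which most servers will reject. A hedged sketch of the usual pattern: open the login page inside a Session, read the real token from the hidden form field, then post it back (the field name 'csrf_token' and the URL below are assumptions to adapt to the actual site):

    import requests as rq
    from bs4 import BeautifulSoup as bs

    LOGIN_URL = 'https://example.com/login'   # placeholder for the site's real login URL

    with rq.Session() as session:
        login_page = session.get(LOGIN_URL)
        soup = bs(login_page.text, 'html.parser')
        token_tag = soup.find('input', {'name': 'csrf_token'})   # assumed hidden-field name
        token = token_tag['value'] if token_tag else ''

        form_data = {
            'csrf_token': token,                      # the real token, not the e-mail address
            'email': input('Enter your email: '),
            'password': input('Enter your password: '),
        }
        response = session.post(LOGIN_URL, data=form_data)
        print(response.status_code)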

Python web scraping with BeautifulSoup: avoiding repetition in find_all()

Submitted by 坚强是说给别人听的谎言 on 2021-01-29 15:51:48
Question: I am working on web scraping in Python using BeautifulSoup. I am trying to extract text that is in bold, italics, or both. Consider the following HTML snippet: <div> <b> <i> HelloWorld </i> </b> </div> If I use the command sp.find_all(['i', 'b']), understandably, I get two results, one corresponding to the bold tag and the other to the italics tag, i.e. ['<b><i>HelloWorld</i></b>', '<i>HelloWorld</i>']. My question is: is there a way to extract it only once and still get the tags? My desired output is something …
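
A minimal sketch of one way to avoid the nested duplicate: keep only the outermost matches, i.e. tags that have no <b> or <i> ancestor of their own.

    from bs4 import BeautifulSoup

    html = '<div> <b> <i> HelloWorld </i> </b> </div>'
    sp = BeautifulSoup(html, 'html.parser')

    # Drop any matched tag that sits inside another matched tag.
    outermost = [tag for tag in sp.find_all(['i', 'b'])
                 if not tag.find_parent(['i', 'b'])]

    for tag in outermost:
        print(tag.name, tag.get_text(strip=True))   # -> b HelloWorld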

Getting table values from nowgoal raises an IndexError

Submitted by 廉价感情. on 2021-01-29 15:37:20
Question: I am quite new to scraping. I am getting links from nowgoal; below is how I started navigating to the page above. I do not wish to get the links for all matches; instead I have an input txt file (attached here) and use the selected league and date from it. The following code initializes the input: #Initialisation league_index =[] final_list = [] j = 0 #config load config = RawConfigParser() configFilePath = r'.\config.txt' config.read(configFilePath) date = config.get('database_config','date') …
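
Without the full traceback this is a guess, but an IndexError in a scraper of this shape usually means a list built from the page came back empty for the chosen league or date. A minimal defensive sketch (names purely illustrative):

    rows = []   # e.g. whatever find_all(...) returned for the league table

    if rows:                 # never index an empty result set
        first_row = rows[0]
    else:
        print('No rows matched the configured league/date; re-check the selectors and config values.')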

How to scrape the dynamic table data

Submitted by ぐ巨炮叔叔 on 2021-01-29 14:11:04
Question: I want to scrape the table data from http://5000best.com/websites/ The table content is paginated across several pages and is loaded dynamically. I want to scrape the table data for each category. I can scrape the table manually for each category, but that is not what I want. Please look at it and suggest an approach. I am able to build the links for each category, i.e. http://5000best.com/websites/Movies/, http://5000best.com/websites/Games/ etc., but I am not sure how to go further to …
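
Because the table is built by JavaScript, requests alone will not see its rows; rendering the page first is the general-purpose route. A hedged sketch for one category page with Selenium (the fixed sleep and the bare tr/td selectors are assumptions to refine, and the same loop would be repeated per category and per pagination link):

    import time
    from selenium import webdriver
    from bs4 import BeautifulSoup

    driver = webdriver.Chrome()
    driver.get('http://5000best.com/websites/Movies/')
    time.sleep(3)                                  # crude wait for the table to render

    soup = BeautifulSoup(driver.page_source, 'html.parser')
    for row in soup.find_all('tr'):
        cells = [td.get_text(strip=True) for td in row.find_all('td')]
        if cells:
            print(cells)

    driver.quit()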

Extract the number of results from a Google search

Submitted by 我的未来我决定 on 2021-01-29 13:56:48
Question: I am writing a web scraper to extract the number of results of a Google search, which appears at the top left of the results page. I have written the code below, but I do not understand why phrase_extract is None. I want to extract the phrase "About 12,010,000,000 results". In which part am I making a mistake? Maybe I am parsing the HTML incorrectly? import requests from bs4 import BeautifulSoup def pyGoogleSearch(word): address='http://www.google.com/#q=' newword=address+word …
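
The likely culprit is the URL: everything after '#' is a fragment that never reaches Google, so the fetched page contains no results at all. A hedged sketch using the /search?q= endpoint and a browser-like User-Agent (the 'result-stats' element id is what Google currently serves and can change without notice):

    import requests
    from bs4 import BeautifulSoup

    def result_count(word):
        resp = requests.get('https://www.google.com/search',
                            params={'q': word},
                            headers={'User-Agent': 'Mozilla/5.0'})
        soup = BeautifulSoup(resp.text, 'html.parser')
        stats = soup.find('div', id='result-stats')   # e.g. "About 12,010,000,000 results"
        return stats.get_text() if stats else None

    print(result_count('python web scraping'))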

Beautiful Soup: extracting a table with multiple span elements

Submitted by 别等时光非礼了梦想. on 2021-01-29 12:31:03
Question: I am currently working on my class assignment. I have to extract the data from the Specs table on this webpage: https://www.consumerreports.org/products/drip-coffee-maker/behmor-connected-alexa-enabled-temperature-control-396982/overview/ The data I need is stored as <h2 class="crux-product-title">Specs</h2> </div> </div> <div class="row"> <div class="col-xs-12"> <div class="product-model-features-specs-item"> <div class="row"> <div class='col-lg-6 col-md-6 col-sm-6 col-xs-12 product-model …
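
A hedged sketch built only from the classes visible in the snippet: each spec seems to live in a product-model-features-specs-item div whose spans hold the name/value pair. The inner structure is an assumption, and the page may render these blocks client-side, in which case the raw HTML from requests will not contain them:

    import requests
    from bs4 import BeautifulSoup

    url = ('https://www.consumerreports.org/products/drip-coffee-maker/'
           'behmor-connected-alexa-enabled-temperature-control-396982/overview/')
    res = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(res.text, 'html.parser')

    for item in soup.find_all('div', class_='product-model-features-specs-item'):
        texts = [span.get_text(strip=True) for span in item.find_all('span')]
        if texts:
            print(texts)   # expected to come out roughly as [spec name, spec value]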

Beautiful Soup returns 'None'

Submitted by 怎甘沉沦 on 2021-01-29 11:24:53
Question: I am using the following code to extract data with Beautiful Soup: import requests import bs4 res = requests.get('https://www.jmu.edu/cgi-bin/parking_sign_data.cgi?hash=53616c7465645f5f5c0bbd0eccccb6fe8dd7ed9a0445247e3c7dcb4f91927f7ccc933be780c6e558afb8ebf73620c3e5e3b2c68cd3c138519068eac99d9bf30e1e67ce894deb3a054f95f882da2ea2f0|869835tg89dhkdnbnsv5sg5wg0vmcf4mfcfc2qwm5968unmeh5') soup = bs4.BeautifulSoup(res.text, 'xml') soup.find_all("span", class_="text") I've tried different variations of …
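
Two usual suspects here: the 'xml' parser applied to an HTML document, and content injected by JavaScript after the page loads. A hedged sketch that parses with html.parser and checks whether the spans exist in the raw response at all:

    import requests
    import bs4

    # Use the full hash URL from the snippet above.
    res = requests.get('https://www.jmu.edu/cgi-bin/parking_sign_data.cgi?hash=...')
    soup = bs4.BeautifulSoup(res.text, 'html.parser')   # html.parser instead of 'xml'

    spans = soup.find_all('span', class_='text')
    if spans:
        for span in spans:
            print(span.get_text(strip=True))
    else:
        # If the class never appears in res.text, the data is rendered client-side
        # and requests alone cannot see it (Selenium or the data endpoint is needed).
        print('class="text" present in raw HTML:', 'class="text"' in res.text)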

Missing values while scraping with BeautifulSoup in Python

Submitted by 假如想象 on 2021-01-29 11:18:23
Question: I'm doing web scraping as my first Python project (I'm completely new to programming). I'm almost done, but some values on the web page are missing, so I want to replace each missing value with something like "0" or "Not found". Really I just want to build a csv file from the data, not go any further with the analysis. The web page I'm scraping is: https://www.lamudi.com.mx/nuevo-leon/departamento/for-rent/?page=1 I have a loop that collects all of the links on the page, …
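
A minimal sketch of the usual fix: wrap each find() in a helper that falls back to a default string when the tag is missing, so every row keeps the same columns. The tag and class names below are placeholders, not Lamudi's real ones:

    from bs4 import BeautifulSoup

    def text_or_default(parent, name, class_=None, default='Not found'):
        """Return the tag's text, or `default` when find() comes back None."""
        tag = parent.find(name, class_=class_)
        return tag.get_text(strip=True) if tag else default

    html = '<div class="card"><span class="price">$100</span></div>'
    card = BeautifulSoup(html, 'html.parser').find('div', class_='card')
    print(text_or_default(card, 'span', 'price'))      # -> $100
    print(text_or_default(card, 'span', 'bedrooms'))   # -> Not found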

Beautiful Soup find() returns None?

Submitted by 我的未来我决定 on 2021-01-29 11:00:40
Question: I am trying to parse the HTML of this website. I would like to get the text from all the span elements with class="post-subject". Examples: <span class="post-subject">Set of 20 moving boxes (20009 or 20011)</span> <span class="post-subject">Firestick/Old xbox games</span> When I run my code below, soup.find() returns None, and I'm not sure what's going on. import requests from bs4 import BeautifulSoup page = requests.get('https://trashnothing.com/washington-dc-freecycle?page=1') soup = …
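
A hedged first step is to check whether the post-subject spans are present in the HTML that requests actually receives; if they are not, the listings are rendered by JavaScript and a browser driver (or whatever API the site offers) is needed instead of requests alone:

    import requests
    from bs4 import BeautifulSoup

    page = requests.get('https://trashnothing.com/washington-dc-freecycle?page=1',
                        headers={'User-Agent': 'Mozilla/5.0'})
    print('post-subject in raw HTML:', 'post-subject' in page.text)

    soup = BeautifulSoup(page.text, 'html.parser')
    for span in soup.find_all('span', class_='post-subject'):
        print(span.get_text(strip=True))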