beautifulsoup

Parsing a website with BeautifulSoup and Selenium

匆匆过客 submitted on 2020-03-26 03:37:20

Question: I am trying to compare average temperatures to actual temperatures by scraping them from: https://usclimatedata.com/climate/binghamton/new-york/united-states/usny0124 I can successfully gather the webpage's source code, but I am having trouble parsing it to extract only the values for the high temps, low temps, rainfall, and the averages under the "History" tab; I can't seem to address the right class/id without the result coming back as None. This is what I have so far, with the last line…
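A common reason for `find()` returning None on this kind of page is that the "History" table is filled in by JavaScript, so the HTML that `requests` downloads never contains it; the usual fix is to let Selenium render the page and hand `driver.page_source` to BeautifulSoup. Below is a minimal sketch of the parsing step only, run against an invented table fragment: the `history_table`, `high`, `low`, and `rain` class names are assumptions, and the real selectors must be read from the rendered page in the browser's developer tools.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment of the *rendered* History table -- the real
# usclimatedata.com markup may differ; inspect driver.page_source.
sample_html = """
<table class="history_table">
  <tr><td class="high">41</td><td class="low">28</td><td class="rain">0.12</td></tr>
  <tr><td class="high">38</td><td class="low">25</td><td class="rain">0.00</td></tr>
</table>
"""

def extract_history(html):
    """Return a list of (high, low, rain) tuples from the table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table.history_table tr"):
        high = tr.find("td", class_="high")
        low = tr.find("td", class_="low")
        rain = tr.find("td", class_="rain")
        if high and low and rain:  # skip header or malformed rows
            rows.append((int(high.text), int(low.text), float(rain.text)))
    return rows

rows = extract_history(sample_html)
```

In the real script, `sample_html` would be replaced by `driver.page_source` after Selenium has loaded (and, if needed, clicked) the History tab.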

Web Scraping Dynamic Pages - Adjusting the code

有些话、适合烂在心里 submitted on 2020-03-25 18:46:08

Question: αԋɱҽԃ αмєяιcαη helped me construct this code for scraping reviews from this page, where the reviews are loaded dynamically. I then tried to adjust it so that it scrapes not just the comment body but also the commenters' names, dates, and ratings, and saves the extracted data to an Excel file, but I failed to do so. Could someone help me adjust the code correctly? This is the code from αԋɱҽԃ αмєяιcαη: import requests from bs4 import BeautifulSoup import math def PageNum…
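Extending a working comment-body scraper to extra fields usually means selecting the container of each review and pulling the sub-elements out of it, then writing the rows to a file. The sketch below shows that pattern on an invented review fragment: every class name (`comment`, `comment-author`, `comment-date`, `rating`, `comment-body`) is an assumption standing in for whatever the real page uses, and CSV output is shown because it opens directly in Excel; `pandas.DataFrame(reviews).to_excel(...)` would produce a true .xlsx if required.

```python
import csv
from bs4 import BeautifulSoup

# Invented review markup for illustration only -- the real site's
# class names must be taken from its actual HTML.
sample_html = """
<div class="comment">
  <span class="comment-author">Alice</span>
  <span class="comment-date">2020-03-01</span>
  <span class="rating">5</span>
  <p class="comment-body">Great product.</p>
</div>
<div class="comment">
  <span class="comment-author">Bob</span>
  <span class="comment-date">2020-03-02</span>
  <span class="rating">3</span>
  <p class="comment-body">Average.</p>
</div>
"""

def parse_reviews(html):
    """Collect one dict per review container."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for div in soup.select("div.comment"):
        reviews.append({
            "name": div.select_one(".comment-author").get_text(strip=True),
            "date": div.select_one(".comment-date").get_text(strip=True),
            "rating": div.select_one(".rating").get_text(strip=True),
            "body": div.select_one(".comment-body").get_text(strip=True),
        })
    return reviews

reviews = parse_reviews(sample_html)

# CSV opens in Excel; use pandas.to_excel for a native .xlsx instead.
with open("reviews.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "date", "rating", "body"])
    writer.writeheader()
    writer.writerows(reviews)
```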

Beautiful Soup loop over div element in HTML

江枫思渺然 submitted on 2020-03-25 16:00:04

Question: I am attempting to use Beautiful Soup to extract some values from a web page (not much wisdom here..): hourly values from a WeatherBug forecast. In Chrome developer mode I can see that the values are nested within div classes, as shown in the snip below. In Python I attempt to mimic a web browser and find these values: import requests import bs4 as BeautifulSoup import pandas as pd from bs4 import BeautifulSoup url = 'https://www.weatherbug.com/weather-forecast/hourly/san…
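Two things are worth noting here. First, the excerpt's `import bs4 as BeautifulSoup` is redundant (and confusingly named) once `from bs4 import BeautifulSoup` is used. Second, for values nested inside divs, the usual pattern is to loop over the outer container and call `find` on each one, which searches only within that row. A minimal sketch on an invented structure (the `hourly-row`, `hour`, and `temp` class names are placeholders for whatever Chrome's developer tools actually show):

```python
from bs4 import BeautifulSoup

# Stand-in for the nested forecast divs seen in developer mode; the
# real weatherbug.com class names will differ.
sample_html = """
<div class="hourly-row"><div class="hour">1 PM</div><div class="temp">64</div></div>
<div class="hourly-row"><div class="hour">2 PM</div><div class="temp">66</div></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
hourly = []
for row in soup.find_all("div", class_="hourly-row"):
    # row.find searches only inside this row's nested divs
    hour = row.find("div", class_="hour").get_text(strip=True)
    temp = int(row.find("div", class_="temp").get_text(strip=True))
    hourly.append((hour, temp))
```

If the WeatherBug values turn out to be rendered by JavaScript, the same parsing code still applies, but the HTML would need to come from a rendered source such as Selenium's `page_source` rather than `requests`.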

Can't parse a Google search result page using BeautifulSoup

点点圈 submitted on 2020-03-23 08:03:52

Question: I'm parsing web pages using BeautifulSoup from bs4 in Python. When I inspected the elements of a Google search page, this was the division containing the first result (image omitted), and since it had class = 'r' I wrote this code: import requests site = requests.get('https://www.google.com/search?client=firefox-b-d&ei=CLtgXt_qO7LH4-EP6LSzuAw&q=%22narendra+modi%22+%\22scams%22+%\22frauds%22+%\22corruption%22+%22modi%22+-lalit+-nirav&oq=%22narendra+modi%22+%\22scams%22+%\22frauds%22+%\22corruption%22+%22modi…
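A frequent cause of this mismatch is that Google serves stripped-down markup (without class names like `r`) to clients that look like scripts, while DevTools shows the markup served to a real browser. Sending a browser-style `User-Agent` header often restores the expected structure, though this is an assumption about the target page rather than a guarantee. The sketch below shows the header plus the parsing step on a stand-in fragment of a result page:

```python
from bs4 import BeautifulSoup

# A browser-style User-Agent; the exact string is just an example.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
# In the real script:
# site = requests.get(search_url, headers=headers)
# soup = BeautifulSoup(site.text, "html.parser")

# Parsing step, demonstrated on an invented result-page fragment:
sample_html = (
    '<div class="r"><a href="https://example.com">'
    "<h3>First result</h3></a></div>"
)
soup = BeautifulSoup(sample_html, "html.parser")
first = soup.find("div", class_="r")   # now found, not None
title = first.h3.get_text()
link = first.a["href"]
```

If the class is still missing even with a browser User-Agent, comparing `site.text` against the DevTools view will show what Google actually returned.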

How can I scrape the title of different jobs from a website using requests?

与世无争的帅哥 submitted on 2020-03-22 04:54:47

Question: I'm trying to create a Python script that uses the requests module to scrape the titles of different jobs from a website. To parse the titles, I first need to get the relevant response from the site so that I can process the content using BeautifulSoup. However, when I run the following script, it produces gibberish that simply does not contain the titles I'm looking for. website link (in case you don't see any data, make sure to refresh the page). I've tried…
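"Gibberish" in `response.text` often means the body came back compressed in a scheme requests cannot decode, typically Brotli (`Content-Encoding: br`) when the `brotli` package is not installed; another possibility is that the titles live in a separate JSON endpoint rather than the HTML. Whether either applies to this particular site is an assumption, but a cheap first check is to restrict `Accept-Encoding` so the server falls back to gzip, and to inspect the response headers:

```python
# Ask the server not to use Brotli compression; gzip/deflate are
# decoded by requests out of the box.
headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept-Encoding": "gzip, deflate",  # deliberately omit "br"
}

# In the real script:
# resp = requests.get(url, headers=headers)
# print(resp.headers.get("Content-Encoding"))  # should no longer be "br"
# print(resp.text[:200])                        # readable HTML, not gibberish
```

If the text is readable but the titles are still absent, they are likely loaded dynamically, and the browser's Network tab will show which request actually carries them.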

Write cleaned BS4 data to csv file

陌路散爱 submitted on 2020-03-21 11:04:12

Question: from selenium import webdriver from bs4 import BeautifulSoup import csv chrome_path = r"C:\Users\chromedriver_win32\chromedriver.exe" driver = webdriver.Chrome(chrome_path) driver.get('http://www.yell.com') search = driver.find_element_by_id("search_keyword") search.send_keys("plumbers") place = driver.find_element_by_id("search_location") place.send_keys("London") driver.find_element_by_xpath("""//*[@id="searchBoxForm"]/fieldset/div[1]/div[3]/button""").click() soup = BeautifulSoup(driver…
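Once Selenium has loaded the results and BeautifulSoup has reduced them to clean strings, writing the CSV is a separate, simple step with the standard-library `csv` module. The rows below are invented stand-ins for whatever name/phone pairs the scraper extracts from yell.com; the point is the `newline=""` and encoding arguments, which prevent the blank-line and mojibake problems that commonly appear when writing scraped data on Windows.

```python
import csv

# Stand-in rows; in the real script these come from the parsed soup.
rows = [("Joe's Plumbing", "020 7946 0000"), ("Pipe Masters", "020 7946 0001")]

with open("plumbers.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "phone"])  # header row
    writer.writerows(rows)

# Read the file back to confirm the layout survived the round trip.
with open("plumbers.csv", newline="", encoding="utf-8") as f:
    saved = list(csv.reader(f))
```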

How to loop through a list of urls for web scraping with BeautifulSoup

人走茶凉 submitted on 2020-03-21 07:18:27

Question: Does anyone know how to scrape a list of URLs from the same website with BeautifulSoup? list = ['url1', 'url2', 'url3'...] ========================================================================== My code to extract a list of urls: url = 'http://www.hkjc.com/chinese/racing/selecthorsebychar.asp?ordertype=2' url1 = 'http://www.hkjc.com/chinese/racing/selecthorsebychar.asp?ordertype=3' url2 = 'http://www.hkjc.com/chinese/racing/selecthorsebychar.asp?ordertype=4' r = requests.get(url) r1 =…
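Since the three URLs differ only in the `ordertype` parameter, they can be generated and processed in a single loop instead of `url`, `url1`, `url2` with parallel `r`, `r1`, `r2` variables (also note that naming a variable `list`, as in the excerpt, shadows Python's built-in `list` type). A sketch of the loop structure, with a stub standing in for the network call so the pattern is clear; in the real script `fetch(u)` would be `requests.get(u).text`:

```python
from bs4 import BeautifulSoup

base = "http://www.hkjc.com/chinese/racing/selecthorsebychar.asp?ordertype={}"
urls = [base.format(n) for n in range(2, 5)]  # ordertype = 2, 3, 4

def fetch(url):
    """Stub for requests.get(url).text, so the loop runs offline."""
    return "<html><title>ordertype " + url.split("=")[-1] + "</title></html>"

titles = []
for u in urls:                       # one request per URL, shared parsing code
    soup = BeautifulSoup(fetch(u), "html.parser")
    titles.append(soup.title.get_text())
```

The same loop body holds whatever `find`/`select` calls the single-URL version already uses, accumulating results into one list rather than three sets of variables.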
