beautifulsoup

Moving to the next page when scraping with BeautifulSoup

为君一笑 submitted on 2021-01-29 00:49:26
Question: I am unable to automate the following code to go to the next page and scrape data from Indeed.com. Please let me know how to handle this issue.

    import requests
    import bs4
    from bs4 import BeautifulSoup
    import pandas as pd
    import time

    URL = "https://www.indeed.com/jobs?q=Amazon&l="

    # Get the html info of the page
    page = requests.get(URL)
    soup = BeautifulSoup(page.text, "html.parser")

    # Get the job title
    def extract_job_title_from_result(soup):
        jobs = []
        for div in soup.find_all(name="div", attrs
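
A minimal sketch of one way to paginate, assuming Indeed advances results with a start query parameter in steps of 10 (the parameter name, the step size, and the jobTitle class are assumptions to verify against the live markup):

    import requests
    from bs4 import BeautifulSoup
    import time

    BASE_URL = "https://www.indeed.com/jobs?q=Amazon&l="

    titles = []
    # Assumption: Indeed pages results with "start" in steps of 10.
    for start in range(0, 50, 10):
        page = requests.get(f"{BASE_URL}&start={start}")
        soup = BeautifulSoup(page.text, "html.parser")
        # Assumption: job titles sit in <h2 class="jobTitle"> elements.
        for tag in soup.find_all("h2", class_="jobTitle"):
            titles.append(tag.get_text(strip=True))
        time.sleep(1)  # be polite between requests

    print(titles)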

Scrape a website that requires login with BeautifulSoup

走远了吗. submitted on 2021-01-28 21:53:41
Question: I want to scrape a website that requires login, using Python with the BeautifulSoup and requests libraries (no Selenium). This is my code:

    import requests
    from bs4 import BeautifulSoup

    auth = (username, password)
    headers = {
        'authority': 'signon.springer.com',
        'cache-control': 'max-age=0',
        'upgrade-insecure-requests': '1',
        'origin': 'https://signon.springer.com',
        'content-type': 'application/x-www-form-urlencoded',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML,
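
Without Selenium, the usual pattern is to let a requests.Session carry cookies across the login POST and the later page fetches. A minimal sketch of that pattern, assuming a plain form login; the URLs and form field names below are placeholders to read out of the real login page (SSO gateways like signon.springer.com typically also require hidden CSRF tokens scraped from the form):

    import requests
    from bs4 import BeautifulSoup

    LOGIN_URL = "https://example.com/login"            # placeholder: the form's action URL
    TARGET_URL = "https://example.com/protected-page"  # placeholder

    with requests.Session() as session:
        # Visit the login page first so any pre-login cookies are set;
        # parse it here if a hidden CSRF token must go into the payload.
        session.get(LOGIN_URL)

        # Assumption: field names vary per site; inspect the form's inputs.
        payload = {"username": "me@example.com", "password": "secret"}
        session.post(LOGIN_URL, data=payload)

        # The session now carries the auth cookies from the login response.
        resp = session.get(TARGET_URL)
        soup = BeautifulSoup(resp.text, "html.parser")
        print(soup.title)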

How to find out the correct encoding when using beautifulsoup?

懵懂的女人 submitted on 2021-01-28 20:15:48
Question: In Python 3 with beautifulsoup4 I want to get information from a website after making the request. I did this:

    import requests
    from bs4 import BeautifulSoup

    req = requests.get('https://sisgvarmazenamento.blob.core.windows.net/prd/PublicacaoPortal/Arquivos/201901.htm').text
    soup = BeautifulSoup(req, 'lxml')
    soup.find("h1").text
    '\r\n                  CÃ\x82MARA MUNICIPAL DE SÃ\x83O PAULO'

I do not know what the encoding is, but it's a site with Brazilian Portuguese, so it should be utf-8 or latin1. Please, is
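
That mojibake (CÃ\x82MARA for CÂMARA) is the classic sign of UTF-8 bytes being decoded as Latin-1. Two common fixes, sketched below: hand BeautifulSoup the raw bytes so it can sniff the encoding itself, or override the response encoding before reading .text:

    import requests
    from bs4 import BeautifulSoup

    url = 'https://sisgvarmazenamento.blob.core.windows.net/prd/PublicacaoPortal/Arquivos/201901.htm'
    r = requests.get(url)

    # Option 1: pass raw bytes; BeautifulSoup detects the declared encoding.
    soup = BeautifulSoup(r.content, 'lxml')

    # Option 2: requests falls back to ISO-8859-1 for text/html responses
    # without a charset header, so correct its guess before using r.text.
    r.encoding = r.apparent_encoding   # or simply r.encoding = 'utf-8'
    soup = BeautifulSoup(r.text, 'lxml')

    print(soup.find("h1").text.strip())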

Parsing an HTML table with pd.read_html where cells contain full tables themselves

梦想的初衷 submitted on 2021-01-28 20:07:22
Question: I need to parse a table from HTML that has other tables nested within the larger table. As called below with pd.read_html, each of these nested tables is parsed and then "inserted"/"concatenated" as rows. I'd like these nested tables each to be parsed into their own pd.DataFrame and then inserted as objects as the value of the corresponding column. If this is not possible, having the raw HTML for the nested table as a string in the corresponding position would be fine. Code as tested:

    import
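
pd.read_html flattens whatever it finds, so the nesting has to be separated before pandas sees it. One approach, sketched under the assumption that the inner tables sit directly inside <td> cells of the outer table: walk the outer rows with BeautifulSoup, keep each nested table's raw HTML in its cell, and parse any cell on demand:

    import pandas as pd
    from bs4 import BeautifulSoup
    from io import StringIO

    html = "<table>...</table>"   # placeholder: the outer table's HTML

    soup = BeautifulSoup(html, "lxml")
    outer = soup.find("table")
    body = outer.find("tbody") or outer   # lxml inserts <tbody> automatically

    rows = []
    for tr in body.find_all("tr", recursive=False):
        row = []
        for td in tr.find_all("td", recursive=False):
            inner = td.find("table")
            if inner is not None:
                row.append(str(inner))   # keep the nested table as raw HTML
            else:
                row.append(td.get_text(strip=True))
        rows.append(row)

    df = pd.DataFrame(rows)
    # Any nested cell can then become its own DataFrame:
    # nested_df = pd.read_html(StringIO(df.iat[0, 1]))[0]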

Need to create a dictionary from two span tags wrapped up in a container, using Beautiful Soup

∥☆過路亽.° submitted on 2021-01-28 19:37:32
Question: I am scraping some listings from a website and managed to get most of the features to work, except scraping the description. Here is the URL of one ad: https://eg.hatla2ee.com/en/car/honda/civic/3289785 Here is my code:

    for link in df['New Carlist Unit 1_link']:
        url = requests.get(link)
        soup = BeautifulSoup(url.text, 'html.parser')

        ### Get title
        title = []
        try:
            title.append(soup.find('h1').text.strip())
        except Exception as e:
            None

        ## Get price
        price = []
        try:
            price.append(soup.find('span'
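
For the description block, the usual pattern is to find each container element and zip its two spans into a key/value pair. A sketch of that pattern; the DescDataItem class name is a pure placeholder, since the real class names have to be read from the ad page's HTML:

    import requests
    from bs4 import BeautifulSoup

    url = "https://eg.hatla2ee.com/en/car/honda/civic/3289785"
    soup = BeautifulSoup(requests.get(url).text, "html.parser")

    description = {}
    # Placeholder class name: substitute the container class from the page.
    for box in soup.find_all("div", class_="DescDataItem"):
        spans = box.find_all("span")
        if len(spans) >= 2:
            # First span as the label, second as the value.
            description[spans[0].get_text(strip=True)] = spans[1].get_text(strip=True)

    print(description)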

Beautiful Soup 4 .string() 'NoneType' object is not callable

丶灬走出姿态 submitted on 2021-01-28 19:24:28
Question:

    from bs4 import BeautifulSoup
    import sys

    soup = BeautifulSoup(open(sys.argv[2]), 'html.parser')
    print(soup.prettify)

    if sys.argv[1] == "h":
        h2s = soup.find_all("h2")
        for h in h2s:
            print(h.string())

The first print statement (added as a test) works, so I know BS4 is working and everything. The second print statement throws:

    File "sp2gd.py", line 40, in <module>
        print(h.string())
    TypeError: 'NoneType' object is not callable

Answer 1: BeautifulSoup's .string is a property, not a callable method, and should be written without parentheses: h.string. The property also evaluates to None for any tag with more than one child, so the trailing () ends up calling None, which is exactly the 'NoneType' object is not callable error shown.
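
A self-contained illustration of both accessors; note that .string is None whenever a tag has more than one child, so .get_text() is the safer choice for mixed content:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup("<h2>plain</h2><h2><em>nested</em> text</h2>", "html.parser")
    for h in soup.find_all("h2"):
        print(h.string)                     # property access, no parentheses; may be None
        print(h.get_text(" ", strip=True))  # joins all nested strings, never None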

Issue with scraping Understat chart data using Selenium

懵懂的女人 submitted on 2021-01-28 18:54:25
Question: I'm trying to scrape the chart data under the 'Timing Sheet' tab at https://understat.com/match/9457. My approach is to use BeautifulSoup and Selenium, but I can't seem to get it to work. Here is my Python script:

    from bs4 import BeautifulSoup
    import requests

    # Set the url we want
    xg_url = 'https://understat.com/match/9457'

    # Use requests to download the webpage
    xg_data = requests.get(xg_url)

    # Get the html code for the webpage
    xg_html = xg_data.content

    # Parse the html using bs4
    soup = BeautifulSoup
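
The chart numbers never appear in the HTML that requests downloads: Understat embeds them in a <script> tag as a JSON.parse('...') literal full of \xNN escapes, so Selenium is not actually needed. A sketch of pulling that literal out directly; the shotsData variable name is an assumption to verify against the page source, and the unicode_escape round-trip can mangle non-ASCII player names:

    import json
    import re
    import requests

    url = 'https://understat.com/match/9457'
    html = requests.get(url).text

    # Assumption: the page contains  var shotsData = JSON.parse('\x7B...');
    m = re.search(r"shotsData\s*=\s*JSON\.parse\('(.*?)'\)", html)
    if m:
        # The literal uses \xNN escapes; decode them before json.loads.
        raw = m.group(1).encode('utf-8').decode('unicode_escape')
        shots = json.loads(raw)
        print(str(shots)[:200])   # quick peek at the parsed structure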

Table Web Scraping Issues with Python

吃可爱长大的小学妹 submitted on 2021-01-28 18:24:34
Question: I am having issues scraping data from this website: https://fantasy.premierleague.com/player-list I am interested in getting access to the players' names and points from the different tables. I'm relatively new to Python and completely new to web scraping. Here is what I have so far:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    url = 'https://fantasy.premierleague.com/player-list'
    html = urlopen(url)
    soup = BeautifulSoup(html, "lxml")
    rows = soup.find_all('tr')
    print(rows)
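
The player-list page is rendered client-side by JavaScript, so urlopen receives a page with no <tr> elements in it, which is likely why the scrape finds nothing useful. The data behind the tables is served as JSON by the site's own API, which sidesteps scraping entirely; a sketch using the bootstrap-static endpoint (the endpoint path and field names match the public FPL API at the time, but verify they still hold):

    import requests

    # Public JSON feed that backs the player-list page.
    api_url = 'https://fantasy.premierleague.com/api/bootstrap-static/'
    data = requests.get(api_url).json()

    # 'elements' holds one record per player.
    for player in data['elements'][:10]:
        name = f"{player['first_name']} {player['second_name']}"
        print(name, player['total_points'])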