BeautifulSoup

Web scraping with Python 3.6 and BeautifulSoup - getting Invalid URL

会有一股神秘感。 Submitted on 2020-06-26 05:54:00

Question: I want to work with this page in Python: http://www.sothebys.com/en/search-results.html?keyword=degas%27

This is my code:

from bs4 import BeautifulSoup
import requests

page = requests.get('http://www.sothebys.com/en/search-results.html?keyword=degas%27')
soup = BeautifulSoup(page.content, "lxml")
print(soup)

I'm getting the following output:

<html><head>
<title>Invalid URL</title>
</head><body>
<h1>Invalid URL</h1>
The requested URL "[no URL]", is invalid.<p>
Reference #9.8f4f1502.1494363829
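This Akamai "Invalid URL" page is typically the CDN rejecting a request that doesn't look like a browser, not a problem with the URL itself. A minimal sketch, assuming the block keys on request headers (the header values are illustrative, and the network call is left commented out):

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Build the search URL with the keyword percent-encoded; the %27 in the
# original URL is simply an encoded apostrophe.
keyword = quote("degas'")
url = f"http://www.sothebys.com/en/search-results.html?keyword={keyword}"

# Send browser-like headers; many CDN fronts reject the default
# Python-urllib User-Agent outright.
req = Request(url, headers={"User-Agent": "Mozilla/5.0", "Accept": "text/html"})
# html = urlopen(req).read()  # network call, not run in this sketch
print(url)
```

If the header change alone doesn't help, the block may be IP- or cookie-based, which this sketch does not address.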

How to extract h1 tag text with beautifulsoup

耗尽温柔 Submitted on 2020-06-25 05:00:11

Question: I'd like to understand how to extract the text of an h1 tag that contains many other tags inside it, using Beautiful Soup:

<h1 class="listing-name">
Hôtel Vevey
<span class="entry-feedbacks-summary-title-rating-stars-container bootstrap">
<span class="entry-feedbacks-summary-title-rating-stars entry-feedbacks-summary-title-rating-stars-empty" data-container=".entry-feedbacks-summary-title-rating-stars-container" data-content="Il n'y a pas encore d'avis de clients à propos de Astra Hôtel Vevey 4*sup.
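One way to get just the heading's own text while skipping the nested rating spans is to keep only the direct string children of the h1. A sketch against a simplified version of the markup above (the span content is a stand-in):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the listing page's heading markup.
html = """
<h1 class="listing-name">
  Hôtel Vevey
  <span class="entry-feedbacks-summary-title-rating-stars-container">stars</span>
</h1>
"""
soup = BeautifulSoup(html, "html.parser")
h1 = soup.find("h1", class_="listing-name")

# h1.get_text() would include the nested spans; iterating the direct
# children and keeping only the strings skips them (NavigableString is
# a str subclass, so isinstance works here).
direct_text = "".join(
    child for child in h1.children if isinstance(child, str)
).strip()
print(direct_text)
```

This prints only the heading label, without the rating-widget text.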

Extract data using bs4 from a JavaScript text span

末鹿安然 Submitted on 2020-06-23 18:41:07

Question: I'm trying to extract some data from a span that comes after a text/javascript script. I tried with regex, but it's too fragile. How can I get the span after the text/javascript block?

html_content = urlopen('https://www.icewarehouse.com/Bauer_Vapor_1X/descpage-V1XS7.html')
soup = BeautifulSoup(html_content, "lxml")
price = soup.find(class_='crossout')
span = price('span')
print(span)

Desired output: 649.99 949.99

Answer 1: I think you are trying to get the minimum and maximum of the array msrp. In which case you
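Once the crossout element is found, the price spans can be converted to floats and reduced to a minimum and maximum. A sketch against a simplified stand-in for the product page's markup (assumption: the real page nests the prices in span tags like this):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the crossed-out price block.
html = '<span class="crossout"><span>649.99</span> - <span>949.99</span></span>'
soup = BeautifulSoup(html, "html.parser")
crossout = soup.find(class_="crossout")

# Convert each inner span's text to a float, then take min and max.
prices = [float(span.get_text()) for span in crossout.find_all("span")]
print(min(prices), max(prices))  # 649.99 949.99
```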

ModuleNotFoundError: No module named 'bs4'

有些话、适合烂在心里 Submitted on 2020-06-23 16:29:25

Question: When I try to import BeautifulSoup like this:

from bs4 import BeautifulSoup

I get this error message when I run my code:

ModuleNotFoundError: No module named 'bs4'

If someone knows how to resolve this problem, that would be great!

Edit (my code):

import os
import csv
import requests
import bs4

requete = requests.get("https://url")
page = requete.content
soup = bs4.BeautifulSoup(page, "html.parser")
h1 = soup.find("h1", {"class": "page_title"})
print(h1.string)

Answer 1: You either a) have not installed BeautifulSoup
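The usual cause of this error is that beautifulsoup4 was installed for a different Python interpreter than the one running the script. A small diagnostic sketch (note the PyPI package name is beautifulsoup4, while the import name is bs4):

```python
import importlib.util
import sys

# Show which interpreter is actually running, then check whether bs4
# is importable from it.
print("running under:", sys.executable)
spec = importlib.util.find_spec("bs4")
if spec is None:
    # Installing with "python -m pip" targets this same interpreter.
    print("bs4 missing; install with: python -m pip install beautifulsoup4")
else:
    print("bs4 found at:", spec.origin)
```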

Python: get all the contents from a website into an HTML file

主宰稳场 Submitted on 2020-06-17 15:58:53

Question: Someone please help: I want to transfer all the contents from a URL to an HTML file. Can someone help me please? I have to use a user-agent too!

Answer 1: Because I don't know which site you need to scrape, I'll suggest a few ways. If the site has a JS front end and waiting is needed for loading, then I recommend the requests_html module, which has a method for rendering content:

from requests_html import HTMLSession

url = "https://some-url.org"
with HTMLSession() as session:
    response = session.get(url)
    response.html
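For the simpler case where the page does not need JavaScript rendering, the task reduces to one request with a custom User-Agent plus a file write. A sketch using only the standard library (the URL and filename in the usage comment are placeholders):

```python
from urllib.request import Request, urlopen

def save_page(url: str, path: str) -> None:
    # Build the request with a browser-like User-Agent header
    # (assumption: the target site only gates on this header).
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as response:
        html = response.read().decode("utf-8", errors="replace")
    # Write the raw HTML out to a file.
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)

# Usage (hypothetical URL, network call not run here):
# save_page("https://some-url.org", "page.html")
```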

Request Returns Response 447

余生长醉 Submitted on 2020-06-17 13:10:50

Question: I'm trying to scrape a website using requests and BeautifulSoup. When I run the code to obtain the tags of the webpage, the soup object is blank. I printed out the request object to see whether the request was successful, and it was not. The printed result shows response 447. I can't find what 447 means as an HTTP status code. Does anyone know how I can successfully connect and scrape the site?

Code:

r = requests.get('https://foobar')
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.get
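447 is not a registered HTTP status code, so its meaning is site-specific; in practice such codes usually signal the server refusing an automated client. A first step worth trying is sending browser-like headers, sketched here with the standard library (assumption: the block keys on headers rather than IP; header values are illustrative):

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

def fetch(url: str) -> str:
    # Browser-like headers to avoid looking like a default script client.
    req = Request(url, headers={
        "User-Agent": "Mozilla/5.0",
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    })
    try:
        with urlopen(req) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        # Non-standard status codes such as 447 surface here as HTTPError.
        raise RuntimeError(f"request refused with status {err.code}") from err

# html = fetch("https://example.com")  # network call, not run here
```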

Using BeautifulSoup in order to find all “ul” and “li” elements

…衆ロ難τιáo~ Submitted on 2020-06-17 09:46:27

Question: I'm currently working on a crawling script in Python where I want to map the following HTML response into a multi-level list or a dictionary (it does not matter which). My current code is:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

req = Request("https://my.site.com/crawl", headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req)
soup = BeautifulSoup(webpage, 'html.parser')
ul = soup.find('ul', {'class': ''})

After running this I get the following result stored in ul:
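Mapping a nested ul/li structure into a dictionary can be done by iterating only the direct li children of the outer list and recursing into each one's inner ul. A sketch against hypothetical markup standing in for the crawled response (the real page's structure and classes may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the crawled response.
html = """
<ul>
  <li>Fruit
    <ul><li>Apple</li><li>Pear</li></ul>
  </li>
  <li>Veg
    <ul><li>Leek</li></ul>
  </li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
result = {}
# recursive=False keeps the loop on top-level items only.
for li in soup.find("ul").find_all("li", recursive=False):
    key = next(li.stripped_strings)  # first text chunk is the item's label
    inner = li.find("ul")
    result[key] = (
        [i.get_text(strip=True) for i in inner.find_all("li")] if inner else []
    )
print(result)
```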

Grabbing data from subsequent pages of a website

醉酒当歌 Submitted on 2020-06-17 08:04:29

Question: I'm trying to grab data from every page of the returned results for this page: https://www.azjobconnection.gov/ada/mn_warn_dsp.cfm?def=false&securitysys=on

It's hard to verify whether I've grabbed everything, since when you hit the next-page button everything gets out of order. The only page that is sorted by year is the first page. Subsequent pages have data outside the range originally selected. For instance, if you enter 01/01/2020 at the search page, the first page returned will have only
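Since later pages come back out of order, one way to verify completeness is to extract every table row from every page into a set of tuples and de-duplicate. A sketch of the row-extraction part, run here on two hypothetical page fragments (the real table markup on azjobconnection.gov may differ, and fetching the pages themselves is out of scope):

```python
from bs4 import BeautifulSoup

def table_rows(html: str):
    """Extract the cell text of every data row in the first table."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in soup.find("table").find_all("tr")
        if tr.find_all("td")  # skip header rows without <td> cells
    ]

# Hypothetical page fragments; note the overlapping "Acme" row, as can
# happen when pagination reshuffles results.
page1 = "<table><tr><td>Acme</td><td>01/15/2020</td></tr></table>"
page2 = ("<table><tr><td>Acme</td><td>01/15/2020</td></tr>"
         "<tr><td>Beta</td><td>02/01/2020</td></tr></table>")

# Collect rows from all pages as tuples and de-duplicate with a set.
seen = {tuple(row) for html in (page1, page2) for row in table_rows(html)}
print(sorted(seen))
```

Comparing the size of the de-duplicated set against the result count the site reports is a quick completeness check.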
