beautifulsoup

Using beautifulsoup to parse string efficiently

梦想与她 submitted on 2021-01-04 07:27:08
Question: I am trying to parse this HTML to get the item title (e.g. Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, Fryer 5 Colors -NEW): <div style="" class=""> <h1 class="it-ttl" itemprop="name" id="itemTitle"><span class="g-hdn">Details about  </span>Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, Fryer 5 Colors -NEW</h1> <h2 id="subTitle" class="it-sttl"> Brand New + Free Shipping, Satisfaction Guaranteed! </h2> <!-- DO NOT change linkToTagId="rwid" as the catalog response
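A minimal sketch of one way to extract the title from the fragment quoted in the question, assuming the hidden "Details about" prefix in the nested <span> should be dropped. The helper name extract_item_title is hypothetical, and the sample HTML is only the <h1> part of the fragment.

```python
from bs4 import BeautifulSoup

# The <h1> fragment quoted in the question.
SAMPLE_HTML = (
    '<h1 class="it-ttl" itemprop="name" id="itemTitle">'
    '<span class="g-hdn">Details about  </span>'
    'Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, '
    'Fryer 5 Colors -NEW</h1>'
)


def extract_item_title(html):
    """Return the text of <h1 id="itemTitle"> without the nested <span> prefix.

    Hypothetical helper: returns None when the heading is missing.
    """
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1", id="itemTitle")
    if heading is None:
        return None
    # recursive=False keeps only strings that are direct children of <h1>,
    # so the text inside <span class="g-hdn"> is skipped.
    direct_text = heading.find_all(string=True, recursive=False)
    return "".join(direct_text).strip()


if __name__ == "__main__":
    print(extract_item_title(SAMPLE_HTML))
```

Looking the heading up by its id and reading only its direct text nodes avoids string slicing on the "Details about" prefix, which could change between pages.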

How do you scrape a table when the table is unable to return values? (BeautifulSoup)

流过昼夜 submitted on 2021-01-02 08:26:12
Question: The following is my code:
    import numpy as np
    import pandas as pd
    import requests
    from bs4 import BeautifulSoup
    stats_page = requests.get('https://www.sports-reference.com/cbb/schools/loyola-il/2020.html')
    content = stats_page.content
    soup = BeautifulSoup(content, 'html.parser')
    table = soup.find(name='table', attrs={'id':'per_poss'})
    html_str = str(table)
    df = pd.read_html(html_str)[0]
    df.head()
And I get the error: ValueError: No tables found. However, when I swap attrs={'id':'per_poss'}
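The question is cut off, so the cause is not confirmed here. In the code above, soup.find returns None when nothing matches, str(None) is the literal string "None", and pd.read_html then finds no table in it. One possible explanation, stated as an assumption, is that the table is shipped inside an HTML comment and is therefore not part of the parsed tree. The sketch below shows how such a table could be recovered. The sample HTML and the helper names find_table_including_comments and table_rows are hypothetical.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical page fragment: the table exists only inside an HTML comment.
SAMPLE_HTML = (
    '<div id="all_per_poss"><!-- '
    '<table id="per_poss">'
    '<tr><th>Player</th><th>PTS</th></tr>'
    '<tr><td>A</td><td>10</td></tr>'
    '</table> --></div>'
)


def find_table_including_comments(html, table_id):
    """Look for <table id=table_id> in the tree, then inside HTML comments."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is not None:
        return table
    # Comments are strings in the tree; re-parse each one as HTML.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        inner = BeautifulSoup(comment, "html.parser")
        table = inner.find("table", id=table_id)
        if table is not None:
            return table
    return None


def table_rows(table):
    """Return the table as a list of rows, each a list of cell strings."""
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in table.find_all("tr")
    ]


if __name__ == "__main__":
    found = find_table_including_comments(SAMPLE_HTML, "per_poss")
    print(table_rows(found) if found is not None else "table not found")
```

Checking for None before calling str(table) also turns the misleading "No tables found" error into an explicit "table not found" branch.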

A simple Python crawler for scraping a web novel

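余生长醉 submitted on 2020-12-31 10:11:52
A simple Python crawler for scraping a web novel. I picked a novel at random from the 17K novel site (17K小说网); its title is 《千万大奖》. The crawler is built from three functions:
1. get_download_url() collects the URLs of all chapters of the novel. Inspecting the HTML source of the novel's table-of-contents page, http://www.17k.com/list/2819620.html, shows that the table of contents is a set of A tags contained in Volume, so the URLS list is extracted from there.
2. get_contents(target) fetches the body text of a given chapter. Inspecting the page of the first chapter, http://www.17k.com/chapter/2819620/34988369.html, shows that the body text is contained in P tags and the chapter title in an H1 tag; after handling line breaks and similar cleanup, this yields the body text. The argument passed in is a URL obtained by the previous function.
3. writer(name, path, text) writes the resulting body text and chapter title to 千万大奖.txt.
In theory, this simple crawler can scrape any novel on the site.
    from bs4 import BeautifulSoup
    import requests, sys
    target='http://www.17k.com/list/2819620.html'
    server='http://www.17k.com'
    urls=[]
    def get_download_url():
        req =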
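The original listing is cut off, so the following is only a sketch of how the three functions described above might look, not the author's code. Several points are assumptions: "Volume" is treated as a CSS class name, the functions take already-downloaded HTML strings instead of fetching with requests (so the sketch runs without network access), and writer appends to the file.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

SERVER = "http://www.17k.com"


def get_download_url(list_html, server=SERVER):
    """Collect chapter URLs from the table-of-contents HTML.

    Assumption: chapter links are <a> tags inside elements with class "Volume".
    """
    soup = BeautifulSoup(list_html, "html.parser")
    urls = []
    for volume in soup.find_all(class_="Volume"):
        for link in volume.find_all("a", href=True):
            urls.append(urljoin(server, link["href"]))
    return urls


def get_contents(chapter_html):
    """Return (title, text): title from <h1>, body from non-empty <p> tags."""
    soup = BeautifulSoup(chapter_html, "html.parser")
    heading = soup.find("h1")
    title = heading.get_text(strip=True) if heading else ""
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    text = "\n".join(p for p in paragraphs if p)
    return title, text


def writer(name, path, text):
    """Append one chapter (title line, body, blank line) to the file at path."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(name + "\n")
        handle.write(text + "\n\n")
```

In a real run, each function would be fed the text of a requests.get(...) response for the table-of-contents page or a chapter page, and the selectors would need checking against the live markup.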

Unable to import beautifulsoup in python

我怕爱的太早我们不能终老 submitted on 2020-12-30 02:32:47
Question: I'm using Python 2.7.10 and have installed beautifulsoup using pip. The package was installed successfully, but when I try to import beautifulsoup, I get this error:
    ImportError: No module named beautifulsoup
I checked the list of my installed modules and found the beautifulsoup module in it:
Answer 1: You installed BeautifulSoup version 3; the module is called BeautifulSoup, with a capital B and S:
    from BeautifulSoup import BeautifulSoup
See the Quickstart
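As a small illustration of the naming issue in the answer: Beautiful Soup 4 is installed as the beautifulsoup4 package and imported as bs4, while version 3 is imported as BeautifulSoup. Neither is imported as lowercase beautifulsoup. The helper below, whose name locate_beautifulsoup is hypothetical, reports which of the two module names is importable in the current environment.

```python
import importlib


def locate_beautifulsoup():
    """Return (module_name, BeautifulSoup class) for the first importable variant.

    Tries the version 4 module name first, then the legacy version 3 name.
    Returns (None, None) when neither is installed.
    """
    for module_name in ("bs4", "BeautifulSoup"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        return module_name, module.BeautifulSoup
    return None, None


if __name__ == "__main__":
    name, soup_class = locate_beautifulsoup()
    if name is None:
        print("Beautiful Soup is not installed")
    else:
        print("Import it with: from %s import BeautifulSoup" % name)
```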

Unable to import beautifulsoup in python

这一生的挚爱 submitted on 2020-12-30 02:29:53
Question: I'm using Python 2.7.10 and have installed beautifulsoup using pip. The package was installed successfully, but when I try to import beautifulsoup, I get this error:
    ImportError: No module named beautifulsoup
I checked the list of my installed modules and found the beautifulsoup module in it:
Answer 1: You installed BeautifulSoup version 3; the module is called BeautifulSoup, with a capital B and S:
    from BeautifulSoup import BeautifulSoup
See the Quickstart

Unable to import beautifulsoup in python

六眼飞鱼酱① submitted on 2020-12-30 02:27:46
Question: I'm using Python 2.7.10 and have installed beautifulsoup using pip. The package was installed successfully, but when I try to import beautifulsoup, I get this error:
    ImportError: No module named beautifulsoup
I checked the list of my installed modules and found the beautifulsoup module in it:
Answer 1: You installed BeautifulSoup version 3; the module is called BeautifulSoup, with a capital B and S:
    from BeautifulSoup import BeautifulSoup
See the Quickstart

Python requests.get(url) returning javascript code instead of the page html

陌路散爱 submitted on 2020-12-27 06:09:53
Question: I have a very simple problem. I'm trying to get the job description from the HTML of a LinkedIn page, but instead of the page's HTML I'm getting a few lines that look like JavaScript code. I'm very new to this, so any help will be greatly appreciated! Thanks. Here's my code:
    import requests
    url = "https://www.linkedin.com/jobs/view/inside-sales-manager-at-stericycle-1089095836/"
    page_html = requests.get(url).text
    print(page_html)
When I run this I don't get the html that I
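The excerpt ends before any answer, so the cause is not stated here. A first diagnostic step, sketched below under that caveat, is to measure how much of the response is script and how much is visible text. A response that is almost all script suggests the content is rendered client-side or the request was redirected. The helper name summarize_response and the sample strings are hypothetical.

```python
from bs4 import BeautifulSoup


def summarize_response(html):
    """Count <script> tags and the visible characters left after removing them."""
    soup = BeautifulSoup(html, "html.parser")
    script_count = len(soup.find_all("script"))
    # Drop script and style elements so only human-readable text remains.
    for tag in soup(["script", "style"]):
        tag.decompose()
    visible = soup.get_text(" ", strip=True)
    return {"script_tags": script_count, "visible_chars": len(visible)}


if __name__ == "__main__":
    # Hypothetical responses: one script-only page, one with real content.
    script_only = (
        '<html><head><script>window.location.href="x";</script></head>'
        "<body></body></html>"
    )
    with_content = (
        "<html><body><script>var a=1;</script>"
        "<p>Job description</p></body></html>"
    )
    print(summarize_response(script_only))
    print(summarize_response(with_content))
```

If the summary shows no visible text, possible next steps include passing a browser-like User-Agent through the headers argument of requests.get, or using a tool that executes JavaScript. Whether either helps for this particular site is not something the excerpt confirms.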

Python requests.get(url) returning javascript code instead of the page html

断了今生、忘了曾经 submitted on 2020-12-27 06:09:49
Question: I have a very simple problem. I'm trying to get the job description from the HTML of a LinkedIn page, but instead of the page's HTML I'm getting a few lines that look like JavaScript code. I'm very new to this, so any help will be greatly appreciated! Thanks. Here's my code:
    import requests
    url = "https://www.linkedin.com/jobs/view/inside-sales-manager-at-stericycle-1089095836/"
    page_html = requests.get(url).text
    print(page_html)
When I run this I don't get the html that I

Scraping wsj.com

╄→尐↘猪︶ㄣ submitted on 2020-12-27 03:08:33
Question: I wanted to scrape some data from wsj.com and print it. The actual website is https://www.wsj.com/market-data/stocks?mod=md_home_overview_stk_main and the data is NYSE Issues Advancing, Declining and NYSE Share Volume Advancing, Declining. I tried using BeautifulSoup after watching a YouTube video, but I can't get any of the classes inside body to return a value. Here is my code:
    from bs4 import BeautifulSoup
    import requests
    source = requests.get('https://www.wsj.com/market-data/stocks?mod=md

Beautifulsoup4 not installing for pipenv

人盡茶涼 submitted on 2020-12-26 01:58:19
Question: I wanted to install beautifulsoup4 with pipenv. I tried from cmd as well as from PyCharm; both give this error. ERROR MESSAGE: Installing dependencies from Pipfile.lock (0d3df0)… Installing initially failed dependencies… An error occurred while installing beautifulsoup==3.2.2 --hash=sha256:a04169602bff6e3138b1259dbbf491f5a27f9499dea9a8fbafd48843f9d89970 --hash=sha256:d31413d71f6ca027ff6b06c891b62ee8ff48267ccd969f881d810e5d1fe49565! Will try again. [pipenv.exceptions.InstallError]: File "c:\users
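The error message shows pipenv installing beautifulsoup==3.2.2 from Pipfile.lock, which is the legacy version 3 package and not beautifulsoup4. One plausible reading, given as an assumption because the excerpt is truncated, is that the Pipfile still lists the legacy name. The sketch below scans Pipfile text for that entry with a simple line-based check. The sample Pipfile contents and the helper name find_legacy_packages are hypothetical, and the check is not a full TOML parser.

```python
import re

# Legacy distribution name that pulls in Beautiful Soup 3.
LEGACY_NAMES = {"beautifulsoup"}

# Hypothetical Pipfile contents for illustration only.
SAMPLE_PIPFILE = """\
[[source]]
url = "https://pypi.org/simple"

[packages]
beautifulsoup = "*"
requests = "*"
"""


def find_legacy_packages(pipfile_text):
    """Return (section, name) pairs for legacy entries in the package sections."""
    hits = []
    section = None
    for line in pipfile_text.splitlines():
        stripped = line.strip()
        header = re.fullmatch(r"\[(.+)\]", stripped)
        if header:
            section = header.group(1)
            continue
        if section not in ("packages", "dev-packages"):
            continue
        # Match the key of a `name = ...` entry, with or without quotes.
        entry = re.match(r'"?([A-Za-z0-9_.\-]+)"?\s*=', stripped)
        if entry and entry.group(1).lower() in LEGACY_NAMES:
            hits.append((section, entry.group(1)))
    return hits


if __name__ == "__main__":
    print(find_legacy_packages(SAMPLE_PIPFILE))
```

If such an entry turns up, removing it from the Pipfile and regenerating the lock file would stop pipenv from retrying the version 3 install. Whether that applies to this report cannot be confirmed from the truncated excerpt.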