beautifulsoup | 易学教程

Using beautifulsoup to parse string efficiently

Question: I am trying to parse this HTML to get the item title (e.g. "Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, Fryer 5 Colors -NEW"):

    <div style="" class="">
    <h1 class="it-ttl" itemprop="name" id="itemTitle"><span class="g-hdn">Details about </span>Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, Fryer 5 Colors -NEW</h1>
    <h2 id="subTitle" class="it-sttl"> Brand New + Free Shipping, Satisfaction Guaranteed! </h2>
    <!-- DO NOT change linkToTagId="rwid" as the catalog response
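The excerpt stops before any answer, so the following is only a minimal sketch of one way to get the title out of the markup quoted in the question. It assumes the title is the text sitting directly inside `h1#itemTitle`, and that the hidden "Details about" span should be skipped. The function name `item_title` and the trimmed `SAMPLE` string are illustrative, not taken from the original post.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (the cut-off comment is omitted).
SAMPLE = (
    '<div style="" class="">'
    '<h1 class="it-ttl" itemprop="name" id="itemTitle">'
    '<span class="g-hdn">Details about </span>'
    'Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, Fryer 5 Colors -NEW'
    '</h1>'
    '<h2 id="subTitle" class="it-sttl"> Brand New + Free Shipping, Satisfaction Guaranteed! </h2>'
    '</div>'
)


def item_title(html):
    """Return the text directly inside h1#itemTitle, ignoring nested tags."""
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1", id="itemTitle")
    if h1 is None:
        return None
    # recursive=False keeps only the h1's own text nodes,
    # so the text inside the hidden <span> is not included.
    return "".join(h1.find_all(string=True, recursive=False)).strip()


print(item_title(SAMPLE))
```

Looking the element up by `id` avoids depending on the class names, which may change more often than the id.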

How do you scrape a table when the table is unable to return values? (BeautifulSoup)

Question: The following is my code:

    import numpy as np
    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    stats_page = requests.get('https://www.sports-reference.com/cbb/schools/loyola-il/2020.html')
    content = stats_page.content
    soup = BeautifulSoup(content, 'html.parser')
    table = soup.find(name='table', attrs={'id':'per_poss'})
    html_str = str(table)
    df = pd.read_html(html_str)[0]
    df.head()

And I get the error:

    ValueError: No tables found.

However, when I swap attrs={'id':'per_poss'}
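The excerpt is cut off before any answer, so the cause is not confirmed here. One possibility worth checking, stated purely as an assumption, is that the table is present in the response but wrapped in an HTML comment, which `soup.find` does not look inside. The sketch below searches the normal tree first and then the contents of each comment. The helper name `find_table` is hypothetical.

```python
from bs4 import BeautifulSoup, Comment


def find_table(html, table_id):
    """Find <table id=table_id>, also looking inside HTML comments."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is not None:
        return table
    # Assumption: the table may have been shipped inside <!-- ... -->.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        inner = BeautifulSoup(comment, "html.parser")
        table = inner.find("table", id=table_id)
        if table is not None:
            return table
    return None
```

If this returns a table, `str(table)` can be handed to `pd.read_html` as in the question. If it returns `None`, the table is probably not in the downloaded HTML at all.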

Scraping a web novel with a simple Python crawler

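A simple Python crawler for scraping a web novel. The example uses a novel picked at random from the 17K novel site, titled 《千万大奖》. It consists mainly of three functions:

1. get_download_url() collects the URLs of all chapters of the novel. Inspecting the HTML source of the novel's table-of-contents page, http://www.17k.com/list/2819620.html, shows that the table of contents is a set of A tags contained in Volume, so the list of URLs is extracted from those.
2. get_contents(target) fetches the body text of a given chapter. Inspecting the page of the first chapter, http://www.17k.com/chapter/2819620/34988369.html, shows that the body text is in P tags and the chapter title is in an H1 tag. After handling line breaks and similar cleanup, this yields the body text. The argument is a URL returned by the previous function.
3. writer(name, path, text) writes the body text and chapter title to 千万大奖.txt.

In theory, this simple crawler can scrape any novel on the site.

    from bs4 import BeautifulSoup
    import requests, sys

    target='http://www.17k.com/list/2819620.html'
    server='http://www.17k.com'
    urls=[]

    def get_download_url():
        req =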
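Since the original listing is cut off, the three functions described above can only be sketched roughly. This version works on HTML strings rather than fetching pages itself. It assumes that "Volume" is a CSS class on the container holding the chapter links and that chapter links are site-relative; neither detail is confirmed by the excerpt. The signatures differ slightly from the original (`get_download_urls` takes the page HTML as an argument).

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

SERVER = "http://www.17k.com"


def get_download_urls(list_html, server=SERVER):
    """Collect chapter URLs from the A tags inside elements with class 'Volume'."""
    soup = BeautifulSoup(list_html, "html.parser")
    urls = []
    for volume in soup.find_all(class_="Volume"):  # assumed class name
        for a in volume.find_all("a", href=True):
            urls.append(urljoin(server, a["href"]))
    return urls


def get_contents(chapter_html):
    """Return (title, body): title from the first H1, body from the P tags."""
    soup = BeautifulSoup(chapter_html, "html.parser")
    h1 = soup.find("h1")
    title = h1.get_text(strip=True) if h1 else ""
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    # Drop empty paragraphs and join the rest with single line breaks.
    return title, "\n".join(p for p in paragraphs if p)


def writer(name, path, text):
    """Append one chapter (title, then body) to the output file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(name + "\n" + text + "\n\n")
```

A driver would download the table-of-contents page (for example with `requests.get(target).text`), pass it to `get_download_urls`, then fetch each chapter and call `get_contents` and `writer` in turn.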

Unable to import beautifulsoup in python

Question: I'm using Python 2.7.10 and have installed beautifulsoup using pip. The package was installed successfully, but when I try to import beautifulsoup I get this error:

    ImportError: No module named beautifulsoup

I checked the list of my installed modules and found the beautifulsoup module in it:

Answer 1: You installed BeautifulSoup version 3; the module is called BeautifulSoup, with a capital B and S:

    from BeautifulSoup import BeautifulSoup

See the Quickstart
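Because Python module names are case-sensitive, a quick way to see which spelling is importable in a given interpreter is to try each one. This is a small diagnostic sketch, not part of the original answer. The helper name `importable` is hypothetical, and `bs4` is included on the assumption that Beautiful Soup 4 might be installed instead.

```python
import importlib


def importable(name):
    """Return True if a module with exactly this name can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Candidate names: Beautiful Soup 4, Beautiful Soup 3, and the lowercase
# spelling from the question.
for name in ("bs4", "BeautifulSoup", "beautifulsoup"):
    print("%s: %s" % (name, "importable" if importable(name) else "not importable"))
```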

Python requests.get(url) returning javascript code instead of the page html

Question: I have a very simple problem. I'm trying to get the job description from the HTML of a LinkedIn page, but instead of the page's HTML I get a few lines that look like JavaScript code. I'm very new to this, so any help will be greatly appreciated! Thanks. Here's my code:

    import requests

    url = "https://www.linkedin.com/jobs/view/inside-sales-manager-at-stericycle-1089095836/"
    page_html = requests.get(url).text
    print(page_html)

When I run this I don't get the html that I
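The excerpt ends before any answer, so the following is a diagnostic sketch rather than a fix. It checks whether a response body contains almost no visible text once scripts are stripped, which would suggest the server returned a script stub (for example a redirect or a page rendered client-side) instead of the content. The function names and the 200-character threshold are arbitrary choices made for illustration.

```python
from bs4 import BeautifulSoup


def visible_text(html):
    """Return the page text with script, style and noscript content removed."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    # Collapse runs of whitespace into single spaces.
    return " ".join(soup.get_text(separator=" ").split())


def looks_script_only(html, min_chars=200):
    """Heuristic: True if very little readable text is left after stripping scripts."""
    return len(visible_text(html)) < min_chars
```

Calling `looks_script_only(page_html)` on the string from the question would show whether there is anything for a parser to extract. If it returns `True`, the content is likely produced by JavaScript or withheld from non-browser clients, and plain `requests` may not be enough.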

Scraping wsj.com

Question: I wanted to scrape some data from wsj.com and print it. The actual website is https://www.wsj.com/market-data/stocks?mod=md_home_overview_stk_main and the data is NYSE Issues Advancing, Declining and NYSE Share Volume Advancing, Declining. I tried using BeautifulSoup after watching a YouTube video, but I can't get any of the classes inside body to return a value. Here is my code:

    from bs4 import BeautifulSoup
    import requests

    source = requests.get('https://www.wsj.com/market-data/stocks?mod=md

Beautifulsoup4 not installing for pipenv

Question: I wanted to install beautifulsoup4 with pipenv. I tried with cmd as well as PyCharm, and both give this error. ERROR MESSAGE:

    Installing dependencies from Pipfile.lock (0d3df0)…
    Installing initially failed dependencies…
    An error occurred while installing beautifulsoup==3.2.2 --hash=sha256:a04169602bff6e3138b1259dbbf491f5a27f9499dea9a8fbafd48843f9d89970 --hash=sha256:d31413d71f6ca027ff6b06c891b62ee8ff48267ccd969f881d810e5d1fe49565! Will try again.
    [pipenv.exceptions.InstallError]: File "c:\users
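The error text shows pipenv trying to install `beautifulsoup==3.2.2` from Pipfile.lock. That is the old Beautiful Soup 3 package, not `beautifulsoup4`. The excerpt does not include an answer, so the sketch below only helps confirm what the lock file pins. It relies on Pipfile.lock being JSON with `default` and `develop` sections keyed by package name. The helper name `find_pinned` is hypothetical.

```python
import json


def find_pinned(lock_text, name):
    """Return {section: version} for a package name found in Pipfile.lock text."""
    lock = json.loads(lock_text)
    found = {}
    for section in ("default", "develop"):
        entry = lock.get(section, {}).get(name)
        if entry is not None:
            found[section] = entry.get("version")
    return found


if __name__ == "__main__":
    # Assumes the script is run from the project directory.
    with open("Pipfile.lock", encoding="utf-8") as f:
        text = f.read()
    for package in ("beautifulsoup", "beautifulsoup4"):
        print(package, find_pinned(text, package))
```

If `beautifulsoup` shows up here, a plausible next step is `pipenv uninstall beautifulsoup` followed by `pipenv install beautifulsoup4`. Whether that resolves this particular error is not confirmed by the source.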