BeautifulSoup

BeautifulSoup can't find required div

烈酒焚心 · submitted 2021-02-10 07:01:22

Question: I have been trying to get at a nested div and its contents but am not able to. I want to access the div with class 'box coursebox':

```python
response = res.read()
soup = BeautifulSoup(response, "html.parser")
div = soup.find_all('div', attrs={'class': 'box coursebox'})
```

The above code gives 0 elements, when there should be 8. find_all calls before this line work perfectly. Thanks for helping!

Answer 1: In the case of attributes having more than one value, Beautiful Soup puts all the values into a list.
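A minimal sketch of the workaround the answer points to: because bs4 treats class as a multi-valued attribute, you can match on a single class name, or require both classes with a CSS selector. The markup below is invented for illustration.

```python
from bs4 import BeautifulSoup

# Invented stand-in markup with a multi-valued class attribute
html = """
<div class="box coursebox">Course 1</div>
<div class="box coursebox">Course 2</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Matching a single class works because bs4 splits class into a list
divs = soup.find_all("div", class_="coursebox")
print(len(divs))  # 2

# A CSS selector can require both classes at once
divs_css = soup.select("div.box.coursebox")
print(len(divs_css))  # 2
```

If the real page still returns 0 elements with both approaches, the div is likely rendered by JavaScript and absent from the raw HTML.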

Iterating html through tag classes with BeautifulSoup

≯℡__Kan透↙ · submitted 2021-02-10 06:58:46

Question: I'm saving some specific tags from a webpage to an Excel file, so I have this code:

```python
import requests
from bs4 import BeautifulSoup
import openpyxl

url = "http://www.euro.com.pl/telewizory-led-lcd-plazmowe,strona-1.bhtml"
source_code = requests.get(url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text, "html.parser")

wb = openpyxl.Workbook()
ws = wb.active

tagiterator = soup.h2
row, col = 1, 1
ws.cell(row=row, column=col, value=tagiterator.getText())
tagiterator = tagiterator.find
```
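Since the snippet is cut off, here is a hedged sketch of one way to walk successive h2 tags with find_next; the markup is invented, and the openpyxl cell-writing step is indicated in a comment rather than executed.

```python
from bs4 import BeautifulSoup

# Invented markup standing in for the product-listing page
html = """
<div>
  <h2><a>TV 1</a></h2>
  <h2><a>TV 2</a></h2>
  <h2><a>TV 3</a></h2>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Walk from the first h2 to each following one with find_next
rows = []
tag = soup.h2
while tag is not None:
    rows.append(tag.get_text(strip=True))
    tag = tag.find_next("h2")

print(rows)  # ['TV 1', 'TV 2', 'TV 3']
# Each entry could then be written with ws.cell(row=i, column=1, value=text)
```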

How to use beautifulsoup to check if a string exists

南笙酒味 · submitted 2021-02-10 06:33:55

Question: Hi, I am trying to write a program that scrapes a URL, and if the scraped data contains a particular string, does something. How can I use Beautiful Soup to achieve this?

```python
import requests
from bs4 import BeautifulSoup

data = requests.get('https://www.google.com', verify=False)
soup = BeautifulSoup(data.text, 'html.parser')
for inp in soup.find_all('input'):
    if inp == "Google Search":
        print("found")
    else:
        print("nothing")
```

Answer 1: Your inp is an HTML Tag object, not a string, so comparing it to "Google Search" will never match. You must use the get_text() function. import requests
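A sketch of two ways to do the check, on an invented input element: comparing a tag's value attribute, and simply testing whether the string occurs anywhere in the page.

```python
from bs4 import BeautifulSoup

# Invented markup; Google's search button carries the text in its value attribute
html = '<form><input type="submit" value="Google Search"></form>'
soup = BeautifulSoup(html, "html.parser")

# Option 1: compare the attribute value of each input tag
found = any(inp.get("value") == "Google Search" for inp in soup.find_all("input"))
print(found)  # True

# Option 2: test whether the string occurs anywhere in the markup
print("Google Search" in str(soup))  # True
```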

Narrow in a bit more on a particular bit of text using beautifulsoup

情到浓时终转凉″ · submitted 2021-02-10 06:31:51

Question: I'm trying to get the river level from https://flood-warning-information.service.gov.uk/station/8108. I'm using this script:

```python
import requests
from bs4 import BeautifulSoup

url = "https://flood-warning-information.service.gov.uk/station/8108"
r = requests.get(url)
soup = BeautifulSoup(r.content, "lxml")
g_data = soup.find_all("header", {"intro"})
print(g_data[0].text)
```

Which gives me "River Skerne at John St Darlington Latest recorded level 0.72m at 10:30am Thursday 8 October 2020", which is
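To narrow in on just the numeric level, one option (not from the original thread) is a regular expression over the header text; the markup below is a reconstruction of the quoted output.

```python
import re
from bs4 import BeautifulSoup

# Reconstructed header based on the text the question quotes
html = """
<header class="intro">
  River Skerne at John St Darlington
  Latest recorded level 0.72m at 10:30am Thursday 8 October 2020
</header>
"""
soup = BeautifulSoup(html, "html.parser")
text = soup.find("header", class_="intro").get_text(" ", strip=True)

# Pull out just the number before the "m" unit
match = re.search(r"level\s+([\d.]+)m", text)
level = float(match.group(1)) if match else None
print(level)  # 0.72
```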

Beautiful soup returns None

我的梦境 · submitted 2021-02-10 05:48:28

Question: I have the following HTML code and I use Beautiful Soup to extract information. I want to get, for example, "Relationship status: Relationship".

```html
<table class="box-content-list" cellspacing="0">
  <tbody>
    <tr class="first">
      <td>
        <strong>Relationship status:</strong>
        Relationship
      </td>
    </tr>
    <tr class="alt">
      <td>
        <strong>Living:</strong>
        With partner
      </td>
    </tr>
```

I have created the following code:

```python
xs = [x for x in soup.findAll('table', attrs={'class': 'box-content-list'})]
for x in xs:
    # print x
    sx
```
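One way to pair each label with its value (a sketch, not the original poster's solution): each <strong> holds the label, and the text node that follows it holds the value, reachable via next_sibling.

```python
from bs4 import BeautifulSoup

# Condensed copy of the table from the question
html = """
<table class="box-content-list" cellspacing="0">
  <tr class="first"><td><strong>Relationship status:</strong> Relationship</td></tr>
  <tr class="alt"><td><strong>Living:</strong> With partner</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

info = {}
for strong in soup.find("table", class_="box-content-list").find_all("strong"):
    label = strong.get_text(strip=True).rstrip(":")
    # The value is the text node immediately after the <strong> tag
    value = strong.next_sibling.strip()
    info[label] = value

print(info)  # {'Relationship status': 'Relationship', 'Living': 'With partner'}
```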

What is the most awesome program you have written in Python?

那年仲夏 · submitted 2021-02-09 11:31:32

Compiled by Python开发者 (Python Developers) - Jake_on; original English source: Quora. http://python.jobbole.com/85986/

A user on Quora asked, "What is the most awesome program/script you have written in Python?" This article excerpts several small projects from three overseas programmers, with code.

Manoj Memana Jayakumar, 3000+ upvotes

Update: thanks to these scripts, I found a job! See my reply in this thread: "Has anyone got a job through Quora? Or somehow made lots of money through Quora?"

1. One-click subtitle downloader for movies and TV shows

We often run into this scenario: open a subtitle site such as subscene or opensubtitles, search for the name of the movie or show, pick the right release, download the subtitle file, unzip it, cut and paste it into the folder containing the movie, and rename the subtitle file to match the movie file. Tedious, isn't it? So I wrote a script that downloads the correct subtitle file for a movie or TV show and saves it alongside the movie file, all with a single click. Confused? Watch this YouTube video: https://youtu.be/Q5YWEqgw9X8

Source code on GitHub: subtitle-downloader

Update: the script now supports downloading multiple subtitle files at once. Steps

Python

浪尽此生 · submitted 2021-02-09 11:04:28

```python
import requests
from bs4 import BeautifulSoup
import sqlite3

conn = sqlite3.connect("test.db")
c = conn.cursor()

for num in range(1, 101):
    url = "https://cs.lianjia.com/ershoufang/pg%s/" % num
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36'
                      ' (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36',
    }
    req = requests.session()
    response = req.get(url, headers=headers, verify=False)
    info = response.text
    f1 = BeautifulSoup(info, 'lxml')
    f2 = f1.find(class_='sellListContent')
    f3 = f2.find_all(class_='clear LOGCLICKDATA')
    for i in f3:
        data_id =
```
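The snippet above is cut off at the inner loop; a hedged sketch of how it might continue, parsing each listing and inserting a row into SQLite. The markup and the data-lj_action_housedel_id attribute are assumptions, and an in-memory database stands in for test.db.

```python
import sqlite3
from bs4 import BeautifulSoup

# Invented stand-in for one page of listing markup (structure is an assumption)
html = """
<ul class="sellListContent">
  <li class="clear LOGCLICKDATA" data-lj_action_housedel_id="101">
    <div class="title"><a>Listing A</a></div>
  </li>
  <li class="clear LOGCLICKDATA" data-lj_action_housedel_id="102">
    <div class="title"><a>Listing B</a></div>
  </li>
</ul>
"""

conn = sqlite3.connect(":memory:")  # use "test.db" for a file on disk
c = conn.cursor()
c.execute("CREATE TABLE IF NOT EXISTS houses (data_id TEXT, title TEXT)")

soup = BeautifulSoup(html, "html.parser")
for li in soup.find_all("li", class_="LOGCLICKDATA"):
    data_id = li.get("data-lj_action_housedel_id")
    title = li.find("div", class_="title").get_text(strip=True)
    # Parameterized insert avoids SQL injection from scraped text
    c.execute("INSERT INTO houses VALUES (?, ?)", (data_id, title))
conn.commit()

rows = c.execute("SELECT data_id, title FROM houses").fetchall()
print(rows)  # [('101', 'Listing A'), ('102', 'Listing B')]
```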

Python crawler: scraping Lianjia second-hand housing listings

…衆ロ難τιáo~ · submitted 2021-02-09 10:00:58

```python
# coding=utf-8
import requests
from fake_useragent import UserAgent
from bs4 import BeautifulSoup
import json
import csv
import time

# Build the request headers
userAgent = UserAgent()
headers = {'user-agent': userAgent.Chrome}

# A list to hold the scraped dictionaries
data_list = []

def start_spider(page):
    # Set the number of reconnection retries
    requests.adapters.DEFAULT_RETRIES = 15
    s = requests.session()
    # Disable keep-alive on the connection
    s.keep_alive = False
    # URL to crawl; defaults to the Lianjia listings for Nanjing
    url = 'https://nj.lianjia.com/ershoufang/pg{}/'.format(page)
    # Request the URL
    resp = requests.get(url, headers=headers, timeout=10)
    # Convert the response body into a BeautifulSoup object
    soup = BeautifulSoup(resp.content, 'lxml')
    # Select all the li tags
```
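The function is cut off after "select all the li tags"; since the script imports csv and declares data_list, a hedged sketch of how the loop and export might look. The field classes (title, totalPrice) and markup are assumptions, and a StringIO buffer stands in for a real CSV file.

```python
import csv
import io
from bs4 import BeautifulSoup

# Invented fragment of one listing page (field classes are assumptions)
html = """
<ul class="sellListContent">
  <li class="clear">
    <div class="title"><a>Cozy 2BR</a></div>
    <div class="totalPrice"><span>210</span>万</div>
  </li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

data_list = []
for li in soup.select("ul.sellListContent > li"):
    # Collect each listing as a dict, matching the data_list pattern above
    data_list.append({
        "title": li.find("div", class_="title").get_text(strip=True),
        "price": li.find("div", class_="totalPrice").get_text(strip=True),
    })

# Write the collected dicts to CSV (open a file instead of StringIO in real use)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "price"])
writer.writeheader()
writer.writerows(data_list)
print(buf.getvalue())
```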