beautifulsoup

Beautifulsoup4 not installing with pipenv

这一生的挚爱 submitted on 2020-12-26 01:57:31
Question: I wanted to install beautifulsoup4 with pipenv. I tried from cmd as well as PyCharm; both give this error:

ERROR MESSAGE: Installing dependencies from Pipfile.lock (0d3df0)… Installing initially failed dependencies… An error occurred while installing beautifulsoup==3.2.2 --hash=sha256:a04169602bff6e3138b1259dbbf491f5a27f9499dea9a8fbafd48843f9d89970 --hash=sha256:d31413d71f6ca027ff6b06c891b62ee8ff48267ccd969f881d810e5d1fe49565! Will try again. [pipenv.exceptions.InstallError]: File "c:\users
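Note that the package that fails is beautifulsoup==3.2.2 (the legacy Python 2 era distribution), not beautifulsoup4, which suggests a stale pin in the Pipfile or lock file. A minimal sketch of one way to clean this up, assuming the old entry is the culprit:

```
pipenv uninstall beautifulsoup   # drop the legacy package, if it is pinned
pipenv lock --clear              # rebuild the lock file without cached hashes
pipenv install beautifulsoup4    # install the maintained bs4 distribution
```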

Wuhan metro route planning with Dijkstra's algorithm! (download included)

馋奶兔 submitted on 2020-12-22 10:25:23
Source: Datawhale. This article is about 3,300 words; suggested reading time: 10 minutes. It walks you through a route-planning project, with a link to the source code. Preface: I recently scraped Wuhan metro line information, obtained the longitude and latitude of each station by calling the AMap (高德地图) API, and used Dijkstra's algorithm to plan routes. Reply "20201218" to the official account (DatapiTHU) to download the project source code. 1. Data scraping. First we need the station information for every Wuhan metro line, so we scrape each station's details and store them in an xlsx file. Data source: 武汉地铁线路图 - 武汉本地宝 (wh.bendibao.com). Tools: requests, BeautifulSoup, pandas.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

def spyder():
    # fetch the Wuhan metro information
    url = 'http://wh.bendibao.com/ditie/linemap.shtml'
    user_agent = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50'
    headers = {'User-Agent'
```
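The excerpt stops before the algorithm itself. As a minimal sketch of the Dijkstra step it describes, assuming the scraped stations have been assembled into an adjacency dict (the graph layout and function name are my assumptions, not the project's code):

```python
import heapq

def dijkstra(graph, start):
    # graph: {station: {neighbor: edge_weight}}, built from the scraped data
    dist = {start: 0.0}   # best known distance to each station
    prev = {}             # predecessor map, used to reconstruct the route
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue  # stale heap entry, already relaxed via a shorter path
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

# toy usage with made-up stations
g = {'A': {'B': 2}, 'B': {'A': 2, 'C': 1}, 'C': {'B': 1}}
dist, prev = dijkstra(g, 'A')   # dist['C'] == 3
```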

Web-scraping class project (Hupu football news)

倖福魔咒の submitted on 2020-12-21 20:25:31
```python
import requests
from bs4 import BeautifulSoup
import jieba
from PIL import Image, ImageSequence
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud, ImageColorGenerator

def changeTitleToDict():
    # tokenize the saved titles with jieba and count term frequencies
    f = open('yingchao.txt', 'r', encoding='utf-8')
    str = f.read()
    stringList = list(jieba.cut(str))
    symbol = {"/", "(", ")", " ", ";", "!", "、", ":"}
    stringSet = set(stringList) - symbol
    title_dict = {}
    for i in stringSet:
        title_dict[i] = stringList.count(i)
    print(title_dict)
    return title_dict

for i in range(1, 10):
    page = i
    hupu = 'https://voice.hupu.com/soccer/tag/496-%s
```
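The excerpt ends mid-loop, before the word cloud itself is drawn. A minimal sketch of how the frequency dict could feed WordCloud (the font path and parameters are assumptions; a CJK font is needed to render Chinese tokens):

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

title_dict = changeTitleToDict()
wc = WordCloud(font_path='simhei.ttf',   # assumed CJK font file
               background_color='white',
               max_words=200)
wc.generate_from_frequencies(title_dict)  # build the cloud from the counts
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()
```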

Scraping Douyin's trending music

老子叫甜甜 submitted on 2020-12-18 07:42:25
Scraping Douyin's trending music is relatively simple; here is the result of running the code. Music listing URL: https://kuaiyinshi.com/hot/music/?source=dou-yin&page=1. Open the page, press F12, and refresh with F5; we only need the data shown above. Fetch it with BeautifulSoup; the code follows:

```python
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
# save path
save_path = "G:\\Music\\douyin\\"
url = "https://kuaiyinshi.com/hot/music/?source=dou-yin&page=1"
# fetch the response
res = requests.get(url, headers=headers)
# parse with BeautifulSoup
soup = BeautifulSoup(res.text, 'lxml')
# pick the tag that holds the largest page number
max_page = soup.select('li.page-item > a')[-2].text
# loop over the pages
for page
```
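The excerpt cuts off at the page loop. A sketch of how that loop might continue, assuming each page lists its tracks in <audio> tags (the selector and download step are my guesses, not the author's code):

```python
import os
import requests
from bs4 import BeautifulSoup

# headers, save_path and max_page come from the excerpt above
for page in range(1, int(max_page) + 1):
    page_url = f"https://kuaiyinshi.com/hot/music/?source=dou-yin&page={page}"
    soup = BeautifulSoup(requests.get(page_url, headers=headers).text, 'lxml')
    for audio in soup.select('audio'):   # assumed tag holding the track source
        src = audio.get('src')
        if not src:
            continue
        filename = os.path.basename(src.split('?')[0])
        with open(save_path + filename, 'wb') as f:
            f.write(requests.get(src, headers=headers).content)
```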

Scraping Wikipedia information (table)

五迷三道 submitted on 2020-12-16 03:58:27
Question: I need to scrape the information in Elenco dei comuni per regione on Wikipedia. I would like to create an array that lets me associate each comune with the corresponding region, i.e. something like this: 'Abbateggio': 'Pescara' -> Abruzzo. I tried to get the information using BeautifulSoup and requests as follows:

```python
from bs4 import BeautifulSoup as bs
import requests

with requests.Session() as s:
    # use a session object for efficiency of TCP re-use
    s.headers = {'User-Agent': 'Mozilla/5.0
```
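A minimal sketch of one way the mapping could be built, assuming each regional list page carries a standard wikitable whose first columns are comune and province (the URL and column order here are assumptions):

```python
import requests
from bs4 import BeautifulSoup

# hypothetical target: the list page for one region
url = 'https://it.wikipedia.org/wiki/Comuni_dell%27Abruzzo'
soup = BeautifulSoup(
    requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).text,
    'html.parser')

mapping = {}
table = soup.find('table', class_='wikitable')  # assumes the first wikitable is the comuni list
for row in table.find_all('tr')[1:]:            # skip the header row
    cells = [td.get_text(strip=True) for td in row.find_all('td')]
    if len(cells) >= 2:
        mapping[cells[0]] = (cells[1], 'Abruzzo')  # comune -> (province, region)
```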

How to fill an Excel file from Selenium scraping in a loop with Python

浪尽此生 submitted on 2020-12-15 07:02:31
Question: I am trying to scrape a website that contains many pages. With Selenium I open each page in a second tab, run my function to get the data, then close the tab, open the next one, and continue extracting until the last page. My problem is that when I save my data to the Excel file, I find it keeps only the last information extracted from the last page (tab). Can you help me find my error?

```python
def scrap_client_infos(linksss):
    tds = []  # tds is the list that contains the data
```
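The symptom (only the last page survives) usually means the result list is re-created, or the Excel file rewritten, inside the loop. A sketch of the accumulate-then-write-once pattern, with hypothetical names standing in for the question's code:

```python
import pandas as pd

all_rows = []                          # created once, outside the loop

for link in links:                     # hypothetical list of page links
    rows = scrap_client_infos(link)    # rows scraped from one page/tab
    all_rows.extend(rows)

# write a single time, after the loop, so earlier pages are kept
pd.DataFrame(all_rows).to_excel('clients.xlsx', index=False)
```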

Python - Scrape movie titles with Splash & BS4

别等时光非礼了梦想. submitted on 2020-12-15 06:14:51
Question: I am trying to create my first script with Python, using Splash and BS4. I followed this tutorial from John Watson Rooney (but with my own target): How I Scrape JAVASCRIPT websites with Python. My goal is to scrape this survey site: Best movies of 2020. Here's my problem: it renders the same titles multiple times, with up to 6 duplicates in the list and no logical order. Sometimes it renders fewer than 100 lines, sometimes more. What I want: get the 100 titles, in order, and export them
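Duplicates and a varying row count from a JavaScript-rendered page often mean the scrape fires before rendering settles, so giving Splash a longer wait helps; an order-preserving dedup then cleans up whatever repeats remain. A minimal sketch of the dedup half (the titles list is a stand-in for the scraped results):

```python
# dict.fromkeys keeps the first occurrence of each key, in order
# (insertion order is guaranteed for dicts in Python 3.7+)
titles = ['Movie A', 'Movie B', 'Movie A', 'Movie C', 'Movie B']
unique_titles = list(dict.fromkeys(titles))
print(unique_titles)  # ['Movie A', 'Movie B', 'Movie C']
```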

All elements from html not being extracted by Requests and BeautifulSoup in Python

本小妞迷上赌 submitted on 2020-12-15 06:12:18
Question: I am trying to scrape odds from a site that displays current odds from different agencies, for an assignment on the effects of market competition. I am using Requests and BeautifulSoup to extract the relevant data. However, after using:

```python
import requests
from bs4 import BeautifulSoup

url = "https://www.bestodds.com.au/odds/cricket/ICC-World-Twenty20/Sri-Lanka-v-Afghanistan_71992/"
r = requests.get(url)
print(r.text)
```

it does not print any odds, yet if I inspect the element on the page I can see them
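The usual cause: the odds are injected by JavaScript after the initial HTML loads, so requests never sees them. One common fix is to find the XHR request the page makes (DevTools → Network tab) and call that endpoint directly; the URL below is purely hypothetical, just to show the shape of the approach:

```python
import requests

# hypothetical JSON endpoint discovered in the browser's Network tab
api_url = 'https://www.bestodds.com.au/api/odds/71992'
data = requests.get(api_url, headers={'User-Agent': 'Mozilla/5.0'}).json()
print(data)  # the odds payload, if the endpoint exists as assumed
```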

How to extract var values from a <script> tag in HTML using BeautifulSoup

社会主义新天地 submitted on 2020-12-15 05:52:31
Question: I am currently using:

```python
import requests
from bs4 import BeautifulSoup

source = requests.get('www.randomwebsite.com').text
soup = BeautifulSoup(source, 'lxml')
details = soup.find('script')
```

This returns the following script:

```html
<script>
var Url = "https://www.example.com";
if(Url != ''){code} else {code }
</script>
```

I want the output to be the following: https://www.example.com

Answer 1:

```python
import re

text = """
<script>
var Url = "https://www.example.com";
if(Url != ''){code} else {code }
</script>
"""
```
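The answer is cut off before the extraction step. A sketch of how it presumably continues: a regex pulls out the quoted value assigned to var Url (this exact pattern is my reconstruction, not necessarily the answerer's code):

```python
# capture whatever sits between the quotes after `var Url =`
match = re.search(r'var\s+Url\s*=\s*"([^"]+)"', text)
if match:
    print(match.group(1))  # https://www.example.com
```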