scrapy

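Installing Python Pip on Ubuntu 20.04

╄→尐↘猪︶ㄣ submitted on 2020-07-27 05:25:36
How to install Python Pip on Ubuntu 20.04. Pip is a tool for installing Python packages. With pip you can search for, download, and install packages from the Python Package Index (PyPI) and other package indexes. This guide explains how to install pip for Python 3 and Python 2 on Ubuntu 20.04, and covers the basics of installing and managing Python packages with pip. Before you begin: Python comes in two branches, Python 2 and Python 3. Starting with Ubuntu 20.04, Python 3 is included in the base system installation, while Python 2 is available from the universe repository. Users are encouraged to switch to Python 3. Use pip to install a module globally only when no deb package exists for that module. Prefer using pip inside a virtual environment. A Python virtual environment lets you install Python modules in an isolated location for a specific project rather than globally, so you do not have to worry about affecting other Python projects. Installing pip for Python 3: to install pip for Python 3 on Ubuntu 20.04, run the following commands in a terminal as root or as a user with sudo privileges: sudo apt update, then sudo apt install python3-pip. The commands above will install what is needed to build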
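The advice above to use pip inside a virtual environment can also be tried from Python itself. The following is a minimal sketch, assuming the standard-library venv module is available (on Ubuntu this may require the python3-venv package). The function name create_project_env is hypothetical and does not come from the original guide.

```python
import os
import venv


def create_project_env(env_dir, with_pip=False):
    """Create an isolated environment in env_dir and return its absolute path.

    with_pip=False keeps this sketch fast and offline; pass with_pip=True
    to bootstrap pip inside the new environment.
    """
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_dir)
    return os.path.abspath(env_dir)
```

On Linux, calling create_project_env(".venv", with_pip=True) in a project directory should give an environment whose bin/pip installs packages into that directory only, leaving the system-wide Python untouched.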

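Python Scrapy: garbled text when scraping JD.com and Baidu Baike, and a fix

房东的猫 submitted on 2020-07-25 01:14:45
Python Scrapy: garbled text when scraping JD.com and Baidu Baike, and a fix. I really miss 顺店杂可... Scraping Baidu Baike produces garbled text: after downloading the page source, it is entirely garbled when the saved file is opened in a browser, yet opening the link directly in the browser shows no garbling. Here is the source as seen in the browser. This shows that something went wrong when the downloaded page source was saved. After a long search I found that it was an encoding problem. The fix follows: # -*- coding: utf-8 -*- # @Time : 2019/5/13 15:49 # @Author : 甄超锋 # @Email : 4535@sohu.com # @File : asd.py # @Software: PyCharm import requests url = "https://baike.baidu.com/item/%E6%9D%8E%E5%B9%BC%E6%96%8C/7850567#1" headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0", "Host": "baike.baidu.com", "Connection": "keep-alive", } response = requests.get(url=url, headers=headers) text_iso_by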
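The excerpt attributes the garbling to an encoding problem. A common cause, assumed here because the original code is cut off, is that UTF-8 bytes get decoded as ISO-8859-1 when the HTTP headers declare no charset. The sketch below reproduces that effect offline on a sample string and then reverses it. The helper name fix_mojibake is hypothetical.

```python
def fix_mojibake(garbled, wrong="iso-8859-1", right="utf-8"):
    """Undo a wrong decode: re-encode with the wrong codec, then decode with the right one."""
    return garbled.encode(wrong).decode(right)


original = "百度百科"
raw_bytes = original.encode("utf-8")       # bytes as the server would send them
garbled = raw_bytes.decode("iso-8859-1")   # result of a wrong charset guess
repaired = fix_mojibake(garbled)           # round-trips back to the original text
```

With requests, the equivalent is to decode response.content as UTF-8 directly, or to set response.encoding = response.apparent_encoding before reading response.text.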

Scrapy empty output

白昼怎懂夜的黑 submitted on 2020-07-23 07:35:08
Question: I am trying to use Scrapy to extract data from a page, but I get an empty output. What is the problem? spider: class Ratemds(scrapy.Spider): name = 'ratemds' allowed_domains = ['ratemds.com'] custom_settings = { 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36 OPR/60.0.3255.50747 OPRGX/60.0.3255.50747', } def start_requests(self): yield scrapy.Request('https://www.ratemds.com/doctor-ratings/dr-aaron-morrow-md

Scrapy empty output

天涯浪子 submitted on 2020-07-23 07:34:34
Question: I am trying to use Scrapy to extract data from a page, but I get an empty output. What is the problem? spider: class Ratemds(scrapy.Spider): name = 'ratemds' allowed_domains = ['ratemds.com'] custom_settings = { 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36 OPR/60.0.3255.50747 OPRGX/60.0.3255.50747', } def start_requests(self): yield scrapy.Request('https://www.ratemds.com/doctor-ratings/dr-aaron-morrow-md

Scrapy empty output

半腔热情 submitted on 2020-07-23 07:33:13
Question: I am trying to use Scrapy to extract data from a page, but I get an empty output. What is the problem? spider: class Ratemds(scrapy.Spider): name = 'ratemds' allowed_domains = ['ratemds.com'] custom_settings = { 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36 OPR/60.0.3255.50747 OPRGX/60.0.3255.50747', } def start_requests(self): yield scrapy.Request('https://www.ratemds.com/doctor-ratings/dr-aaron-morrow-md

How do I change the browser used by the scrapy view command?

吃可爱长大的小学妹 submitted on 2020-07-22 05:49:06
Question: How do I change the browser used by the view(response) command in the scrapy shell? It defaults to Safari on my machine, but I'd like it to use Chrome because Chrome's development tools are better. Answer 1: As eLRuLL already mentioned, view(response) uses webbrowser to open the web page you downloaded. To change its behavior, you need to set a BROWSER environment variable. You could do this by adding the following line at the end of your ~/.bashrc file: export BROWSER=/usr/bin/firefox (if you
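The answer relies on Python's webbrowser module, which treats BROWSER as an os.pathsep-separated list of browsers to try before the platform defaults. The sketch below only shows how such a value splits into candidates. The function name browser_candidates and the Chrome path are assumptions, not part of the original answer.

```python
import os


def browser_candidates(environ=None):
    """Return the browser entries listed in BROWSER, in the order they would be tried."""
    environ = os.environ if environ is None else environ
    raw = environ.get("BROWSER", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


# Hypothetical value: prefer Chrome, fall back to Firefox.
example = {"BROWSER": os.pathsep.join(["/usr/bin/google-chrome", "/usr/bin/firefox"])}
candidates = browser_candidates(example)
```

Setting the variable in the environment before launching scrapy shell is the simplest way to make sure webbrowser sees it.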

Getting AttributeError error 'str' object has no attribute 'get'

不羁的心 submitted on 2020-07-19 17:48:38
Question: I am getting an error while working with a JSON response: Error: AttributeError: 'str' object has no attribute 'get' What could be the issue? I am also getting the following errors for the rest of the values: *** TypeError: 'builtin_function_or_method' object is not subscriptable 'Phone': value['_source']['primaryPhone'], KeyError: 'primaryPhone' *** # -*- coding: utf-8 -*- import scrapy import json class MainSpider(scrapy.Spider): name = 'main' start_urls = ['https://experts.expcloud.com/api4
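One common cause of 'str' object has no attribute 'get' is calling .get on the raw response text, or on dictionary keys, instead of on parsed JSON objects. Whether that is the cause here cannot be confirmed from the truncated code. The sketch below parses a made-up payload and uses .get with defaults so that a missing primaryPhone does not raise KeyError. The hits/_source layout, the sample values, and the function name extract_records are assumptions.

```python
import json

# Hypothetical payload; the real API response is not shown in the question.
SAMPLE_BODY = json.dumps({
    "hits": [
        {"_source": {"name": "A", "primaryPhone": "555-0100"}},
        {"_source": {"name": "B"}},  # no primaryPhone key
    ]
})


def extract_records(body_text):
    """Parse JSON text and read nested fields without raising on missing keys."""
    data = json.loads(body_text)  # str -> dict, so .get is available
    records = []
    for hit in data.get("hits", []):
        source = hit.get("_source", {})
        records.append({
            "Name": source.get("name"),
            "Phone": source.get("primaryPhone"),  # None when the key is absent
        })
    return records


records = extract_records(SAMPLE_BODY)
```

In a Scrapy callback, the same function could be fed response.text.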

Getting AttributeError error 'str' object has no attribute 'get'

非 Y 不嫁゛ submitted on 2020-07-19 17:46:19
Question: I am getting an error while working with a JSON response: Error: AttributeError: 'str' object has no attribute 'get' What could be the issue? I am also getting the following errors for the rest of the values: *** TypeError: 'builtin_function_or_method' object is not subscriptable 'Phone': value['_source']['primaryPhone'], KeyError: 'primaryPhone' *** # -*- coding: utf-8 -*- import scrapy import json class MainSpider(scrapy.Spider): name = 'main' start_urls = ['https://experts.expcloud.com/api4

scrapy: request url must be str or unicode, got Selector

老子叫甜甜 submitted on 2020-07-19 08:54:16
Question: I am writing a spider using Scrapy to scrape user details from Pinterest. I am trying to get the details of a user and his followers (and so on until the last node). Below is the spider code: from scrapy.spider import BaseSpider import scrapy from pinners.items import PinterestItem from scrapy.http import FormRequest from urlparse import urlparse class Sample(BaseSpider): name = 'sample' allowed_domains = ['pinterest.com'] start_urls = ['https://www.pinterest.com/banka/followers', ] def parse

Scrapy: output empty

笑着哭i submitted on 2020-07-10 10:28:16
Question: I am trying to use Scrapy to extract data from a page, but I get an empty output. What is the problem? spider: class Ratemds(scrapy.Spider): name = 'ratemds' allowed_domains = ['ratemds.com'] custom_settings = { 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36 OPR/60.0.3255.50747 OPRGX/60.0.3255.50747', } def start_requests(self): yield scrapy.Request('https://www.ratemds.com/doctor-ratings/dr-aaron-morrow-md