Data Storage for Web Crawlers (TXT, JSON, CSV)

Submitted by Anonymous (unverified) on 2019-12-02 23:26:52

TXT Text Storage

Saving the contents of Zhihu's Explore page to a TXT file:

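import requests
from pyquery import PyQuery as pq

url = "https://www.zhihu.com/explore"
myheader = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome"
}
html = requests.get(url, headers=myheader).text
doc = pq(html)
# The CSS selectors depend on the page's markup and may need updating if the site changes
items = doc('.explore-tab .feed-item').items()

# Open the file once, in append mode; "with" closes it automatically afterwards
with open("explore.txt", "a", encoding="utf-8") as file:
    for item in items:
        question = item.find('h2').text()
        author = item.find(".author-link-line").text()
        # The answer body is HTML, so parse it again to extract plain text
        answer = pq(item.find(".content").html()).text()
        file.write("\n".join([question, author, answer]))
        # A line of 50 "=" signs separates one entry from the next
        file.write("\n" + "=" * 50 + "\n")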
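Because the live page needs network access and its markup may have changed, here is an offline sketch of the same storage format, assuming hypothetical in-memory records in place of scraped data and a hypothetical file name `explore_demo.txt`. It writes each entry as lines separated by the `=` rule, then reads the file back by splitting on that rule:

```python
# Hypothetical records standing in for scraped (question, author, answer) data
records = [
    ("Question A", "Author A", "Answer text A"),
    ("Question B", "Author B", "Answer text B\nwith two lines"),
]
SEPARATOR = "=" * 50

# Write: question, author and answer on separate lines, entries split by the rule
with open("explore_demo.txt", "w", encoding="utf-8") as file:
    for question, author, answer in records:
        file.write("\n".join([question, author, answer]))
        file.write("\n" + SEPARATOR + "\n")

# Read back: split the whole file on the separator line
with open("explore_demo.txt", "r", encoding="utf-8") as file:
    chunks = file.read().split("\n" + SEPARATOR + "\n")
entries = [chunk for chunk in chunks if chunk]  # drop the empty tail after the last rule

parsed = []
for entry in entries:
    # maxsplit=2 keeps a multi-line answer together as the third field
    question, author, answer = entry.split("\n", 2)
    parsed.append((question, author, answer))

print(len(entries))        # 2
print(parsed == records)   # True
```

This plain-text layout is easy to read by eye, but it breaks if an answer itself contains the separator line, which is one reason structured formats such as JSON or CSV are often preferred.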

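File opening modes: the second argument to open() sets the mode. "r" opens a file for reading only (the default), "w" opens it for writing and truncates any existing content, and "a" appends to the end of the file, creating it if necessary (the code above uses "a", so rerunning it adds to the existing file instead of replacing it). Adding "b" selects binary mode (rb, wb, ab), and adding "+" allows both reading and writing (r+, w+, a+).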
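A small sketch of how "w" and "a" differ in practice, using a hypothetical file name `mode_demo.txt`: writing twice with "w" keeps only the second write, while "a" adds to what is already there.

```python
# Hypothetical file name used only for this demonstration
path = "mode_demo.txt"

with open(path, "w", encoding="utf-8") as f:   # "w": create or truncate, then write
    f.write("first\n")
with open(path, "w", encoding="utf-8") as f:   # "w" again: previous content is discarded
    f.write("second\n")
with open(path, "a", encoding="utf-8") as f:   # "a": keep content, write at the end
    f.write("third\n")

with open(path, "r", encoding="utf-8") as f:   # "r": read only (the default mode)
    content = f.read()

print(repr(content))  # 'second\nthird\n'
```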


JSON File Storage

Reading JSON
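Call the json library's loads() function to convert a JSON text string into Python objects (lists and dicts), and dumps() to convert such objects back into a JSON string.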
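The json module has two pairs of functions that are easy to mix up: loads()/dumps() work on strings, while load()/dump() read from and write to file objects. A minimal round-trip sketch, assuming a hypothetical file name `demo.json`:

```python
import json

record = {"name": "Bob", "gender": "male", "birthday": "1992-10-18"}

# dumps(): Python object -> JSON string; loads(): JSON string -> Python object
text = json.dumps(record)
restored = json.loads(text)
print(type(text), type(restored))  # <class 'str'> <class 'dict'>

# dump()/load() do the same conversion but go straight to / from a file object
with open("demo.json", "w", encoding="utf-8") as f:
    json.dump(record, f)
with open("demo.json", "r", encoding="utf-8") as f:
    from_file = json.load(f)

print(restored == record, from_file == record)  # True True
```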

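import json

# A JSON array held in a Python string (named json_str to avoid shadowing the built-in str)
json_str = '''
[{
    "name": "Bob",
    "gender": "male",
    "birthday": "1992-10-18"
}, {
    "name": "Selina",
    "gender": "female",
    "birthday": "1995-10-18"
}]
'''
print(type(json_str))
word = json.loads(json_str)  # parse the string into a list of dicts
print(word)
print(type(word))

Output:

<class 'str'>
[{'name': 'Bob', 'gender': 'male', 'birthday': '1992-10-18'}, {'name': 'Selina', 'gender': 'female', 'birthday': '1995-10-18'}]
<class 'list'>

There are two ways to get a value by key: index with square brackets and the key name, or call get() with the key name. get() also accepts an optional second argument, a default value returned when the key is missing (square brackets raise a KeyError instead).

word = json.loads(json_str)
print(word[0]["name"])
print(word[0].get("name"))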
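To see the difference on a missing key, here is a short sketch; the key "age" and the default 25 are hypothetical values chosen only for illustration:

```python
import json

word = json.loads('[{"name": "Bob", "gender": "male"}]')

print(word[0].get("age"))        # missing key -> None
print(word[0].get("age", 25))    # missing key -> the supplied default, 25
print(word[0].get("name", "?"))  # existing key -> its value, Bob

try:
    word[0]["age"]               # square brackets raise on a missing key
except KeyError as err:
    print("KeyError:", err)
```

get() is the safer choice for scraped data, where a field may be absent from some records.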

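Writing JSON
The dumps() method converts a Python object (here a list of dicts) into a JSON string, which can then be written to a file: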

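import json

data = [{
    "name": "Bob",
    "gender": "male",
    "birthday": "1992-10-18"
}, {
    "name": "Selina",
    "gender": "female",
    "birthday": "1995-10-18"
}]
with open("datas.txt", "w", encoding="utf-8") as file:
    # Serialize the list to a JSON string and write it out
    file.write(json.dumps(data))


dumps() also accepts an indent parameter, the number of spaces used for each indentation level, which makes the output easier to read.
To write Chinese (or any other non-ASCII) characters as-is instead of as \uXXXX escape sequences, also pass ensure_ascii=False, and open the file with an explicit encoding:
with open("datas.txt", "w", encoding="utf-8") as file:
    file.write(json.dumps(data, indent=2, ensure_ascii=False))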
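A quick sketch of what these two parameters change, using a hypothetical record with Chinese values: the default output escapes every non-ASCII character, while ensure_ascii=False keeps them readable, and indent=2 spreads the structure over several lines. Both strings decode to the same data.

```python
import json

# Hypothetical sample record containing Chinese text
data = [{"name": "王伟", "gender": "男", "birthday": "1992-10-18"}]

escaped = json.dumps(data)                                  # default ensure_ascii=True
readable = json.dumps(data, indent=2, ensure_ascii=False)   # keep characters, pretty-print

print(escaped)   # non-ASCII characters appear as \uXXXX escapes
print(readable)  # characters kept as-is, with 2-space indentation

# Both forms decode back to the same Python data
print(json.loads(escaped) == json.loads(readable) == data)  # True
```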

CSV File Storage

Writing CSV Files

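import csv

# newline="" stops the csv module's "\r\n" line endings from producing blank rows on Windows
with open("datas.csv", "w", encoding="utf-8", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["id", "name", "age"])
    writer.writerow(["001", "wuyou", "21"])
    writer.writerow(["002", "chenwei", "20"])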

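To use a different separator between columns, pass the delimiter argument to csv.writer(), e.g. csv.writer(csvfile, delimiter=" ").
You can also call writerows() to write several rows at once; in that case the argument must be a two-dimensional list, i.e. a list of rows.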
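A sketch combining both options, assuming a hypothetical output file `datas_space.csv`: the columns are separated by spaces and all rows go out in a single writerows() call. Reading the file back with the same delimiter recovers the original rows.

```python
import csv

rows = [["id", "name", "age"], ["001", "wuyou", "21"], ["002", "chenwei", "20"]]

# newline="" lets the csv module control line endings itself
with open("datas_space.csv", "w", encoding="utf-8", newline="") as csvfile:
    writer = csv.writer(csvfile, delimiter=" ")  # separate columns with a space
    writer.writerows(rows)                       # write every row of the 2-D list at once

# The reader must be told the same delimiter to split the columns correctly
with open("datas_space.csv", "r", encoding="utf-8", newline="") as csvfile:
    read_back = list(csv.reader(csvfile, delimiter=" "))

print(read_back == rows)  # True
```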
Reading CSV Files
With the csv library:

import csv

with open("datas.csv", "r", encoding="utf-8", newline="") as csvfile:
    reader = csv.reader(csvfile)
    # Each row comes back as a list of strings
    for row in reader:
        print(row)
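Since the first row holds the column names, a common next step is to pull it off with next() and pair it with each data row. A minimal sketch that first recreates the sample `datas.csv` so it runs on its own:

```python
import csv

# Recreate the sample file so this sketch is self-contained
with open("datas.csv", "w", encoding="utf-8", newline="") as csvfile:
    csv.writer(csvfile).writerows(
        [["id", "name", "age"], ["001", "wuyou", "21"], ["002", "chenwei", "20"]]
    )

with open("datas.csv", "r", encoding="utf-8", newline="") as csvfile:
    reader = csv.reader(csvfile)
    header = next(reader)                              # first row: column names
    people = [dict(zip(header, row)) for row in reader]

print(header)     # ['id', 'name', 'age']
print(people[0])  # {'id': '001', 'name': 'wuyou', 'age': '21'}
```

Note that every value, including age, is still a string; convert types explicitly if you need numbers.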

With pandas: call the pandas library's read_csv() function.
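The original gives no pandas code here, so this is a sketch assuming pandas is installed. It recreates the sample `datas.csv` and reads it twice: by default read_csv() infers column types, turning the id "001" into the integer 1, while dtype=str keeps every column as text and preserves leading zeros.

```python
import csv

import pandas as pd

# Recreate the sample file so this sketch is self-contained
with open("datas.csv", "w", encoding="utf-8", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["id", "name", "age"])
    writer.writerows([["001", "wuyou", "21"], ["002", "chenwei", "20"]])

# Default: pandas infers numeric columns, so "001" becomes 1
df_default = pd.read_csv("datas.csv", encoding="utf-8")
# dtype=str keeps every column as text
df = pd.read_csv("datas.csv", encoding="utf-8", dtype=str)

print(df_default)
print(df)
```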

Source: https://blog.csdn.net/qq_39905917/article/details/88847647