Example 1 -- Crawling a Page
import requests

url = "https://item.jd.com/2646846.html"
try:
    r = requests.get(url)
    r.raise_for_status()              # raise an exception for non-2xx status codes
    r.encoding = r.apparent_encoding  # use the encoding guessed from the page content
    print(r.text[:1000])              # print the first 1000 characters
except:
    print("Crawl failed")
A normal page crawl with no special handling.
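The same pattern (get, raise_for_status, apparent_encoding, try/except) repeats in all the examples below. As a minimal sketch it can be wrapped in a reusable helper; the function name getHTMLText and the 30-second timeout here are illustrative choices, not part of the original example:

import requests

def getHTMLText(url):
    # Fetch a page and return its text, or an error message on failure.
    # (Function name and the 30-second timeout are illustrative assumptions.)
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()              # raise on non-2xx status codes
        r.encoding = r.apparent_encoding  # guess the encoding from the content
        return r.text
    except requests.RequestException:
        return "Crawl failed"

print(getHTMLText("https://item.jd.com/2646846.html")[:1000])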
Example 2 -- Crawling a Page
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    kv = {'user-agent': 'Mozilla/5.0'}   # pretend to be a browser
    r = requests.get(url, headers=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])
except:
    print("Crawl failed")
The site filters requests by their User-Agent, so we send a browser-like User-Agent header to simulate a browser.
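To see why the header matters, inspect what was actually sent: by default requests announces itself as python-requests/<version> in the User-Agent field, which some sites reject. A quick check (a sketch; the exact default value depends on your installed requests version):

import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"

r1 = requests.get(url)
print(r1.request.headers['User-Agent'])   # default, e.g. python-requests/2.x.y

r2 = requests.get(url, headers={'user-agent': 'Mozilla/5.0'})
print(r2.request.headers['User-Agent'])   # Mozilla/5.0, looks like a browser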
Example 3 -- Querying Search Engines
# Baidu keyword interface: http://www.baidu.com/s?wd=keyword
# 360 keyword interface:   http://www.so.com/s?q=keyword
import requests

keyword = "python"
try:
    kv = {'wd': keyword}
    r = requests.get("http://www.baidu.com/s", params=kv)
    print(r.request.url)      # the full URL that was actually requested
    r.raise_for_status()
    print(len(r.text))
except:
    print("Crawl failed")
import requests

keyword = "python"
try:
    kv = {'q': keyword}       # 360 uses the parameter name 'q'
    r = requests.get("http://www.so.com/s", params=kv)
    print(r.request.url)
    r.raise_for_status()
    print(len(r.text))
except:
    print("Crawl failed")
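The params argument URL-encodes the query string automatically, which matters once the keyword is not plain ASCII. A small sketch (the Chinese keyword is only an illustration):

import requests

kv = {'wd': '爬虫'}           # a non-ASCII keyword
r = requests.get("http://www.baidu.com/s", params=kv)
print(r.request.url)          # the keyword appears percent-encoded, e.g. wd=%E7%88%AC%E8%99%AB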
Example 4 -- Crawling an Image
import requests
import os

url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
root = "F://pics//"
path = root + url.split('/')[-1]      # use the last segment of the URL as the file name
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        with open(path, 'wb') as f:   # the with block closes the file automatically
            f.write(r.content)        # r.content is the binary response body
        print("File saved")
    else:
        print("File already exists")
except:
    print("Crawl failed")
Fetches the image and saves it to disk.
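r.content loads the whole file into memory, which is fine for a single photo. For large files a streamed download is gentler; a sketch assuming the same image URL, with an arbitrary 8 KB chunk size:

import requests

url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
with requests.get(url, stream=True) as r:          # stream=True defers downloading the body
    r.raise_for_status()
    with open(url.split('/')[-1], 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)                         # write the file piece by piece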
Example 5 -- IP Address Geolocation Lookup:
Query interface: http://m.ip138.com/ip.asp?ip=ipaddress
url="http://www.ip138.com/iplookup.asp?ip="
try:
r=requests.get(url+'202.204.80.112'+'&action=2')
r.raise_for_status()
r.encoding=r.apparent_encoding
print(r.text[-500:])
except:
print("爬取失败")
The site has added anti-crawling measures, so the plain request above may now be blocked.
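Reusing the browser-like User-Agent from Example 2, plus a timeout, sometimes gets past such checks, though the site may still refuse automated requests; a hedged sketch:

import requests

url = "http://www.ip138.com/iplookup.asp?ip=202.204.80.112&action=2"
kv = {'user-agent': 'Mozilla/5.0'}        # simulate a browser, as in Example 2
try:
    r = requests.get(url, headers=kv, timeout=30)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])
except requests.RequestException:
    print("Crawl failed")                 # the site may still block automated requests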
Source: https://www.cnblogs.com/cy2268540857/p/12424091.html