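Example 1: Scraping a JD.com product page
1. Identify the URL
Find a phone on JD.com and copy its URL: https://item.jd.com/100003534811.html
2. Scrape the page
2.1) Scraping code
import requests

url = "https://item.jd.com/100003534811.html"
r = requests.get(url)
print(r.status_code)
print(r.text[:1000])  # print only the part we need (the first 1000 characters)
2.2) Response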
<!DOCTYPE HTML>
<html lang="zh-CN">
<head>
<!-- shouji -->
<meta http-equiv="Content-Type" content="text/html; charset=gbk" />
<title>【小米Redmi K20 Pro】小米 Redmi K20Pro 4800万超广角三摄 8GB+128GB 冰川蓝 骁龙855 全网通4G 双卡双待 全面屏拍照智能游戏手机【行情 报价 价格 评测】-京东</title>
<meta name="keywords" content="MIRedmi K20 Pro,小米Redmi K20 Pro,小米Redmi K20 Pro报价,MIRedmi K20 Pro报价"/>
<meta name="description" content="【小米Redmi K20 Pro】京东JD.COM提供小米Redmi K20 Pro正品行货,并包括MIRedmi K20 Pro网购指南,以及小米Redmi K20 Pro图片、Redmi K20 Pro参数、Redmi K20 Pro评论、Redmi K20 Pro心得、Redmi K20 Pro技巧等信息,网购小米Redmi K20 Pro上京东,放心又轻松" />
<meta name="format-detection" content="telephone=no">
<meta http-equiv="mobile-agent" content="format=xhtml; url=//item.m.jd.com/product/100003534811.html">
<meta http-equiv="mobile-agent" content="format=html5; url=//item.m.jd.com/product/100003534811.html">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<link rel="canonical" href="//item.jd.com/100003534811.html"/>
<link
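The response above declares its encoding in a meta tag (charset=gbk) and carries the product name in the title tag. As a rough sketch of how those two pieces could be pulled out of the returned text, the snippet below uses only the standard-library re module. The helper name parse_head and the shortened sample string are illustrative assumptions, not part of the original code, and a regular expression is only a quick approximation of real HTML parsing.

```python
import re


def parse_head(html):
    """Return (charset, title) found in an HTML string; either may be None.

    parse_head is a hypothetical helper for illustration only.
    """
    # Look for "charset=..." as it appears in a <meta> tag.
    charset = re.search(r'charset=["\']?([\w-]+)', html, re.IGNORECASE)
    # Grab whatever sits between <title> and </title>.
    title = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return (
        charset.group(1).lower() if charset else None,
        title.group(1).strip() if title else None,
    )


# Shortened stand-in for the real response text (assumed sample data).
sample = (
    '<head><meta http-equiv="Content-Type" '
    'content="text/html; charset=gbk" />'
    "<title>Redmi K20 Pro</title></head>"
)
print(parse_head(sample))
```

With the real page, the same call could be made on r.text[:1000], since both tags appear near the top of the document.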
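3. Full code
import requests

url = "https://item.jd.com/100003534811.html"
try:
    r = requests.get(url)
    # raise_for_status() raises requests.HTTPError for a 4xx/5xx status;
    # a 200 response passes through without an exception
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[:1000])
except requests.RequestException:
    print("Scraping failed")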
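The same request, status check, and encoding steps apply to any page, so they could be wrapped in a reusable function. The following is a minimal sketch under stated assumptions: the function name get_html_text and the timeout parameter are illustrative additions that do not appear in the original code.

```python
import requests


def get_html_text(url, timeout=10):
    """Fetch url and return the decoded page text, or None on failure.

    get_html_text and its timeout default are hypothetical choices
    made for this sketch.
    """
    try:
        r = requests.get(url, timeout=timeout)
        # Raises requests.HTTPError for 4xx/5xx responses.
        r.raise_for_status()
        # Use the encoding guessed from the body rather than the headers.
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        # Covers connection errors, timeouts, and HTTP error statuses.
        return None
```

Calling get_html_text("https://item.jd.com/100003534811.html") and slicing the result with [:1000] would reproduce the full code above. Returning None instead of printing a message leaves it to the caller to decide how to report a failure.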
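Example 2: Scraping an Amazon product page
Example 3: Submitting search keywords to Baidu/360
Example 4: Scraping and saving images from the web
Example 5: Automatically looking up the location of an IP address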