Python scraping - the novel Qing Yu Nian (庆余年) - a rough word-cloud analysis
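I'm tired of seeing people repost my articles without permission or credit, so I'm cross-posting this one to cnblogs (博客园) myself first.

On to the topic. The first step is getting the text. After a quick search I found a site called "落霞" (Luoxia). Without a second thought I hit F12 and looked through the page source: the structure is very simple.

from bs4 import BeautifulSoup
from requests import Session
from re import sub, DOTALL

sess = Session()  # one HTTP session reused for every chapter request
txt = []          # collected chapter texts, one string per chapter
url = 'https://www.luoxia.com/qing/48416.htm'  # first chapter page

def find(url):
    res = sess.get(url)
    soup = BeautifulSoup(res.content, 'html.parser')
    title = soup.find('title')          # chapter title is taken from the <title> tag
    div = soup.find('div', id='nr1')    # chapter body sits in <div id="nr1">
    ps = div.find_all('p')              # one <p> per paragraph
    page = title.text + '\n'
    print(page)                         # show which chapter was just fetched
    for p in ps:
        page += p.text + '\n'
    txt.append(page)
    try:
        a = soup.find('a', rel='next')  # link to the next chapter
        href =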