Scraping Bilibili danmaku to make a word cloud
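To scrape Bilibili's danmaku (the scrolling "bullet" comments), you request an XML file such as http://comment.bilibili.com/6315651.xml. Building that URL requires the video's cid (6315651 in this example). You can find it by hand: press F12 to open the browser's developer tools, press F5 to reload the page, and look for the cid in the network requests. Once you have it, splice it into the URL. Alternatively, write code that parses the page's response to extract the cid and then builds the URL.

Either requests or urllib works. I used requests to fetch the link and save the XML file.

Code: fetch the XML

```python
import requests


def get_data():
    res = requests.get('http://comment.bilibili.com/6315651.xml')
    res.encoding = 'utf8'  # decode the response body as UTF-8
    # 'w' overwrites on each run; append mode ('a') would stack a second XML document onto the file
    with open('gugongdanmu.xml', 'w', encoding='utf8') as f:
        f.write(res.text)
```

Parse the XML:

```python
import re


def analyze_xml():
    f1 = open("gugongdanmu.xml", "r", encoding='utf8')
    f2 = open("tanmu2.txt", "w", encoding='utf8')
    count = 0
    # Regex that strips the leftover XML tags from each line
    dr = re.compile(r'<[^>]+>', re.S)
    while 1:
        line = f1.readline()
        if
```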
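The post strips tags from the raw XML with the regex `<[^>]+>`. As an alternative sketch (not the author's code), the standard library's `xml.etree.ElementTree` can pull the comment text out directly. It assumes the usual danmaku layout, where each comment is a `<d p="...">` element whose text is the comment itself and every other element is metadata. The helper names `build_danmaku_url`, `extract_danmaku` and `save_danmaku` are hypothetical, and the sample XML in the demo is made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_danmaku_url(cid: int) -> str:
    """Hypothetical helper: splice a cid into the comment URL pattern from this post."""
    return "http://comment.bilibili.com/{}.xml".format(cid)


def extract_danmaku(xml_text: str) -> List[str]:
    """Hypothetical helper: return the text of every <d> element.

    Assumes each danmaku comment is stored as <d p="...">comment</d>;
    all other elements (metadata) are ignored.
    """
    root = ET.fromstring(xml_text)
    comments = []
    for d in root.iter("d"):
        text = (d.text or "").strip()
        if text:  # skip empty or whitespace-only comments
            comments.append(text)
    return comments


def save_danmaku(xml_path: str, txt_path: str) -> int:
    """Hypothetical helper: read the saved XML, write one comment per line, return the count."""
    with open(xml_path, "r", encoding="utf8") as f:
        comments = extract_danmaku(f.read())
    with open(txt_path, "w", encoding="utf8") as f:
        f.write("\n".join(comments))
    return len(comments)


if __name__ == "__main__":
    # Made-up sample in the assumed format, not real data from cid 6315651
    sample = ('<?xml version="1.0" encoding="UTF-8"?>'
              '<i><d p="1.0">故宫好美</d><d p="2.5">前方高能</d></i>')
    print(build_danmaku_url(6315651))  # http://comment.bilibili.com/6315651.xml
    print(extract_danmaku(sample))     # ['故宫好美', '前方高能']
```

Walking the `<d>` elements skips metadata text automatically. It also avoids a pitfall of the line-by-line approach: if the downloaded file happens to put all elements on a single line, stripping tags from each line would merge every comment into one string.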