BeautifulSoup Error in file saving .txt

不问归期 提交于 2019-12-24 16:42:59

问题


from bs4 import BeautifulSoup
import requests
import os


url = "http://nos.nl/artikel/2093082-steeds-meer-nekklachten-bij-kinderen-door-gebruik-tablets.html"
r  = requests.get(url)
soup = BeautifulSoup(r.content.decode('utf-8', 'ignore'))
data = soup.find_all("article", {"class": "article"})

with open("data1.txt", "wb") as file:
   content=‘utf-8’
for item in data:
    content+='''{}\n{}\n\n{}\n{}'''.format( item.contents[0].find_all("time", {"datetime": "2016-03-16T09:50:30+0100"})[0].text,
                                            item.contents[0].find_all("a", {"class": "link-grey"})[0].text,
                                            item.contents[0].find_all("img", {"class": "media-full"})[0],
                                            item.contents[1].find_all("div", {"class": "article_textwrap"})[0].text,
                                            )
with open("data1.txt".format(file_name), "wb") as file:
    file.write(content)

Recently solved a utf/Unicode problem but now it isn't saving it as a .txt file nor saving it at all. What do I need to do?


回答1:


If you want to write the data as UTF-8 to the file try codecs.open like:

from bs4 import BeautifulSoup
import requests
import os
import codecs


url = "http://nos.nl/artikel/2093082-steeds-meer-nekklachten-bij-kinderen-door-gebruik-tablets.html"
r  = requests.get(url)
soup = BeautifulSoup(r.content)
data = soup.find_all("article", {"class": "article"})

with codecs.open("data1.txt", "wb", "utf-8") as filen:
    for item in data:
        filen.write(item.contents[0].find_all("time", {"datetime": "2016-03-16T09:50:30+0100"})[0].get_text())
        filen.write('\n')
        filen.write(item.contents[0].find_all("a", {"class": "link-grey"})[0].get_text())
        filen.write('\n\n')
        filen.write(item.contents[0].find_all("img", {"class": "media-full"})[0].get_text())
        filen.write('\n')
        filen.write(item.contents[1].find_all("div", {"class": "article_textwrap"})[0].get_text())

I'm unsure about filen.write(item.contents[0].find_all("img", {"class": "media-full"})[0]) because that returned a Tag instance for me.



来源:https://stackoverflow.com/questions/36044653/beautifulsoup-error-in-file-saving-txt

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!