How to save a web page as a text file [Python]

Submitted by 南笙酒味 on 2019-12-22 06:46:46

Question


I would like to save a web page (all of its content) as a text file, as if you right-clicked the page and chose "Save Page As" -> "Save as text file" rather than saving it as an HTML file.

I have tried using the following code:

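import urllib2

url = ''
page = urllib2.urlopen(url)
# Raw response body: still HTML markup, with entities left undecoded
page_content = page.read()
# Write the content through a single, consistently named file handle
f = open('file_text.txt', 'w')
f.write(page_content)
f.close()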

My goal is to save the whole text without the HTML code. For example, I would like to read "é" instead of "&eacute;".


Answer 1:


Have a look at html2text, a library that converts HTML into plain (Markdown-formatted) text:

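import io
import urllib2
import html2text

url = ''
page = urllib2.urlopen(url)
# html2text works on unicode text, so decode the raw bytes first (UTF-8 is assumed here)
html_content = page.read().decode('utf-8')
# Convert the HTML to plain text; entities such as &eacute; are decoded along the way
rendered_content = html2text.html2text(html_content)
# io.open lets Python 2 write unicode text with an explicit encoding
with io.open('file_text.txt', 'w', encoding='utf-8') as f:
    f.write(rendered_content)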
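
The code above targets Python 2, since urllib2 does not exist in Python 3. As a rough sketch only, and not part of the original answer, the same idea might look like this in Python 3 using just the standard library instead of html2text. The names TextExtractor, html_to_text and save_page_as_text are hypothetical, the UTF-8 fallback is an assumption, and the output is much cruder than what html2text produces (one line per text fragment, no Markdown formatting):

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &eacute;
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def save_page_as_text(url, path, fallback_encoding="utf-8"):
    """Download url and write its text content to path (UTF-8 output)."""
    with urlopen(url) as response:
        # Use the charset from the HTTP headers when present, else the assumed fallback
        charset = response.headers.get_content_charset() or fallback_encoding
        html = response.read().decode(charset, errors="replace")
    with open(path, "w", encoding="utf-8") as out:
        out.write(html_to_text(html))
```

Calling save_page_as_text(url, 'file_text.txt') with a real address would then play the role of the script above. For pages with complex markup, html2text or a dedicated HTML parsing library is likely to give more readable results.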


Source: https://stackoverflow.com/questions/35166169/how-to-save-web-page-as-text-file-python
