Python get all the contents from a website to html file

Submitted by 感情迁移 on 2020-06-17 15:58:06

Question


Someone please help: I want to transfer all the contents from a URL to an HTML file. Can someone help me, please? I have to send a User-Agent too!


Answer 1:


Because I don't know which site you need to scrape, I'll suggest a few ways.

If the site has a JS frontend and you have to wait for it to load, I recommend the requests_html module, which has a method for rendering the content.

from requests_html import HTMLSession

url = "https://some-url.org"

with HTMLSession() as session:
    response = session.get(url)
    response.html.render() #  rendering JS code
    content = response.html.html #  full content

If the site doesn't use JS for its frontend, the requests module is a really good choice for you.

import requests

url = "https://some-url.org"

response = requests.get(url)
content = response.content #  html content in bytes()
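The requests snippet doesn't send the User-Agent the question asks for, and it doesn't save anything to disk. Here is a minimal sketch that does both, assuming requests is installed. The helper name `save_page`, the `DEFAULT_UA` constant (copied from the second answer's User-Agent string), and the 10-second timeout are illustrative choices, not part of the original answer.

```python
import requests

# Example User-Agent string (the same one used in the second answer); any string works.
DEFAULT_UA = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"


def save_page(url, path, user_agent=DEFAULT_UA, timeout=10):
    """Hypothetical helper: download url with a custom User-Agent and write the raw HTML to path."""
    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of silently saving an error page
    # response.content is bytes, so open the file in binary mode;
    # the page's original encoding is kept exactly as the server sent it.
    with open(path, "wb") as f:
        f.write(response.content)
    return len(response.content)  # number of bytes written
```

For example, `save_page("https://some-url.org", "page.html")` writes the page to `page.html`. Writing bytes rather than `response.text` avoids guessing the character encoding.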

Otherwise, you can use Selenium WebDriver, but it is rather slow because it drives a full browser.
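A minimal Selenium sketch, assuming Selenium 4 and a local Chrome install. The User-Agent is set through Chrome's `--user-agent` command-line switch. The function names `make_chrome_driver` and `save_page_source` are hypothetical, and Selenium is imported inside the first function only so the saving helper can be used (and tested) without a browser.

```python
def make_chrome_driver(user_agent):
    """Hypothetical helper: start headless Chrome that sends the given User-Agent."""
    # Imported lazily so save_page_source below does not require Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run without opening a window
    options.add_argument(f"--user-agent={user_agent}")
    return webdriver.Chrome(options=options)


def save_page_source(driver, url, path):
    """Hypothetical helper: load url in an already-started driver and save its HTML."""
    driver.get(url)
    # page_source is the browser's current DOM serialized as a str,
    # so it includes elements that scripts added before get() returned.
    with open(path, "w", encoding="utf-8") as f:
        f.write(driver.page_source)
```

Typical use: create the driver once with `make_chrome_driver(...)`, call `save_page_source(driver, url, "page.html")`, and call `driver.quit()` in a `finally` block so the browser process is not left running. Content that loads after `get()` returns may need an explicit wait first.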




Answer 2:


Welcome to SO. When you ask a question, you need to include the code you have tried; here's where you can learn how to ask a question properly. Regarding your question: when you say you want to transfer all the contents from a URL to an HTML file, I am assuming you just want to read the page source and save it to a file.

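import requests as r
from bs4 import BeautifulSoup

data = r.get("http://example.com", headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'})
soup = BeautifulSoup(data.text, "html.parser")  # name a parser explicitly to avoid bs4's warning

# write the serialized HTML; the with-block closes the file automatically
with open('myfile.html', 'w', encoding='utf-8') as file:
    file.write(str(soup))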

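If you write the soup object directly (for example, file.writelines(soup)), you get TypeError: write() argument must be str, not Tag, because iterating over the soup yields Tag objects. Convert the soup to a string first:

file.write(str(soup))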


Source: https://stackoverflow.com/questions/62394852/python-get-all-the-contents-from-a-website-to-html-file
