Python Web Crawlers and "getting" HTML source code

不知归路 2020-12-24 13:53

So my brother wanted me to write a web crawler in Python, which I'm teaching myself. I know C++, Java, and a bit of HTML. I'm using version 2.7 and reading the Python library documentation, but I

4 Answers
  •  悲&欢浪女
    2020-12-24 14:30

    The first thing to do is read the HTTP specification, which explains what you can expect to receive over the wire. The content of the response is the HTML the server generated for that request, not the server-side source. That source could be a JSP, a servlet, a CGI script, or just about anything else, and you have no access to it. You only get the HTML the server sent you. For a static HTML page, what you receive is the source file itself. For anything else, you see the generated HTML, not the code that produced it.
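
    To make that concrete, here is a minimal sketch, not a crawler. It assumes Python 3 (on 2.7 the equivalent modules are `urllib2` and `BaseHTTPServer`), and the names `GeneratedPage`, `fetch_html`, and `demo` are made up for illustration. A tiny local server stands in for a JSP or CGI script, and the client fetches the page the way a crawler would:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class GeneratedPage(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a JSP/servlet/CGI script."""

    def do_GET(self):
        # Server-side logic: this code is never sent to the client.
        items = "".join("<li>item %d</li>" % i for i in range(3))
        body = ("<html><body><ul>%s</ul></body></html>" % items).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # suppress per-request logging for the demo


def fetch_html(url):
    """GET a URL and return (status, content type, decoded body)."""
    # An empty ProxyHandler ignores any proxy settings in the environment,
    # so the request to the local demo server is made directly.
    opener = build_opener(ProxyHandler({}))
    with opener.open(url) as response:
        status = response.status
        content_type = response.headers.get("Content-Type")
        html = response.read().decode("utf-8")
    return status, content_type, html


def demo():
    """Start the local server, fetch its page once, and shut it down."""
    server = HTTPServer(("127.0.0.1", 0), GeneratedPage)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        return fetch_html("http://127.0.0.1:%d/" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, content_type, html = demo()
    print(status, content_type)
    print(html)
```

    The body that comes back contains the `<li>` elements the handler produced, but nothing of the loop that built them. Against a real site you would call `fetch_html` with that site's URL instead of the local one.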

    When you say "modify the page and return the modified page", what do you mean?
