A Python socket client that outputs the source code of a website: why isn't this working?

Submitted by 陌路散爱 on 2020-01-13 06:47:28

Question


The following code doesn't output anything. Why?

#!/usr/bin/python           
import socket             

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)                 

s.connect(("www.python.org" , 80))
print s.recv(4096)
s.close()

What do I have to change to output the source code of the Python website, as you would see it with 'view source' in a browser?


Answer 1:


HTTP is a request/response protocol. You're not sending a request, so you're not getting a response.

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

s.connect(("www.python.org", 80))
s.sendall("GET / HTTP/1.0\r\n\r\n")  # you're missing this line
print s.recv(4096)
s.close()

Of course, this sends only the barest HTTP/1.0 request, with no handling of HTTP errors, redirects, and so on. I would not recommend it for real use, only as an exercise to familiarize yourself with socket programming and HTTP.
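If you do want to take the exercise one step further, here is a rough sketch in Python 3 (the snippets in this thread are Python 2). It sends a complete HTTP/1.0 request and keeps calling recv until the server closes the connection, because a single recv(4096) returns at most 4096 bytes and often less than the whole page. The helper names build_request, recv_all and fetch_raw are made up for illustration, and the Host header is my own addition, not part of the code above.

```python
import socket


def build_request(host, path="/"):
    """Return a minimal HTTP/1.0 GET request as bytes (hypothetical helper)."""
    lines = [
        "GET {} HTTP/1.0".format(path),
        "Host: {}".format(host),  # assumption: not in the original snippet
        "",  # the two empty items produce the blank line
        "",  # that ends the header section
    ]
    return "\r\n".join(lines).encode("ascii")


def recv_all(sock, bufsize=4096):
    """Read from sock until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # empty bytes means the peer closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)


def fetch_raw(host, port=80, path="/"):
    """Send one GET request and return the raw response (headers and body)."""
    sock = socket.create_connection((host, port), timeout=10)
    try:
        sock.sendall(build_request(host, path))
        return recv_all(sock)
    finally:
        sock.close()
```

Calling fetch_raw("www.python.org") should give you the status line, headers and body as one bytes object. It still does nothing about errors or redirects, so the caveat above applies.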

For HTTP, Python provides a few built-in modules: httplib (a bit lower level), and urllib and urllib2 (the high-level ones). In Python 3 these became http.client and urllib.request.




Answer 2:


You'll get a redirect (302) unless you use the full URL in your request.

Try this instead:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("www.python.org", 80))
s.sendall("GET http://www.python.org/ HTTP/1.0\r\n\r\n")  # header lines end with CRLF
print s.recv(4096)
s.close()
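To check whether the server answered with a redirect, you can look at the status line and the Location header of the raw response. A minimal sketch in Python 3, assuming the response arrives as bytes with CRLF line endings; parse_response_head and redirect_target are hypothetical helper names, not library functions.

```python
def parse_response_head(raw):
    """Return (status_code, headers) from a raw HTTP response given as bytes."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like "HTTP/1.0 302 Found"
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


def redirect_target(raw):
    """Return the Location header of a 3xx response, or None otherwise."""
    status_code, headers = parse_response_head(raw)
    if 300 <= status_code < 400:
        return headers.get("location")
    return None
```

For a response that starts with "HTTP/1.0 302 Found" and carries a Location header, redirect_target returns the URL you would have to request next. For a 200 response it returns None.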

Of course, if you just want the content of a URL, this is far simpler. :)

import urllib2
print urllib2.urlopen('http://www.python.org').read()
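On Python 3 there is no urllib2; the same job is done by urllib.request. A rough equivalent, assuming you want the page as text; fetch_text is a hypothetical helper name and the UTF-8 fallback is my own assumption.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Return the body of url decoded to text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header if the server sent one;
        # falling back to UTF-8 is an assumption, not something HTTP guarantees.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

print(fetch_text("http://www.python.org")) would then print the page source. Unlike the raw socket version, urlopen follows redirects for you.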


Source: https://stackoverflow.com/questions/10600235/a-python-socket-client-that-outputs-the-source-code-of-a-website-why-isnt-this
