How can I extract HTML code with Scapy?

Submitted by 三世轮回 on 2019-12-11 05:42:43

Question


I recently began to use the scapy library for Python 2.x. I found there to be minimal documentation on the sniff() function. I began to play around with it and found that I can view TCP packets at a very low level. So far I have only found informational data. For example:

Here is what I put in the scapy terminal:

A = sniff(filter="tcp and host 216.58.193.78", count=2)

This is a request to google.com asking for the homepage:

<Ether  dst=e8:de:27:55:17:f3 src=00:24:1d:20:a6:1b type=0x800 |<IP  version=4L ihl=5L tos=0x0 len=60 id=46627 flags=DF frag=0L ttl=64 proto=tcp chksum=0x2a65 src=192.168.0.2 dst=216.58.193.78 options=[] |<TCP  sport=54036 dport=www seq=2948286264 ack=0 dataofs=10L reserved=0L flags=S window=29200 chksum=0x5a62 urgptr=0 options=[('MSS', 1460), ('SAckOK', ''), ('Timestamp', (389403, 0)), ('NOP', None), ('WScale', 7)] |>>>

Here is the response:

<Ether  dst=00:24:1d:20:a6:1b src=e8:de:27:55:17:f3 type=0x800 |<IP  version=4L ihl=5L tos=0x0 len=60 id=42380 flags= frag=0L ttl=55 proto=tcp chksum=0x83fc src=216.58.193.78 dst=192.168.0.2 options=[] |<TCP  sport=www dport=54036 seq=3087468609 ack=2948286265 dataofs=10L reserved=0L flags=SA window=42540 chksum=0xecaf urgptr=0 options=[('MSS', 1430), ('SAckOK', ''), ('Timestamp', (2823173876, 389403)), ('NOP', None), ('WScale', 7)] |>>>

Using this function, is there a way that I can extract HTML code from the response?

Also, what do those packets look like?

And finally, why are both of these packets nearly identical?


Answer 1:


The segments in your example are "nearly identical" because they are the TCP SYN and SYN-ACK segments, which are part of the TCP three-way handshake. The HTTP request and response come after that, during the connection (usually in the ESTABLISHED state, except when the TCP Fast Open option is used), so you need to look at the segments after the handshake to get the data you are interested in.

         SYN
C ---------------> S
       SYN-ACK
C <--------------- S
         ACK
C ---------------> S
    HTTP request
C ---------------> S
         ACK
C <--------------- S
    HTTP response
C <--------------- S  <= Here is the server's answer
         ACK
C ---------------> S
...
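To make the diagram concrete, here is a minimal sketch of how handshake segments could be told apart from data-carrying ones by looking at the TCP flag bits. The helper name describe_segment is hypothetical (it is not part of Scapy), and the classification is deliberately simplified.

```python
# TCP flag bits as defined for the TCP header.
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


def describe_segment(flags, payload_len):
    """Roughly classify a TCP segment (hypothetical helper, not a Scapy API).

    flags       -- TCP flags as an integer bit field
    payload_len -- number of bytes carried above the TCP header
    """
    flags = int(flags)
    if flags & SYN:
        # Handshake segments: SYN from the client, SYN-ACK from the server.
        return "SYN-ACK" if flags & ACK else "SYN"
    if payload_len > 0:
        # Segments with a payload carry the HTTP request or response.
        return "data"
    if flags & RST:
        return "RST"
    if flags & FIN:
        return "FIN"
    return "ACK"


# The two packets from the question have flags=S and flags=SA:
print(describe_segment(SYN, 0))        # SYN
print(describe_segment(SYN | ACK, 0))  # SYN-ACK
```

With a sniffed Scapy packet, something like describe_segment(int(pkt[TCP].flags), len(pkt[TCP].payload)) should give the same kind of answer, assuming the packet has a TCP layer.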

You can use Scapy's Raw layer to extract the data carried above TCP in your case (e.g. pkt[Raw]; the payload bytes are in pkt[Raw].load).
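As a rough sketch of that idea, the code below joins the Raw payloads sent by the server and splits the result into HTTP headers and body. The function names server_payload and split_http_message are hypothetical, not Scapy APIs. It assumes plain HTTP on port 80 (HTTPS traffic is encrypted and will not yield HTML this way), a single connection in the capture, no sequence-number wraparound, and an uncompressed body.

```python
def split_http_message(data):
    """Split raw HTTP bytes into (header_text, body_bytes)."""
    # Headers and body are separated by the first blank line (CRLF CRLF).
    head, _sep, body = data.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1"), body


def server_payload(packets, server_port=80):
    """Join the Raw payloads of segments sent from server_port, ordered by seq.

    Simplified: assumes one connection and no sequence-number wraparound.
    """
    # Imported lazily so the pure-Python helper above works without Scapy.
    from scapy.all import Raw, TCP

    segments = {}
    for pkt in packets:
        if pkt.haslayer(TCP) and pkt.haslayer(Raw) and pkt[TCP].sport == server_port:
            # Keying by sequence number drops retransmitted duplicates.
            segments[pkt[TCP].seq] = bytes(pkt[Raw].load)
    return b"".join(segments[seq] for seq in sorted(segments))


# Usage sketch (sniffing needs capture privileges, so it is left commented out):
# from scapy.all import sniff
# packets = sniff(filter="tcp port 80 and host 216.58.193.78", count=20)
# headers, body = split_http_message(server_payload(packets))
# print(headers)
# print(body)
```

The count of 20 is an arbitrary choice; the count=2 used in the question only captures the SYN and SYN-ACK, which carry no payload.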




Answer 2:


Have you tried using scapy-http? It's a great Scapy extension that helps with this exact issue.



Source: https://stackoverflow.com/questions/38385646/how-can-i-extract-html-code-with-scapy
