How to use puppeteer to dump WebSocket data

本小妞迷上赌 提交于 2019-11-27 13:07:59

问题


I want to get websocket data in this page https://upbit.com/exchange?code=CRIX.UPBIT.KRW-BTC, its websocket URL is dynamic and only valid during the first connection, the second time you connect to it it will not send data anymore.

So I wonder that maybe headless chrome can help me to monitor the websocket data.

Any ideas? Thanks!


回答1:


You actually don't need to do anything complex on this. The URL though seems dynamic, but works fine through code as well. The reason it doesn't work is that you need to understand what is happening in the background.

First let's look at the Network Tab.

The cookies and the Origin may be of importance to connecting. So we note these down.

Now let us look at the data exchanges on the socket

If you look at the frames the initial frame receives o as the data, which may indicate a opening connection. And then the website sends some data to the socket, which may be related to what we want to query. When the connection gets halted for some time, the socket receives h as the data. This may indicated a hold or something (as shown in second image)

To get the exact data we put a breakpoint in the code

And then print the value in the console

Now we have enough information to hit to the coding part. I found below to be a good websocket library for this

https://github.com/websockets/ws

So we do a

yarn add ws || npm install ws --save

Now we write our code

const WebSocket = require("ws")
const ws = new WebSocket("wss://example.com/sockjs/299/enavklnl/websocket",null,{
    headers: {
        "Cookie":"<cookie data noted earlier>",
        "User-Agent": "<Your browser agent>"
    },
    origin: "https://example.com",
})
const opening_message = '["[{\\"ticket\\":\\"ram macbook\\"},{\\"type\\":\\"recentCrix\\",\\"codes\\":[\\"CRIX.UPBIT.KRW-BTC\\",\\"CRIX.BITFINEX.USD-BTC\\",\\"CRIX.BITFLYER.JPY-BTC\\",\\"CRIX.OKCOIN.CNY-BTC\\",\\"CRIX.KRAKEN.EUR-BTC\\",\\"CRIX.UPBIT.KRW-DASH\\",\\"CRIX.UPBIT.KRW-ETH\\",\\"CRIX.UPBIT.KRW-NEO\\",\\"CRIX.UPBIT.KRW-BCC\\",\\"CRIX.UPBIT.KRW-MTL\\",\\"CRIX.UPBIT.KRW-LTC\\",\\"CRIX.UPBIT.KRW-STRAT\\",\\"CRIX.UPBIT.KRW-XRP\\",\\"CRIX.UPBIT.KRW-ETC\\",\\"CRIX.UPBIT.KRW-OMG\\",\\"CRIX.UPBIT.KRW-SNT\\",\\"CRIX.UPBIT.KRW-WAVES\\",\\"CRIX.UPBIT.KRW-PIVX\\",\\"CRIX.UPBIT.KRW-XEM\\",\\"CRIX.UPBIT.KRW-ZEC\\",\\"CRIX.UPBIT.KRW-XMR\\",\\"CRIX.UPBIT.KRW-QTUM\\",\\"CRIX.UPBIT.KRW-LSK\\",\\"CRIX.UPBIT.KRW-STEEM\\",\\"CRIX.UPBIT.KRW-XLM\\",\\"CRIX.UPBIT.KRW-ARDR\\",\\"CRIX.UPBIT.KRW-KMD\\",\\"CRIX.UPBIT.KRW-ARK\\",\\"CRIX.UPBIT.KRW-STORJ\\",\\"CRIX.UPBIT.KRW-GRS\\",\\"CRIX.UPBIT.KRW-VTC\\",\\"CRIX.UPBIT.KRW-REP\\",\\"CRIX.UPBIT.KRW-EMC2\\",\\"CRIX.UPBIT.KRW-ADA\\",\\"CRIX.UPBIT.KRW-SBD\\",\\"CRIX.UPBIT.KRW-TIX\\",\\"CRIX.UPBIT.KRW-POWR\\",\\"CRIX.UPBIT.KRW-MER\\",\\"CRIX.UPBIT.KRW-BTG\\",\\"CRIX.COINMARKETCAP.KRW-USDT\\"]},{\\"type\\":\\"crixTrade\\",\\"codes\\":[\\"CRIX.UPBIT.KRW-BTC\\"]},{\\"type\\":\\"crixOrderbook\\",\\"codes\\":[\\"CRIX.UPBIT.KRW-BTC\\"]}]"]'
ws.on('open', function open() {
    console.log("opened");
});

ws.on('message', function incoming(data) {
    if (data == "o" || data == "h") {
        console.log("sending opening message")
        ws.send(opening_message)
    }
    else {
        console.log("Received", data)

    }
});

And running the code we get

Now if I replace

const ws = new WebSocket("wss://example.com/sockjs/299/enavklnl/websocket",null,{
    headers: {
        "Cookie":"<cookie data noted earlier>",
        "User-Agent": "<Your browser agent>"
    },
    origin: "https://example.com",
})

to

const ws = new WebSocket("wss://example.com/sockjs/299/enavklnl/websocket")

Which means cookies and origin was never needed as such. But I would still recommend you to use them




回答2:


const client = page._client

client.on('Network.webSocketCreated', ({requestId, url}) => {
  console.log('Network.webSocketCreated', requestId, url)
})

client.on('Network.webSocketClosed', ({requestId, timestamp}) => {
  console.log('Network.webSocketClosed', requestId, timestamp)
})

client.on('Network.webSocketFrameSent', ({requestId, timestamp, response}) => {
  console.log('Network.webSocketFrameSent', requestId, timestamp, response.payloadData)
})

client.on('Network.webSocketFrameReceived', ({requestId, timestamp, response}) => {
  console.log('Network.webSocketFrameReceived', requestId, timestamp, response.payloadData)
})

It is by using DevTools protocol directly - https://chromedevtools.github.io/devtools-protocol/tot/Network#event-webSocketClosed




回答3:


I don't think puppeteer has support for this yet, but the lower-level protocol does here: https://chromedevtools.github.io/devtools-protocol/tot/Network/#event-webSocketFrameSent and https://chromedevtools.github.io/devtools-protocol/tot/Network#type-WebSocketResponse. This means that you could implement this yourself in a library if you wanted too.



来源:https://stackoverflow.com/questions/48375700/how-to-use-puppeteer-to-dump-websocket-data

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!