How can I capture all network requests and full response data when loading a page in Chrome?

时光说笑 2020-12-05 04:50

Using Puppeteer, I'd like to load a URL in Chrome and capture the following information:

  • request URL
  • request headers
  • request post data
5 answers
  •  天涯浪人
    2020-12-05 05:20

    Puppeteer-only solution

    This can be done with Puppeteer alone. The problem you describe, that the data behind response.buffer() is cleared on navigation, can be circumvented by processing the requests one after another.

    How it works

    The code below uses page.setRequestInterception to intercept all requests. While one request is being processed (or waited for), any new request is put into a queue instead of being continued. response.buffer() can then be called safely, because no parallel request can wipe the buffer asynchronously. As soon as the current request/response has been handled, the next request in the queue is continued.

    Code

    const puppeteer = require('puppeteer');
    
    (async () => {
        const browser = await puppeteer.launch();
        const [page] = await browser.pages();
    
        const results = []; // collects all results
    
        let paused = false;
        let pausedRequests = [];
    
        const nextRequest = () => { // continue the next request or "unpause"
            if (pausedRequests.length === 0) {
                paused = false;
            } else {
                // continue first request in "queue"
                (pausedRequests.shift())(); // calls the request.continue function
            }
        };
    
        await page.setRequestInterception(true);
        page.on('request', request => {
            if (paused) {
                pausedRequests.push(() => request.continue());
            } else {
                paused = true; // pause, as we are processing a request now
                request.continue();
            }
        });
    
        page.on('requestfinished', async (request) => {
            const response = request.response(); // synchronous; no await needed
    
            const responseHeaders = response.headers();
            let responseBody;
            if (request.redirectChain().length === 0) {
                // the body can only be accessed for non-redirect responses
                responseBody = await response.buffer();
            }
    
            const information = {
                url: request.url(),
                requestHeaders: request.headers(),
                requestPostData: request.postData(),
                responseHeaders: responseHeaders,
                responseSize: responseHeaders['content-length'],
                responseBody,
            };
            results.push(information);
    
            nextRequest(); // continue with next request
        });
        page.on('requestfailed', (request) => {
            // handle failed request
            nextRequest();
        });
    
        await page.goto('...', { waitUntil: 'networkidle0' });
        console.log(results);
    
        await browser.close();
    })();
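
The queueing idea in the answer above can also be tried in isolation, without Puppeteer. The following is only a minimal sketch: `createSerialGate`, `enter`, `leave` and `pending` are hypothetical names invented for illustration, not part of any library. `enter` plays the role of the `request` handler (continue now, or queue), and `leave` plays the role of `nextRequest`.

```javascript
// Hypothetical helper: admits one task at a time, mirroring the
// `paused` / `pausedRequests` bookkeeping used in the answer.
function createSerialGate() {
    let busy = false;   // true while one task is in flight
    const waiting = []; // start functions that arrived while busy

    return {
        // Run `start` immediately if idle, otherwise queue it.
        enter(start) {
            if (busy) {
                waiting.push(start);
            } else {
                busy = true;
                start();
            }
        },
        // Call when the in-flight task is done:
        // start the next queued task, or mark the gate as idle.
        leave() {
            const next = waiting.shift();
            if (next) {
                next();
            } else {
                busy = false;
            }
        },
        // Number of tasks still waiting in the queue.
        pending() {
            return waiting.length;
        },
    };
}

// Usage: three "requests" arrive at once, but only one runs at a time.
const gate = createSerialGate();
const started = [];
for (const name of ['a', 'b', 'c']) {
    gate.enter(() => started.push(name));
}
// At this point only 'a' has started; 'b' and 'c' are queued.
gate.leave(); // 'a' finished, 'b' starts
gate.leave(); // 'b' finished, 'c' starts
gate.leave(); // 'c' finished, the gate is idle again
```

In the Puppeteer version, `start` would be `() => request.continue()` and `leave()` would be called from the `requestfinished` and `requestfailed` handlers. The trade-off is that requests no longer run in parallel, so page loads will likely be slower.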
    
