How to download file with puppeteer using headless: true?

我怕爱的太早我们不能终老 提交于 2019-11-30 05:06:46

This page downloads a csv by creating a comma delimited string and forcing the browser to download it by setting the data type like so

let uri = "data:text/csv;charset=utf-8," + encodeURIComponent(content);
window.open(uri, "Some CSV");

This on chrome opens a new tab.

You can tap into this event and physically download the contents into a file. Not sure if this is the best way but works well.

const browser = await puppeteer.launch({
  headless: true
});
browser.on('targetcreated', async (target) => {
    let s = target.url();
    //the test opens an about:blank to start - ignore this
    if (s == 'about:blank') {
        return;
    }
    //unencode the characters after removing the content type
    s = s.replace("data:text/csv;charset=utf-8,", "");
    //clean up string by unencoding the %xx
    ...
    fs.writeFile("/tmp/download.csv", s, function(err) {
        if(err) {
            console.log(err);
            return;
        }
        console.log("The file was saved!");
    }); 
});

const page = await browser.newPage();
.. open link ...
.. click on download link ..

The problem is that the browser closes before download finished.

You can get the filesize and the name of the file from the response, and then use a watch script to check filesize from downloaded file, in order to close the browser.

This is an example:

const filename = <set this with some regex in response>;
const dir = <watch folder or file>;

// Download and wait for download
    await Promise.all([
        page.click('#DownloadFile'),
       // Event on all responses
        page.on('response', response => {
            // If response has a file on it
            if (response._headers['content-disposition'] === `attachment;filename=${filename}`) {
               // Get the size
                console.log('Size del header: ', response._headers['content-length']);
                // Watch event on download folder or file
                 fs.watchFile(dir, function (curr, prev) {
                   // If current size eq to size from response then close
                    if (parseInt(curr.size) === parseInt(response._headers['content-length'])) {
                        browser.close();
                        this.close();
                    }
                });
            }
        })
    ]);

Even that the way of searching in response can be improved though I hope you'll find this usefull.

I spent hours poring through this thread and Stack Overflow yesterday, trying to figure out how to get Puppeteer to download a csv file by clicking a download link in headless mode in an authenticated session. The accepted answer here didn't work in my case because the download does not trigger targetcreated, and the next answer, for whatever reason, did not retain the authenticated session. This article saved the day. In short, fetch. Hopefully this helps someone else out.

const res = await this.page.evaluate(() =>
{
    return fetch('https://example.com/path/to/file.csv', {
        method: 'GET',
        credentials: 'include'
    }).then(r => r.text());
});

I needed to download a file from behind a login, which was being handled by Puppeteer. targetcreated was not being triggered. In the end I downloaded with request, after copying the cookies over from the Puppeteer instance.

In this case, I'm streaming the file through, but you could just as easily save it.

    res.writeHead(200, {
        "Content-Type": 'application/octet-stream',
        "Content-Disposition": `attachment; filename=secretfile.jpg`
    });
    let cookies = await page.cookies();
    let jar = request.jar();
    for (let cookie of cookies) {
        jar.setCookie(`${cookie.name}=${cookie.value}`, "http://secretsite.com");
    }
    try {
        var response = await request({ url: "http://secretsite.com/secretfile.jpg", jar }).pipe(res);
    } catch(err) {
        console.trace(err);
        return res.send({ status: "error", message: err });
    }

I have another solution to this problem, since none of the answers here worked for me.

I needed to log into a website, and download some .csv reports. Headed was fine, headless failed no matter what I tried. Looking at the Network errors, the download is aborted, but I couldn't (quickly) determine why.

So, I intercepted the requests and used node-fetch to make the request outside of puppeteer. This required copying the fetch options, body, headers and adding in the access cookie.

Good luck.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!