Puppeteer: Grabbing entire html from page that uses lazy load

Submitted by 隐身守侯 on 2019-12-08 03:35:30

I am not an expert at this, but after searching for a long time I found one solution that gives good results for my requirement. Here is the piece of code I used to handle lazy-load scenarios.

// Note: `wait` is not a Puppeteer built-in; define it as a small helper.
const wait = ms => new Promise(resolve => setTimeout(resolve, ms));

// Measure the full height of the page body.
const bodyHandle = await page.$('body');
const { height } = await bodyHandle.boundingBox();
await bodyHandle.dispose();

console.log('Handling viewport...');
const viewportHeight = page.viewport().height;
let viewportIncr = 0;

// Scroll one viewport at a time until the bottom of the body is reached,
// pausing briefly after each step so lazy-loaded content can render.
while (viewportIncr + viewportHeight < height) {
  await page.evaluate(_viewportHeight => {
    window.scrollBy(0, _viewportHeight);
  }, viewportHeight);
  await wait(30);
  viewportIncr = viewportIncr + viewportHeight;
}

console.log('Handling scroll operations');
// Return to the top before taking the full-page screenshot.
await page.evaluate(_ => {
  window.scrollTo(0, 0);
});
await wait(100);
await page.screenshot({ path: 'GoogleHome.jpg', fullPage: true });

With this I am able to take even very long full-page screenshots. Hope this helps you.
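For completeness, here is a minimal harness showing how the snippet above can be run end to end. This is only a sketch: the target URL is a placeholder and the viewport size is an assumed example value.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // The viewport size here is an arbitrary example value.
  await page.setViewport({ width: 1280, height: 800 });
  // 'https://example.com' is a placeholder; use your lazy-loading page.
  await page.goto('https://example.com', { waitUntil: 'networkidle2' });

  // ... run the scroll-and-screenshot logic from above here ...

  await browser.close();
})();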

The problem is that the linked page uses the library react-virtualized. This library renders only the visible part of the website, so you cannot get the whole table at once. Scrolling to the bottom of the table only puts the bottom part of the table into the DOM.
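If you still need to assemble the full table from the DOM, one workaround is to scroll in steps and accumulate whatever rows react-virtualized has rendered at each step. A rough sketch, where '.table-row' is a hypothetical selector standing in for whatever the page actually uses:

// Sketch: collect rows of a virtualized table while scrolling.
// '.table-row' is a hypothetical selector; deduplication by innerText
// assumes row texts are unique.
const rows = new Map();
let previousSize = -1;
while (rows.size !== previousSize) {
  previousSize = rows.size;
  const visible = await page.$$eval('.table-row', els =>
    els.map(el => el.innerText)
  );
  for (const text of visible) rows.set(text, true);
  // Scroll one viewport further and give the library time to render.
  await page.evaluate(() => window.scrollBy(0, window.innerHeight));
  await new Promise(resolve => setTimeout(resolve, 100));
}
console.log([...rows.keys()]);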

To check where the page loads its content from, inspect the Network tab of the browser's DevTools. You will notice that the content of the page is loaded from this URL, which seems to provide a perfect representation of the DOM in JSON format. So there is really no need to scrape that data from the page; you can just use the URL directly.
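For example, once you have found the request in the Network tab, you can fetch that endpoint directly and work with the JSON. The URL below is only a placeholder for whatever endpoint you find:

// Sketch: fetch the JSON endpoint directly instead of scraping the DOM.
// DATA_URL is a placeholder; substitute the URL from the Network tab.
const DATA_URL = 'https://example.com/data.json';

(async () => {
  const response = await fetch(DATA_URL); // global fetch (Node 18+)
  const data = await response.json();
  console.log(data);
})();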
