How to make the Apify Crawler scroll the full page when the web page has infinite scrolling?

妖精的绣舞 posted on 2021-02-19 16:05:14

Question


I'm facing a problem: I am unable to get all the product data because the website uses lazy loading on the product catalog page, meaning the page has to be scrolled until everything is loaded.

I am getting only the first page of product data.


Answer 1:


First, keep in mind that there are countless ways to implement infinite scroll. Sometimes you have to click buttons along the way or trigger some other kind of transition. I will cover only the simplest use case here: scrolling down at a fixed interval and stopping when no new products are loaded.

  1. If you build your own actor with the Apify SDK, you can use the infiniteScroll helper utility function. If it doesn't cover your use case, please give us feedback on GitHub.

  2. If you are using the generic scrapers (Web Scraper or Puppeteer Scraper), infinite scroll is not currently built in (though it may be by the time you read this). It is not that complicated to implement yourself, however. Here is a simple solution for Web Scraper's pageFunction.

async function pageFunction(context) {
    // A few utilities taken from the context object
    const { request, log, jQuery } = context;
    const $ = jQuery;

    // Here we define the infinite scroll function, it has to be defined inside pageFunction
    const infiniteScroll = async (maxTime) => {
        const startedAt = Date.now();
        let itemCount = $('.my-class').length; // Update the selector
        while (true) {
            log.info(`INFINITE SCROLL --- ${itemCount} items loaded --- ${request.url}`);
            // timeout to prevent infinite loop
            if (Date.now() - startedAt > maxTime) {
                return;
            }
            scrollBy(0, 9999);
            await context.waitFor(5000); // This can be any number that works for your website
            const currentItemCount = $('.my-class').length; // Update the selector

            // We check if the number of items changed after the scroll, if not we finish
            if (itemCount === currentItemCount) {
                return;
            }
            itemCount = currentItemCount;
        }
    }

    // Generally, you want to do the scrolling only on the category type page
    if (request.userData.label === 'CATEGORY') {
        await infiniteScroll(60000); // Let's try 60 seconds max

        // ... Add your logic for categories
    } else {
        // Any logic for other types of pages
    }
}

Of course, this is a really trivial example, and real pages can get much more complicated. I once even used Puppeteer to move the mouse directly and drag a scroll bar that was accessible programmatically.
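
One small step beyond the trivial loop, as a rough sketch only: the version above stops the first time a scroll brings no new items, which can cut off early if a single response is slow. The helper below tolerates a few empty rounds before giving up. The name scrollUntilStable and its parameters (countItems, scrollDown, wait, maxIdleRounds, now) are hypothetical and not part of Apify; they are passed in as plain functions so the loop can be tried outside a browser.

```javascript
// Hypothetical helper, NOT an Apify API: scroll until the item count stops growing.
// countItems, scrollDown and wait are injected by the caller (assumed names).
async function scrollUntilStable({
    countItems,            // returns the number of items currently on the page
    scrollDown,            // performs one scroll step
    wait,                  // waits the given number of milliseconds
    maxTimeMs = 60000,     // overall time limit, to prevent an infinite loop
    waitMs = 5000,         // pause after each scroll; tune for your website
    maxIdleRounds = 2,     // how many scrolls without new items we tolerate
    now = Date.now,        // clock, replaceable for testing
}) {
    const startedAt = now();
    let itemCount = await countItems();
    let idleRounds = 0;

    while (now() - startedAt <= maxTimeMs) {
        await scrollDown();
        await wait(waitMs);

        const currentItemCount = await countItems();
        if (currentItemCount > itemCount) {
            // New items appeared, so reset the idle counter and keep going
            itemCount = currentItemCount;
            idleRounds = 0;
        } else {
            idleRounds += 1;
            if (idleRounds >= maxIdleRounds) break;
        }
    }
    return itemCount;
}

// Possible use inside Web Scraper's pageFunction (selector is a placeholder):
// await scrollUntilStable({
//     countItems: () => $('.my-class').length,
//     scrollDown: () => scrollBy(0, 9999),
//     wait: (ms) => context.waitFor(ms),
// });
```

With maxIdleRounds set to 1 this behaves like the original loop. Higher values trade a longer run for a lower chance of stopping too early.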



Source: https://stackoverflow.com/questions/57291169/how-to-make-the-apify-crawler-to-scroll-full-page-when-web-page-have-infinite-sc
