Handling Lazy-Loaded Webpages in Puppeteer

Modern websites often delay loading images and other expensive content until the user has scrolled down to it. This technique, commonly called lazy-loading, speeds up the experience for users and reduces bandwidth costs for website owners. However, in a browser automation scenario (like generating webpage screenshots) this can cause problems, as there is no "user" manually scrolling down the page.

So how can we ensure that automated webpage screenshots reliably load all content on the page? If you're using Screenshot Bin, we've got you covered: any request for a full-page screenshot will automatically lazy-load all content. But if you're writing your own browser automation scripts using a tool like headless Chrome, Puppeteer, PhantomJS, etc., you'll need some clever code to handle lazy-loading.

Here's what this looks like when using Puppeteer to drive headless Chrome:

function wait (ms) {
  return new Promise(resolve => setTimeout(() => resolve(), ms));
}

export default async function capture(browser, url) {
  // Load the specified page
  const page = await browser.newPage();
  await page.goto(url, {waitUntil: 'load'});

  // Get the height of the rendered page
  const bodyHandle = await page.$('body');
  const { height } = await bodyHandle.boundingBox();
  await bodyHandle.dispose();

  // Scroll one viewport at a time, pausing to let content load
  const viewportHeight = page.viewport().height;
  let viewportIncr = 0;
  while (viewportIncr + viewportHeight < height) {
    await page.evaluate(_viewportHeight => {
      window.scrollBy(0, _viewportHeight);
    }, viewportHeight);
    await wait(20);
    viewportIncr = viewportIncr + viewportHeight;
  }

  // Scroll back to top
  await page.evaluate(_ => {
    window.scrollTo(0, 0);
  });

  // Some extra delay to let images load
  await wait(100);

  // Capture the entire page, not just the visible viewport
  return await page.screenshot({type: 'png', fullPage: true});
}

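After the page finishes loading, we use page.evaluate to execute window.scrollBy within the page context and scroll down the page one viewport at a time, pausing briefly after each step so that lazy-loaded content is triggered. We then scroll back to the top and wait a little longer before taking the screenshot.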
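
The scrolling step can also be pulled out into a small helper, which makes it possible to exercise the loop without launching a browser. This is only a sketch: the name scrollThroughPage is hypothetical (it is not part of Puppeteer), and it assumes a page object that exposes Puppeteer's page.evaluate(fn, ...args). The callback passed to page.evaluate runs inside the page, not in Node, so the scroll distance is handed over as an explicit argument rather than captured from the surrounding scope.

```javascript
function wait(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Hypothetical helper (not a Puppeteer API): scrolls down one viewport at a
// time and resolves with the number of scroll steps that were issued.
async function scrollThroughPage(page, pageHeight, viewportHeight, pauseMs = 20) {
  if (viewportHeight <= 0) {
    // Guard against a loop that could never make progress
    throw new RangeError('viewportHeight must be positive');
  }

  let scrolled = 0;
  let steps = 0;
  while (scrolled + viewportHeight < pageHeight) {
    // This callback executes in the browser, so the distance is passed in
    // as an argument instead of being read from a Node-side variable.
    await page.evaluate(distance => {
      window.scrollBy(0, distance);
    }, viewportHeight);

    // Give lazy-loaded content a moment to start loading
    await wait(pauseMs);

    scrolled += viewportHeight;
    steps += 1;
  }
  return steps;
}
```

Inside capture, this could replace the while loop with a call such as await scrollThroughPage(page, height, viewportHeight), made after the body has been measured.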

It's worth pointing out that we limit our scrolling to the height of the page at the time of page load. This ensures that we don't end up in an infinite loop if we happen to be capturing a page with infinite scroll, such as a news feed.
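
To see why the loop is bounded, it can help to list the scroll positions it visits. The sketch below assumes a hypothetical scrollOffsets helper that mirrors the loop condition used in capture. Because the page height is a fixed number measured once, up front, the number of steps is fixed too, even if the page keeps appending content while we scroll.

```javascript
// Hypothetical helper: returns the cumulative scroll position after each
// step, given a page height that was measured once before scrolling began.
function scrollOffsets(pageHeight, viewportHeight) {
  if (viewportHeight <= 0) {
    throw new RangeError('viewportHeight must be positive');
  }

  const offsets = [];
  let scrolled = 0;
  // Same condition as the capture loop: stop once the next viewport
  // would reach or pass the measured page height.
  while (scrolled + viewportHeight < pageHeight) {
    scrolled += viewportHeight;
    offsets.push(scrolled);
  }
  return offsets;
}

console.log(scrollOffsets(2000, 600)); // [ 600, 1200, 1800 ]
console.log(scrollOffsets(500, 600));  // []
```

A 2000px page with a 600px viewport is scrolled three times and then left alone, no matter how much taller the page has grown in the meantime.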

This is the same technique we use in Screenshot Bin's webpage screenshot API, and it's worked well for us. Whether you're using Screenshot Bin or Puppeteer directly, we hope this strategy helps you reliably capture lazy-loaded content!