tutorial // Apr 19, 2021

How to Convert HTML to an Image Using Puppeteer in Node.js

How to set up Puppeteer in Node.js to generate images on the fly from HTML and CSS, and how to write the generated images to disk and to Amazon S3.


Getting started

For this tutorial, we're going to use the CheatCode Node.js Boilerplate as a starting point. This will give us a solid foundation to build on without the need for a lot of custom code.

To get started, clone the boilerplate from GitHub:

Terminal

git clone https://github.com/cheatcode/nodejs-server-boilerplate.git

And then, cd into the directory and install the dependencies:

Terminal

cd nodejs-server-boilerplate && npm install

Next, install the puppeteer package:

Terminal

npm i puppeteer

Finally, once all of the dependencies are installed, start the server with:

Terminal

npm run dev

With all of this complete, our first step will be to set up a route where we'll display our image for testing.

Adding a route on the server for testing

Inside of the cloned project, open up the /api/index.js file from the root of the project:

/api/index.js

import graphql from "./graphql/server";

export default (app) => {
  graphql(app);

  // We'll add our test route here.
};

Here, app represents the Express.js app instance set up for us in the boilerplate in /index.js. We'll use this to create our test route:

/api/index.js

import graphql from "./graphql/server";

export default (app) => {
  graphql(app);

  app.use("/graphic", (req, res) => {
    res.send("Testing 123");
  });
};

Easy peasy. To test it out, with your server running, open up your browser and head to http://localhost:5001/graphic; you should see "Testing 123" displayed.

Wiring up the image generator using Puppeteer

Next, we need to wire up our image generation. To do it, we're going to create a separate module that we can import wherever we'd like to convert HTML to an image in our app:

/lib/htmlToImage.js

import puppeteer from "puppeteer";

export default async (html = "") => {
  // We'll handle our image generation here.
};

To start, we import puppeteer from the package we installed earlier. Next, we set up our htmlToImage() function, taking in a single html string as an argument.

/lib/htmlToImage.js

import puppeteer from "puppeteer";

export default async (html = "") => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
};

First, we need to create a Puppeteer instance. To do it, we use puppeteer.launch(). Notice that here, we're using the JavaScript async/await syntax because we expect puppeteer.launch() to return us a Promise. By using the await keyword here, we're telling JavaScript—and by extension, Node.js—to wait until it's received a response from puppeteer.launch().

Next, with our browser created, we create a page with browser.newPage() (think of this like opening a tab in your own browser, but in a "headless" state, meaning there's no user interface; the browser only exists in memory). Again, we anticipate a Promise being returned, so we await this call before moving on.
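
One caveat worth flagging: on some Linux servers and containers, Chromium can't create its sandbox and puppeteer.launch() will fail on startup. If (and only if) you run into that, a common workaround is to pass launch arguments, as in this sketch:

/lib/htmlToImage.js

// Optional: only needed on hosts where Chromium's sandbox isn't available
// (some Linux servers/containers). Disabling the sandbox is a security
// trade-off, so prefer fixing the environment when you can.
const browser = await puppeteer.launch({
  args: ["--no-sandbox", "--disable-setuid-sandbox"],
});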

/lib/htmlToImage.js

import puppeteer from "puppeteer";

export default async (html = "") => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setContent(html);

  const content = await page.$("body");
  const imageBuffer = await content.screenshot({ omitBackground: true });
};

Next, we get into the important part. Here, using page.setContent() we tell Puppeteer to populate the browser page with the html string we passed into our function as an argument. This is equivalent to you loading a website in your browser and the HTML from the server's response being loaded into memory.
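
One thing worth knowing: page.setContent() also accepts a waitUntil option. If your HTML string references external resources like web fonts or images, you may want to wait for those requests to settle before taking the screenshot. A sketch (optional; the HTML we pass later in this tutorial is self-contained, so the default behavior is fine):

/lib/htmlToImage.js

// Optional: wait until the page has no in-flight network requests before
// continuing. Useful if the HTML string loads external fonts or images.
await page.setContent(html, { waitUntil: "networkidle0" });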

Next, we use Puppeteer's built-in DOM (document object model) API to access the in-memory browser's HTML. Here, in our content variable, we store the result of calling await page.$("body");. What this does is select the <body></body> tag from the in-memory, rendered version of our HTML (our rendered page) and hand us a reference to it.

In response, we get back a Puppeteer ElementHandle which is a way of saying "the element as it's represented in-memory by Puppeteer," or, our rendered HTML as a Puppeteer-friendly object.

Next, using that content, we utilize Puppeteer's .screenshot() method to take a screenshot of our in-memory rendered HTML page. To keep full control over what ends up in our image, we set omitBackground to true, which makes the default page background completely transparent.
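
As an aside, screenshotting the <body></body> element is just one way to do this. If you'd rather set an explicit window size and capture the whole page, a rough sketch (using the same 1200x628 dimensions we'll put in our CSS later) looks like this:

/lib/htmlToImage.js

// Alternative sketch: size the headless browser's viewport explicitly and
// screenshot the whole page instead of selecting the <body> element first.
await page.setViewport({ width: 1200, height: 628 });
const imageBuffer = await page.screenshot({ omitBackground: true });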

In response, we expect to get back an imageBuffer. This is the raw image file content, but not the actual image itself (meaning you'll see a bunch of random binary data, not an image). Before we see how to get our actual image, we need to do some cleanup:

/lib/htmlToImage.js

import puppeteer from "puppeteer";

export default async (html = "") => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setContent(html);

  const content = await page.$("body");
  const imageBuffer = await content.screenshot({ omitBackground: true });

  await page.close();
  await browser.close();

  return imageBuffer;
};

Here, we've added two calls: page.close() and browser.close(). Predictably, these close the page (or browser tab) we opened in memory as well as the browser itself. This is important because, if you don't close them, you end up leaving headless browsers running in memory, which depletes your server's resources (and can eventually crash the server by exhausting its memory).

Finally, we return our retrieved imageBuffer from the function.
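
One refinement worth considering: if page.setContent() or the screenshot throws an error, the code above never reaches page.close() and browser.close(), so the headless browser is left running anyway. Here's a sketch of the same function with the cleanup moved into a finally block (closing the browser also closes any pages it opened, so a separate page.close() isn't strictly required):

/lib/htmlToImage.js

import puppeteer from "puppeteer";

export default async (html = "") => {
  const browser = await puppeteer.launch();

  try {
    const page = await browser.newPage();
    await page.setContent(html);

    const content = await page.$("body");
    return await content.screenshot({ omitBackground: true });
  } finally {
    // Runs whether the screenshot succeeded or threw, so the headless
    // browser never outlives this function call.
    await browser.close();
  }
};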

Rendering the image on our route

One more step. Technically, at this point, we haven't passed any HTML to our function. Let's import htmlToImage() back into our /api/index.js file and call it from our route:

/api/index.js

import graphql from "./graphql/server";
import htmlToImage from "../lib/htmlToImage";

export default (app) => {
  graphql(app);

  app.use("/graphic", async (req, res) => {
    const imageBuffer = await htmlToImage(`<!-- Our HTML will go here. -->`);

    res.set("Content-Type", "image/png");
    res.send(imageBuffer);
  });
};

Here, we've imported our htmlToImage function from /lib/htmlToImage. On the callback for our route, we've added the async keyword because, now, we're using the await keyword before our htmlToImage() call. Remember, this is necessary because we need to wait for Puppeteer to do its work before we can rely on it returning data to us.

In addition to our call, we've also modified how we respond to the route request. Here, we've added a call to res.set(), setting the Content-Type header to image/png. Remember how we mentioned that the imageBuffer we were receiving from content.screenshot() wasn't technically an image yet? This is what changes that. Here, image/png is known as a MIME type: a data type recognized by browsers that says "the raw data I'm giving you should be rendered as ___." In this case, we're saying "render this raw data as a .png image."

Finally, as the response body for our request, we pass imageBuffer to res.send(). With this, now, let's add some HTML into the mix and then give this a test:

/api/index.js

import graphql from "./graphql/server";
import htmlToImage from "../lib/htmlToImage";

export default (app) => {
  graphql(app);

  app.use("/graphic", async (req, res) => {
    const imageBuffer = await htmlToImage(`
      <html>
        <head>
          <style>
            * {
              margin: 0;
              padding: 0;
            }

            *,
            *:before,
            *:after {
              box-sizing: border-box;
            }

            html,
            body {
              background: #0099ff;
              width: 1200px;
              height: 628px;
              font-family: "Helvetica Neue", "Helvetica", "Arial", sans-serif;
            }

            div {
              width: 1200px;
              height: 628px;
              padding: 0 200px;
              display: flex;
              align-items: center;
              justify-content: center;
            }
            
            h1 {
              font-size: 48px;
              line-height: 56px;
              color: #fff;
              margin: 0;
              text-align: center;
            }
          </style>
        </head>
        <body>
          <div>
            <h1>How to Convert HTML to an Image Using Puppeteer in Node.js</h1>
          </div>
        </body>
      </html>
    `);

    res.set("Content-Type", "image/png");
    res.send(imageBuffer);
  });
};

Here, we're passing a plain JavaScript string containing some HTML. We've set up a basic HTML boilerplate consisting of an <html></html> tag populated with a <head></head> tag and a <body></body> tag. In the <head></head> tag, we've added a <style></style> tag containing some CSS to style our HTML content.

In the <body></body>, we've added some simple HTML: a <div></div> tag populated with an <h1></h1> tag. Now, if we head back to our test route at http://localhost:5001/graphic, we should see something like this:

[Image: Our image rendered in the browser. Right-click and "Save as" to download it to your computer.]

Cool, right? If you right-click on the image and download it, you'll be able to open it up on your computer like any other image.

Before we wrap up, it's good to understand how to store this data permanently instead of just rendering it in the browser and downloading it by hand. Next, we're going to look at two methods: saving the generated image to disk and saving the generated image to Amazon S3.

Writing the generated image to disk

Fortunately, writing our file to disk is pretty simple. Let's make a slight modification to our route (we'll still use the URL in the browser to "trigger" the generation):

/api/index.js

import fs from "fs";
import graphql from "./graphql/server";
import htmlToImage from "../lib/htmlToImage";

export default (app) => {
  graphql(app);

  app.use("/graphic", async (req, res) => {
    const imageBuffer = await htmlToImage(`
      <html>
        [...]
      </html>
    `);

    fs.writeFileSync("./image.png", imageBuffer);

    res.set("Content-Type", "image/png");
    res.send(imageBuffer);
  });
};

Quite simple. Here, all we've done is import fs (Node.js's built-in file system module, so it does not need to be installed) and then add a call to fs.writeFileSync(), passing the path where we want our file to be stored (in this case, a file called image.png at the root of our project) and the data for the file.

Of note, for the file extension we've explicitly used .png. Similar to the image/png MIME type we saw when rendering our image directly on our route, that .png extension communicates to the computer that the contents of this file represent an image in the .png format.

Now, when we visit our route, our file will be written to /image.png on disk as well as rendered in the browser.
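
If you'd rather not block Node.js's event loop while the file is written, the promise-based fs API is a drop-in alternative here (our route's callback is already async):

/api/index.js

// Asynchronous alternative to fs.writeFileSync(). `fs` is already imported at
// the top of this file; awaiting keeps us from responding before the write finishes.
await fs.promises.writeFile("./image.png", imageBuffer);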

Sending the generated image to Amazon S3

Before we go any further, in order to access Amazon S3 we need to add a new dependency: aws-sdk. Let's install that now:

Terminal

npm i aws-sdk

Next, sending our generated image to Amazon S3 is similar, albeit a little more involved. To do it, we're going to create a new file at /lib/s3.js to implement some code to help us connect to Amazon S3 and write our file (known as "putting an object into the bucket").

/lib/s3.js

import AWS from "aws-sdk";

AWS.config = new AWS.Config({
  accessKeyId: "<Your Access Key ID Here>",
  secretAccessKey: "<Your Secret Access Key Here>",
  region: "us-east-1",
});

// We'll write the S3 code for writing files here.

Here, we import AWS from the aws-sdk package we just installed. Next, we set AWS.config equal to a new instance of AWS.Config (notice the difference between the names is the capital "C"), passing in the credentials we want to use for communicating with AWS.

If you don't already have the necessary credentials, you'll want to read this tutorial by Amazon on how to create a new user. For this example, when creating your user, make sure to enable "Programmatic Access" in step one and attach the AmazonS3FullAccess policy under "Attach existing policies directly" in step two.

Once you've generated your Access Key ID and your Secret Access Key, you can populate the fields above.

Fair Warning: DO NOT commit these keys to a public GitHub repo. There are bots on GitHub that scan for unprotected AWS keys and use them to spin up bot farms and perform illegal activity (while making you foot the bill).

For region, you will want to specify the region you create your Amazon S3 bucket in. The region is the geographic location of your bucket on the internet. If you haven't created a bucket yet, you'll want to read this tutorial by Amazon on how to create a new bucket.

When setting up your bucket for this tutorial, make sure to uncheck "Block public access." Leaving that setting enabled is the right call for production environments, but since we're just experimenting, unchecking it is safe. Fair Warning: DO NOT store any sensitive data in this bucket.
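
To avoid hard-coding keys (and accidentally committing them), one option is to pull them from environment variables instead. A sketch, assuming AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION are set in your shell before starting the server (these particular variable names are also ones the aws-sdk recognizes on its own):

/lib/s3.js

import AWS from "aws-sdk";

// Sketch: read credentials from environment variables instead of hard-coding
// them. Assumes AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION are
// set before the server starts (e.g., exported in your shell or a .env file).
AWS.config = new AWS.Config({
  accessKeyId: process.env.AWS_ACCESS_KEY_ID,
  secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  region: process.env.AWS_REGION || "us-east-1",
});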

/lib/s3.js

import AWS from "aws-sdk";

AWS.config = new AWS.Config({
  accessKeyId: "<Your Access Key ID Here>",
  secretAccessKey: "<Your Secret Access Key Here>",
  region: "us-east-1",
});

const s3 = new AWS.S3();

export default {
  putObject(options = {}) {
    return new Promise((resolve, reject) => {
      s3.putObject(
        {
          Bucket: options.bucket,
          ACL: options.acl || "public-read",
          Key: options.key,
          Body: options.body,
          ContentType: options.contentType,
        },
        (error, response) => {
          if (error) {
            console.warn("[s3] Upload Error: ", error);
            reject(error);
          } else {
            resolve({
              url: `https://${options.bucket}.s3.amazonaws.com/${options.key}`,
              name: options.key,
              type: options.contentType || "application/octet-stream",
            });
          }
        }
      );
    });
  },
};

Once we've configured our AWS IAM user and bucket region, next, we want to create an instance of s3 by calling new AWS.S3().

Thinking ahead, we want to anticipate the need for other S3 methods later, so instead of just exporting a single function from our file, here, we export an object with a putObject method.

For that method (the name for a function defined as part of an object), we anticipate an options object to be passed containing the data and instructions for how to handle our file. In the body of this function, we return a Promise so that we can wrap the asynchronous s3.putObject() method from the aws-sdk package.

When we call that method, we pass in the options per the Amazon S3 SDK documentation, describing our file, where we want it to live, and the permissions to associate with it. In the callback method for s3.putObject(), assuming we don't have an error, we construct an object describing the location of our new file on Amazon S3 and resolve() the Promise we've returned from the function.

/api/index.js

import fs from "fs";
import graphql from "./graphql/server";
import htmlToImage from "../lib/htmlToImage";
import s3 from "../lib/s3";

export default (app) => {
  graphql(app);

  app.use("/graphic", async (req, res) => {
    const imageBuffer = await htmlToImage(`
      <html>
        [...]
      </html>
    `);

    fs.writeFileSync("./image.png", imageBuffer);

    const s3File = await s3.putObject({
      bucket: "<Your Bucket Name Here>",
      key: `generated-image.png`,
      body: imageBuffer,
      contentType: "image/png",
    });

    console.log(s3File);

    res.set("Content-Type", "image/png");
    res.send(imageBuffer);
  });
};

Back in our /api/index.js file, now we're ready to upload to S3. Modifying our code from earlier slightly, we import our s3 file from /lib/s3.js at the top and then in the body of our route's callback, we add our call to s3.putObject(), passing in the bucket we want our file to be stored in, the key (path and file name relative to the root of our bucket) for our file, the body (raw imageBuffer data), and the contentType (the same image/png MIME type we discussed earlier).

Finally, we make sure to await our call to S3 so that the upload has finished (and we have the uploaded file's details) before we respond. In your own app, this may not be necessary if you're okay with the file being uploaded in the background.
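
If you do decide to let the upload happen in the background, a minimal sketch is to drop the await and attach a .catch() so a failed upload is logged instead of becoming an unhandled Promise rejection:

/api/index.js

// Background upload sketch: don't block the HTTP response on S3. The .catch()
// ensures an upload failure is logged rather than silently swallowed.
s3.putObject({
  bucket: "<Your Bucket Name Here>",
  key: `generated-image.png`,
  body: imageBuffer,
  contentType: "image/png",
}).catch((error) => console.warn("[s3] Background upload failed: ", error));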

That's it! Now, if we visit http://localhost:5001/graphic in our app, we should see our graphic uploaded to Amazon S3, followed by the confirmation being logged out in the terminal:

Terminal

{
  url: 'https://cheatcode-tutorials.s3.amazonaws.com/generated-image.png',
  name: 'generated-image.png',
  type: 'image/png'
}

Wrapping Up

In this tutorial, we learned how to generate an image from HTML and CSS using Puppeteer. We learned how to spin up a browser in memory, pass it some HTML, and then take a screenshot of that rendered page using Puppeteer. We also learned how to return our image to a browser directly as well as how to store that file on disk using the Node.js file system and upload our image to Amazon S3 using the AWS JavaScript SDK.

Written By
Ryan Glover

CEO/CTO @ CheatCode