Video processing with WebCodecs

Manipulating video stream components.

Eugene Zemtsov
François Beaufort

Modern web technologies provide ample ways to work with video. Media Stream API, Media Recording API, Media Source API, and WebRTC API add up to a rich tool set for recording, transferring, and playing video streams. While solving certain high-level tasks, these APIs don't let web programmers work with individual components of a video stream such as frames and unmuxed chunks of encoded video or audio. To get low-level access to these basic components, developers have been using WebAssembly to bring video and audio codecs into the browser. But given that modern browsers already ship with a variety of codecs (which are often accelerated by hardware), repackaging them as WebAssembly seems like a waste of human and computer resources.

WebCodecs API eliminates this inefficiency by giving programmers a way to use media components that are already present in the browser. Specifically:

  • Video and audio decoders
  • Video and audio encoders
  • Raw video frames
  • Image decoders

The WebCodecs API is useful for web applications that require full control over the way media content is processed, such as video editors, video conferencing, video streaming, etc.

Video processing workflow

Frames are the centerpiece in video processing. Thus in WebCodecs most classes either consume or produce frames. Video encoders convert frames into encoded chunks. Video decoders do the opposite.

VideoFrame also plays nicely with other Web APIs: it is a CanvasImageSource and has a constructor that accepts a CanvasImageSource. So it can be used in functions like drawImage() and texImage2D(), and it can be constructed from canvases, bitmaps, video elements, and other video frames.
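
For example, a frame can be captured from a playing video element and painted straight onto a 2D canvas. This is a minimal sketch; it assumes the page already contains video and canvas elements and that the video has a current frame available.

const video = document.querySelector("video");
const canvas = document.querySelector("canvas");
const ctx = canvas.getContext("2d");

// A VideoFrame can be constructed from a CanvasImageSource such as a <video> element
// (the video must have frame data, otherwise the constructor throws).
const frame = new VideoFrame(video, { timestamp: 0 });

// Being a CanvasImageSource itself, the frame can be drawn like any other image.
ctx.drawImage(frame, 0, 0);

// Release the frame's memory as soon as it's no longer needed.
frame.close();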

The WebCodecs API works well in tandem with the classes from the Insertable Streams API, which connect WebCodecs to media stream tracks.

  • MediaStreamTrackProcessor breaks media tracks into individual frames.
  • MediaStreamTrackGenerator creates a media track from a stream of frames.
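
For example, the two can be connected through a TransformStream so that every frame passes through JavaScript on its way from the camera to a video element. This is a minimal pass-through sketch (it assumes camera access and a video element on the page); a real application would build a modified frame inside transform() and close the original.

const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const track = stream.getVideoTracks()[0];

const trackProcessor = new MediaStreamTrackProcessor(track);
const trackGenerator = new MediaStreamTrackGenerator({ kind: "video" });

const transformer = new TransformStream({
  transform(frame, controller) {
    // Inspect or replace the frame here. A real transform would construct a
    // new VideoFrame and call frame.close() on the original.
    controller.enqueue(frame);
  },
});

trackProcessor.readable.pipeThrough(transformer).pipeTo(trackGenerator.writable);

// The generator is a regular MediaStreamTrack and can be played back as usual.
document.querySelector("video").srcObject = new MediaStream([trackGenerator]);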

WebCodecs and web workers

By design WebCodecs API does all the heavy lifting asynchronously and off the main thread. But since frame and chunk callbacks can often be called multiple times a second, they might clutter the main thread and thus make the website less responsive. Therefore it is preferable to move handling of individual frames and encoded chunks into a web worker.

To help with that, ReadableStream provides a convenient way to automatically transfer all frames coming from a media track to the worker. For example, MediaStreamTrackProcessor can be used to obtain a ReadableStream for a media stream track coming from the web camera. After that the stream is transferred to a web worker where frames are read one by one and queued into a VideoEncoder.
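
A rough sketch of this hand-off might look like the following. The worker file name and the encoder configuration inside it are assumptions; the encoding details are covered in the Encoding section below.

// main.js
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const track = stream.getVideoTracks()[0];
const trackProcessor = new MediaStreamTrackProcessor(track);

// Transfer the whole stream once instead of posting individual frames.
const worker = new Worker("encode-worker.js");
worker.postMessage({ readable: trackProcessor.readable }, [trackProcessor.readable]);

// encode-worker.js
self.onmessage = async ({ data: { readable } }) => {
  const encoder = new VideoEncoder({
    output: (chunk) => { /* send or mux the chunk */ },
    error: (e) => console.log(e.message),
  });
  encoder.configure({ codec: "vp8", width: 640, height: 480, bitrate: 2_000_000, framerate: 30 });

  // Read frames from the transferred stream and queue them for encoding.
  const reader = readable.getReader();
  while (true) {
    const { done, value: frame } = await reader.read();
    if (done) break;
    encoder.encode(frame);
    frame.close();
  }
};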

With HTMLCanvasElement.transferControlToOffscreen, even rendering can be done off the main thread. But if all the high-level tools turn out to be inconvenient, VideoFrame itself is transferable and may be moved between workers.
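
For example, the page can hand an OffscreenCanvas to a worker once, and the worker can then draw decoded frames directly onto it. This is a sketch; the worker file name and the decoder setup inside the worker are assumptions.

// main.js
const canvas = document.querySelector("canvas");
const offscreen = canvas.transferControlToOffscreen();
const worker = new Worker("render-worker.js");
worker.postMessage({ canvas: offscreen }, [offscreen]);

// render-worker.js
let ctx = null;
self.onmessage = ({ data }) => {
  if (data.canvas) ctx = data.canvas.getContext("2d");
};

// Used as the VideoDecoder output callback inside the worker.
function handleFrame(frame) {
  ctx.drawImage(frame, 0, 0);
  frame.close();
}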

WebCodecs in action

Encoding

The path from a Canvas or an ImageBitmap to the network or to storage.

It all starts with a VideoFrame. There are three ways to construct video frames.

  • From an image source like a canvas, an image bitmap, or a video element.

    const canvas = document.createElement("canvas");
    // Draw something on the canvas...
    
    const frameFromCanvas = new VideoFrame(canvas, { timestamp: 0 });
    
  • Use MediaStreamTrackProcessor to pull frames from a MediaStreamTrack

    const stream = await navigator.mediaDevices.getUserMedia({…});
    const track = stream.getTracks()[0];
    
    const trackProcessor = new MediaStreamTrackProcessor(track);
    
    const reader = trackProcessor.readable.getReader();
    while (true) {
      const result = await reader.read();
      if (result.done) break;
      const frameFromCamera = result.value;
    }
    
  • Create a frame from its binary pixel representation in a BufferSource

    const pixelSize = 4;
    const init = {
      timestamp: 0,
      codedWidth: 320,
      codedHeight: 200,
      format: "RGBA",
    };
    const data = new Uint8Array(init.codedWidth * init.codedHeight * pixelSize);
    for (let x = 0; x < init.codedWidth; x++) {
      for (let y = 0; y < init.codedHeight; y++) {
        const offset = (y * init.codedWidth + x) * pixelSize;
        data[offset] = 0x7f;      // Red
        data[offset + 1] = 0xff;  // Green
        data[offset + 2] = 0xd4;  // Blue
        data[offset + 3] = 0xff;  // Alpha
      }
    }
    const frame = new VideoFrame(data, init);
    

No matter where they are coming from, frames can be encoded into EncodedVideoChunk objects with a VideoEncoder.

Before encoding, VideoEncoder needs to be given two JavaScript objects:

  • Init dictionary with two functions for handling encoded chunks and errors. These functions are developer-defined and can't be changed after they're passed to the VideoEncoder constructor.
  • Encoder configuration object, which contains parameters for the output video stream. You can change these parameters later by calling configure().

The configure() method will throw NotSupportedError if the config is not supported by the browser. You are encouraged to call the static method VideoEncoder.isConfigSupported() with the config beforehand and await its promise to check whether the config is supported.

const init = {
  output: handleChunk,
  error: (e) => {
    console.log(e.message);
  },
};

const config = {
  codec: "vp8",
  width: 640,
  height: 480,
  bitrate: 2_000_000, // 2 Mbps
  framerate: 30,
};

const { supported } = await VideoEncoder.isConfigSupported(config);
if (supported) {
  const encoder = new VideoEncoder(init);
  encoder.configure(config);
} else {
  // Try another config.
}

After the encoder has been set up, it's ready to accept frames via the encode() method. Both configure() and encode() return immediately, without waiting for the actual work to complete. This allows several frames to queue for encoding at the same time, while encodeQueueSize shows how many requests are waiting in the queue for previous encodes to finish. Errors are reported either by immediately throwing an exception, in case the arguments or the order of method calls violates the API contract, or by calling the error() callback for problems encountered in the codec implementation. If encoding completes successfully, the output() callback is called with a new encoded chunk as an argument. Another important detail is that frames need to be told when they are no longer needed by calling close().

let frameCounter = 0;

const track = stream.getVideoTracks()[0];
const trackProcessor = new MediaStreamTrackProcessor(track);

const reader = trackProcessor.readable.getReader();
while (true) {
  const result = await reader.read();
  if (result.done) break;

  const frame = result.value;
  if (encoder.encodeQueueSize > 2) {
    // Too many frames in flight, encoder is overwhelmed
    // let's drop this frame.
    frame.close();
  } else {
    frameCounter++;
    const keyFrame = frameCounter % 150 == 0;
    encoder.encode(frame, { keyFrame });
    frame.close();
  }
}

Finally, it's time to finish the encoding code by writing a function that handles chunks of encoded video as they come out of the encoder. Usually this function would send data chunks over the network or mux them into a media container for storage.

function handleChunk(chunk, metadata) {
  if (metadata.decoderConfig) {
    // Decoder needs to be configured (or reconfigured) with new parameters
    // when metadata has a new decoderConfig.
    // Usually this happens at the beginning or when the encoder has a new
    // codec-specific binary configuration (VideoDecoderConfig.description).
    fetch("/upload_extra_data", {
      method: "POST",
      headers: { "Content-Type": "application/octet-stream" },
      body: metadata.decoderConfig.description,
    });
  }

  // actual bytes of encoded data
  const chunkData = new Uint8Array(chunk.byteLength);
  chunk.copyTo(chunkData);

  fetch(`/upload_chunk?timestamp=${chunk.timestamp}&type=${chunk.type}`, {
    method: "POST",
    headers: { "Content-Type": "application/octet-stream" },
    body: chunkData,
  });
}

If at some point you'd need to make sure that all pending encoding requests have been completed, you can call flush() and wait for its promise.

await encoder.flush();

Decoding

The path from the network or storage to a Canvas or an ImageBitmap.

Setting up a VideoDecoder is similar to what's been done for the VideoEncoder: two functions are passed when the decoder is created, and codec parameters are given to configure().

The set of codec parameters varies from codec to codec. For example, the H.264 codec might need a binary blob of AVCC data, unless it's encoded in the so-called Annex B format (encoderConfig.avc = { format: "annexb" }).
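
As a rough sketch, an H.264 configuration using Annex B might look like this. The codec string avc1.42001E (Baseline profile, level 3.0) is just an example; the right string and bitstream format depend on your profile, container, and transport.

const encoderConfig = {
  codec: "avc1.42001E", // example H.264 codec string: Baseline profile, level 3.0
  width: 640,
  height: 480,
  bitrate: 2_000_000, // 2 Mbps
  framerate: 30,
  avc: { format: "annexb" }, // keep parameter sets in the bitstream itself
};

const decoderConfig = {
  codec: "avc1.42001E",
  codedWidth: 640,
  codedHeight: 480,
  // With Annex B no `description` is needed; with the default AVCC format,
  // pass the blob from metadata.decoderConfig.description here.
};

The example below sticks with VP8, which doesn't need any extra description data.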

const init = {
  output: handleFrame,
  error: (e) => {
    console.log(e.message);
  },
};

const config = {
  codec: "vp8",
  codedWidth: 640,
  codedHeight: 480,
};

const { supported } = await VideoDecoder.isConfigSupported(config);
if (supported) {
  const decoder = new VideoDecoder(init);
  decoder.configure(config);
} else {
  // Try another config.
}

Once the decoder is initialized, you can start feeding it with EncodedVideoChunk objects. To create a chunk, you'll need:

  • A BufferSource of encoded video data
  • The chunk's start timestamp in microseconds (media time of the first encoded frame in the chunk)
  • The chunk's type, one of:
    • key if the chunk can be decoded independently of previous chunks
    • delta if the chunk can only be decoded after one or more previous chunks have been decoded

Any chunks emitted by the encoder are ready for the decoder as-is. Everything said above about error reporting and the asynchronous nature of the encoder's methods is equally true for decoders.

const responses = await downloadVideoChunksFromServer(timestamp);
for (let i = 0; i < responses.length; i++) {
  const chunk = new EncodedVideoChunk({
    timestamp: responses[i].timestamp,
    type: responses[i].key ? "key" : "delta",
    data: new Uint8Array(responses[i].body),
  });
  decoder.decode(chunk);
}
await decoder.flush();

Now it's time to show how a freshly decoded frame can be rendered on the page. Make sure the decoder output callback (handleFrame()) returns quickly; in the example below, it only adds a frame to the queue of frames ready for rendering. Rendering happens separately and consists of two steps:

  1. Waiting for the right time to show the frame.
  2. Drawing the frame on the canvas.

Once a frame is no longer needed, call close() to release underlying memory before the garbage collector gets to it; this reduces the average amount of memory used by the web application.

const canvas = document.getElementById("canvas");
const ctx = canvas.getContext("2d");
let pendingFrames = [];
let underflow = true;
let baseTime = 0;

function handleFrame(frame) {
  pendingFrames.push(frame);
  if (underflow) setTimeout(renderFrame, 0);
}

function calculateTimeUntilNextFrame(timestamp) {
  if (baseTime == 0) baseTime = performance.now();
  let mediaTime = performance.now() - baseTime;
  return Math.max(0, timestamp / 1000 - mediaTime);
}

async function renderFrame() {
  underflow = pendingFrames.length == 0;
  if (underflow) return;

  const frame = pendingFrames.shift();

  // Based on the frame's timestamp calculate how much of real time waiting
  // is needed before showing the next frame.
  const timeUntilNextFrame = calculateTimeUntilNextFrame(frame.timestamp);
  await new Promise((r) => {
    setTimeout(r, timeUntilNextFrame);
  });
  ctx.drawImage(frame, 0, 0);
  frame.close();

  // Immediately schedule rendering of the next frame
  setTimeout(renderFrame, 0);
}

Dev Tips

Use the Media Panel in Chrome DevTools to view media logs and debug WebCodecs.

Media Panel in Chrome DevTools for debugging WebCodecs.

Demo

The demo below shows how animation frames from a canvas are:

  • captured at 25fps into a ReadableStream by MediaStreamTrackProcessor
  • transferred to a web worker
  • encoded into H.264 video format
  • decoded again into a sequence of video frames
  • and rendered on the second canvas using transferControlToOffscreen()

Other demos

Also check out our other demos:

Using the WebCodecs API

Feature detection

To check for WebCodecs support:

if ('VideoEncoder' in window) {
  // WebCodecs API is supported.
}

Keep in mind that the WebCodecs API is only available in secure contexts, so detection will fail if self.isSecureContext is false.

Feedback

The Chrome team wants to hear about your experiences with the WebCodecs API.

Tell us about the API design

Is there something about the API that doesn't work like you expected? Or are there missing methods or properties that you need to implement your idea? Have a question or comment on the security model? File a spec issue on the corresponding GitHub repo, or add your thoughts to an existing issue.

Report a problem with the implementation

Did you find a bug with Chrome's implementation? Or is the implementation different from the spec? File a bug at new.crbug.com. Be sure to include as much detail as you can, simple instructions for reproducing, and enter Blink>Media>WebCodecs in the Components box. Glitch works great for sharing quick and easy repros.

Show support for the API

Are you planning to use the WebCodecs API? Your public support helps the Chrome team to prioritize features and shows other browser vendors how critical it is to support them.

Send emails to media-dev@chromium.org or send a tweet to @ChromiumDev using the hashtag #WebCodecs and let us know where and how you're using it.

Hero image by Denise Jans on Unsplash.