Experimenting with the Streams API

I am always looking for ways to build faster, slicker web pages. Where applicable, I’ll use great new browser features such as service workers, HTTP/2 and improved compression, just to name a few. But what if I told you there was a way to build even faster web pages? I’d like to introduce you to the Streams API.

Before we go any further, let’s go right back to basics. What is a stream? A stream is data that is created, processed, and consumed in an incremental fashion, without ever reading all of it into memory. Streams have been available server-side for years, but with web streams they are now available to JavaScript in the browser: you can start processing raw data bit by bit as soon as it arrives on the client, without needing to generate a buffer, string, or blob.

The benefit of using streams is that if you are sending large chunks of data over the web, you can start processing the data immediately as you receive it, without having to wait for the full download to complete. For example, think of a large video file and how long it might take to download in full. Streaming allows you to fetch only the small amount of data you need to start playing the video instead of the whole file, which means you can begin watching as quickly as the network can deliver just those bytes.

Streams also come with many other benefits, one of them being that they reduce the amount of memory a large resource takes up. If we needed to download a large file, process it, and keep the whole thing in memory, that could quickly become a problem. With streaming, we only ever hold the piece of data we are currently processing, and the stream can match the rate at which data is produced to the rate at which it is consumed; this is known as flow control and plays an important role in web streams.

At this point, you might be a little skeptical about using streams - but let me do my best to convince you of their advantages. In this article, I am going to download and process a large JSON file using the Streams API and write the data to a web page as soon as we receive it, instead of waiting for the entire download to complete.

What is NDJSON?

Before we go any further, it’s worth diving into a data format called NDJSON that we will be using throughout this example. If you haven’t heard of NDJSON before, it stands for Newline Delimited JSON. Wait...not another data format!? I totally agree with you, but NDJSON exists because there is currently no standard for transporting instances of JSON text within a stream protocol. NDJSON looks very similar to JSON; the only difference is that each line contains a separate record, which allows us to stream and process one record at a time. If we streamed a traditional JSON document and processed it in chunks, the individual chunks wouldn’t be valid JSON on their own. This is why NDJSON is perfect for this situation.

Let’s look at a simple example comparing the two formats. Imagine the following JSON file:

[
    {"id":1,"name":"Alice"},
    {"id":2,"name":"Bob"},
    {"id":3,"name":"Carol"}
]

The exact same data expressed as NDJSON looks like this:

{"id":1,"name":"Alice"}
{"id":2,"name":"Bob"}
{"id":3,"name":"Carol"}

If you’d like to learn more about NDJSON, I recommend reading the following article.

With this in mind, let’s get started!

Getting Started

In order to get started, we need to get a little acquainted with the Fetch API. If you’ve used it before, you may be aware that the Fetch API exposes Response bodies as ReadableStream instances. This means that they represent a readable stream of byte data. By tapping into this ReadableStream, we are able to process data incrementally in a stream instead of buffering it all into memory and processing it in one go.
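
To make that concrete, here is a minimal sketch of consuming a Response body chunk by chunk with a reader (the URL here is just a placeholder for illustration, not part of the example we’ll build below):

async function readRawBytes() {
    // Placeholder URL, purely for illustration
    const response = await fetch('/some-large-resource');

    // response.body is a ReadableStream of Uint8Array chunks
    const reader = response.body.getReader();

    while (true) {
        const { value, done } = await reader.read();
        if (done) break;

        // Each chunk can be processed as soon as the network delivers it
        console.log(`Received ${value.length} bytes`);
    }
}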

We’ll also be using a concept called “piping”, which I think helps describe streams really well. Piping provides a chainable way of passing the current stream through a transform stream or any other writable/readable pair. The great thing about it is that it allows data to flow from one pipe to the next!
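
To illustrate the idea, here is a toy sketch of piping: it decodes the response bytes to text and pipes them through a transform stream that simply uppercases each chunk (both the URL and the uppercase transform are purely illustrative, not part of the example we’ll build below).

async function shout() {
    // Placeholder URL, purely for illustration
    const response = await fetch('/some-text-resource');

    return response.body
        // From bytes to text:
        .pipeThrough(new TextDecoderStream())
        // A toy transform stream that uppercases each chunk of text:
        .pipeThrough(new TransformStream({
            transform(chunk, controller) {
                controller.enqueue(chunk.toUpperCase());
            }
        }));
}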

Let’s take a look at the following function.

/**
 * Fetch and process the stream
 */
async function process() {
    // Retrieve NDJSON from the server
    const response = await fetch('http://localhost:3000/request');

    const results = response.body
        // From bytes to text:
        .pipeThrough(new TextDecoderStream())
        // Buffer until newlines:
        .pipeThrough(splitStream('\n'))
        // Parse chunks as JSON:
        .pipeThrough(parseJSON());

    // Loop through the results and write to the DOM
    writeToDOM(results.getReader());
}

In the code above, we start off by making a request to a local server for an NDJSON file. The body of the response is available as a readable stream which, as the name implies, allows us to read data out of it.

Using the pipeThrough() method, we can pipe the data we receive through another type of stream (writable, transform, etc.) and process it accordingly. In the code above, I’ve piped the body of the response through the stream returned by a function called splitStream(), and then again through one returned by parseJSON(). Firstly, splitStream() splits the decoded text on each new line, so that every chunk is a complete NDJSON record - making this the perfect format for streaming! Next, parseJSON() takes each line and parses it as JSON to make sure it is valid.
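
Neither splitStream() nor parseJSON() is built into the browser; they are small helper functions built on TransformStream. A possible implementation might look something like this (a sketch of the idea; the demo’s own versions may differ slightly):

/**
 * Split incoming text on a delimiter, emitting one complete record at a time
 * @param {string} delimiter
 */
function splitStream(delimiter) {
    let buffer = '';
    return new TransformStream({
        transform(chunk, controller) {
            buffer += chunk;
            const parts = buffer.split(delimiter);
            // The last part may be an incomplete record - keep it for the next chunk
            buffer = parts.pop();
            parts.forEach(part => controller.enqueue(part));
        },
        flush(controller) {
            // Emit whatever is left over once the stream ends
            if (buffer) controller.enqueue(buffer);
        }
    });
}

/**
 * Parse each line of text as JSON, skipping any empty lines
 */
function parseJSON() {
    return new TransformStream({
        transform(chunk, controller) {
            if (chunk.trim().length > 0) {
                controller.enqueue(JSON.parse(chunk));
            }
        }
    });
}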

Now that our stream is ready to use, we can write the data to the page. With the ReadableStream, we can use getReader() to create a reader that locks the stream. While the stream is locked, no other reader can be acquired until this one is released. This functionality is especially useful for creating abstractions that want to consume a stream in its entirety. By getting a reader for the stream, you can ensure nobody else can interleave reads with yours or cancel the stream, which would interfere with your abstraction.
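
As a small illustration of locking (separate from our example), notice how acquiring a reader flips the stream’s locked flag, and releasing it frees the stream up again:

// An empty stream, purely for illustration
const stream = new ReadableStream();

const reader = stream.getReader();
console.log(stream.locked); // true - calling stream.getReader() again now would throw a TypeError

reader.releaseLock();
console.log(stream.locked); // false - the stream can be locked by a new reader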

We can then iterate through the results in the reader.

/**
 * Read through the results and write to the DOM
 * @param {object} reader 
 */
function writeToDOM(reader) {
    reader.read().then(
        ({ value, done }) => {
            if (done) {
                console.log("The stream was already closed!");

            } else {
                // Build up the values
                let result = document.createElement('div');
                result.innerHTML = `<div>ID: ${value.id} - Phone: ${value.phone} - Result: ${value.result}</div><br>`;

                // Prepend to the target (targetDiv is assumed to reference a container element on the page)
                targetDiv.insertBefore(result, targetDiv.firstChild);

                // Recursively call
                writeToDOM(reader);
            }
        },
        e => console.error("The stream became errored and cannot be read from!", e)
    );
}

In the code above, I am iterating through each result in the reader and prepending it to a DIV on the page. In a real world example, you might want to display your results differently, but this gives you a taste of what is possible with streams!

Using the Developer Tools in Google Chrome, let’s inspect the results with streaming in place.

In the animation above, you can see that as I reload the page, the results are displayed instantly thanks to the streaming. Even though the rest of the HTTP response is still being downloaded in chunks, we are able to process and display the results as they arrive. Without streaming, we would have to wait the full 8 seconds for the file to download before displaying anything. This is much smoother!

Browser Support

Can I Use streams? Data on support for the streams feature across the major browsers from caniuse.com.

Summary

I hope I’ve managed to convince you how awesome streams are on the web. There are some great resources online that I thoroughly recommend reading. It’s worth mentioning that you wouldn’t want to use streams in every situation, but where applicable they can make a big difference to the performance of your web application.

Web streams allow you to stream data to your users, letting the browser process data piece by piece as it is downloaded. Without streaming, we need to wait for the entire download to complete before we can act on the response, but by streaming the data instead, we can process it piece by piece as it arrives, allowing us to render something onto the screen even sooner. Faster web pages = happier users!

If you’d like to see a working example of this code, please head over to GitHub to find out more.