Understanding Worker Threads in Node.js - NodeSource

The NodeSource Blog

To understand Workers, it’s first necessary to understand how Node.js is structured.

When a Node.js process is launched, it runs:

  • One process
  • One thread
  • One event loop
  • One JS Engine Instance
  • One Node.js Instance

One process: Node.js represents the running process with process, a global object that can be accessed anywhere and has information about what’s being executed at a given time.

One thread: being single-threaded means that only one set of instructions is executed at a time in a given process.

One event loop: this is one of the most important aspects to understand about Node. It’s what allows Node to be asynchronous and to have non-blocking I/O, despite the fact that JavaScript is single-threaded, by offloading operations to the system kernel whenever possible and picking up the results later through callbacks, promises and async/await.
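A minimal sketch of what this means in practice (an illustration, not from the original talk; the helper name eventLoopOrder is made up): the synchronous code of a script always runs to completion first, then promise continuations run, and only after that do timer callbacks queued with setTimeout get their turn on the event loop.

```javascript
// Records the order in which synchronous code, a promise continuation
// and a timer callback actually run on the single main thread.
function eventLoopOrder() {
  const order = [];
  setTimeout(() => order.push('timer callback'), 0);
  Promise.resolve().then(() => order.push('promise continuation'));
  order.push('synchronous code');
  return order; // at this point only the synchronous entry is present
}

const order = eventLoopOrder();
// Inspect the array once the 0 ms timer has had a chance to fire.
setTimeout(() => console.log(order), 10);
```

When the array is printed, it lists the synchronous entry first, then the promise continuation, then the timer callback.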

One JS Engine Instance: this is a computer program that executes JavaScript code (in Node.js, the V8 engine).

One Node.js Instance: the computer program that executes Node.js code.

In other words, Node runs on a single thread, and only one thing is processed at a time in the event loop: one piece of code, one execution (the code is not executed in parallel). This is very useful because it simplifies how you use JavaScript: you don’t have to worry about concurrency issues.

The reason it was built with that approach is that JavaScript was initially created for client-side interactions (like web page interactions or form validation), nothing that required the complexity of multithreading.

But, as with all things, there is a downside: if you have CPU-intensive code, like complex calculations on a large in-memory dataset, it can block other code from being executed. Similarly, if you make a request to a server that runs CPU-intensive code, that code can block the event loop and prevent other requests from being handled.
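To make the downside concrete, here is an illustrative sketch (the helper name timerDelayWhileBlocked is hypothetical). A busy-wait loop stands in for CPU-intensive work; a timer scheduled for 0 ms cannot fire until the loop has finished, because the loop and the timer callback share the one main thread.

```javascript
// Schedules a 0 ms timer, then keeps the main thread busy for `blockMs`
// milliseconds. Resolves with how late the timer actually fired.
function timerDelayWhileBlocked(blockMs) {
  return new Promise((resolve) => {
    const start = Date.now();
    setTimeout(() => resolve(Date.now() - start), 0);
    // Stand-in for CPU-intensive work (e.g. a large in-memory calculation).
    while (Date.now() - start < blockMs) {
      // busy-wait: nothing else on this thread can run meanwhile
    }
  });
}

timerDelayWhileBlocked(200).then((delay) => {
  console.log(`0 ms timer fired after ${delay} ms`);
});
```

The reported delay is never shorter than the blocking time: every other callback, including incoming requests on a server, waits in the same way.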

A function is considered “blocking” if the main event loop must wait until it has finished before executing the next command. A “non-blocking” function allows the main event loop to continue as soon as it begins, and typically notifies the main loop once it has finished by calling a “callback”.
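A sketch contrasting the two, assuming a scratch file in the operating system’s temp directory (the file name and the compareReads helper are made up for this example): fs.readFileSync blocks until it returns the contents, while fs.readFile returns immediately and delivers the contents to a callback later, after the code that follows the call has already run.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Scratch file so the sketch does not depend on any existing file.
const file = path.join(os.tmpdir(), `blocking-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'hello');
process.on('exit', () => {
  try { fs.unlinkSync(file); } catch (err) { /* already removed */ }
});

function compareReads() {
  const log = [];
  // Blocking: the next line does not run until the whole file has been read.
  const text = fs.readFileSync(file, 'utf8');
  log.push(`readFileSync returned "${text}"`);

  return new Promise((resolve, reject) => {
    // Non-blocking: the call returns at once; the callback runs later.
    fs.readFile(file, 'utf8', (err, data) => {
      if (err) return reject(err);
      log.push(`readFile callback got "${data}"`);
      resolve(log);
    });
    log.push('code after readFile() ran first');
  });
}

compareReads().then((log) => console.log(log.join('\n')));
```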

The golden rule: don’t block the event loop. Keep it running, and avoid anything that could block the thread, like synchronous network calls or infinite loops.

It’s important to differentiate between CPU operations and I/O (input/output) operations. As mentioned earlier, JavaScript code in Node.js is NOT executed in parallel. Only I/O operations run in parallel, because they are executed asynchronously.

So Worker Threads will not help much with I/O-intensive work, because asynchronous I/O operations are more efficient than Workers can be. The main goal of Workers is to improve performance on CPU-intensive operations, not I/O operations.
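The following sketch illustrates why waiting on I/O does not need extra threads. It uses timers as a stand-in for real I/O requests (an assumption made to keep the example self-contained), and the helper names are hypothetical. Three 100 ms waits started together finish in roughly 100 ms, while awaiting them one after another takes roughly 300 ms; the single thread is idle while it waits either way.

```javascript
// Stand-in for an I/O request: waiting costs no CPU on the main thread.
const fakeIo = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Awaits `n` waits one after another.
async function timeSequential(n, ms) {
  const start = Date.now();
  for (let i = 0; i < n; i++) await fakeIo(ms);
  return Date.now() - start;
}

// Starts `n` waits at once and awaits them together.
async function timeConcurrent(n, ms) {
  const start = Date.now();
  await Promise.all(Array.from({ length: n }, () => fakeIo(ms)));
  return Date.now() - start;
}

(async () => {
  console.log('sequential:', await timeSequential(3, 100), 'ms'); // roughly 3 x 100
  console.log('concurrent:', await timeConcurrent(3, 100), 'ms'); // roughly 100
})();
```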

Some solutions

Furthermore, there are already solutions for CPU-intensive operations: multiple processes (like the cluster API) that make sure the CPU is optimally used.

This approach is advantageous because it isolates processes, so if something goes wrong in one process, it doesn’t affect the others. Multiple processes are also stable and expose identical APIs. However, this means sacrificing shared memory, and data must be communicated via JSON.
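As a rough sketch of the multiple-process approach: the cluster module spawns its worker processes with child_process.fork(), so the example below uses fork() directly. The child program is written to a temporary file only to keep the example self-contained, and sumInProcesses is a hypothetical helper. Each message is serialized (JSON is the default serialization for fork()) and copied into the other process; no memory is shared.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Child program: sums the half-open range [from, to) it is sent.
const childSource = `
process.on('message', ({ from, to }) => {
  let sum = 0;
  for (let i = from; i < to; i++) sum += i;
  process.send({ sum });
});
`;
const childPath = path.join(os.tmpdir(), `sum-child-${process.pid}.js`);
fs.writeFileSync(childPath, childSource);
process.on('exit', () => {
  try { fs.unlinkSync(childPath); } catch (err) { /* already removed */ }
});

// Hypothetical helper: splits [0, n) across `parts` separate processes.
function sumInProcesses(n, parts) {
  const chunk = Math.ceil(n / parts);
  const jobs = [];
  for (let p = 0; p < parts; p++) {
    const from = Math.min(n, p * chunk);
    const to = Math.min(n, from + chunk);
    jobs.push(new Promise((resolve, reject) => {
      const child = fork(childPath);
      child.once('error', reject);
      child.once('exit', (code) => {
        if (code !== 0) reject(new Error(`child exited with code ${code}`));
      });
      child.once('message', ({ sum }) => {
        child.disconnect(); // closing the IPC channel lets the child exit
        resolve(sum);
      });
      // The range is serialized and copied into the child process.
      child.send({ from, to });
    }));
  }
  return Promise.all(jobs).then((sums) => sums.reduce((a, b) => a + b, 0));
}

sumInProcesses(1000000, 4).then((total) => console.log(total));
```

Each child is a full, isolated Node.js process, which is what makes this approach robust, but also why starting one and exchanging data with it is comparatively expensive.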

JavaScript and Node.js will never have threads. Here is why:

So, people might think that adding a new module to Node.js core would allow us to create and synchronize threads, thus solving the problem of CPU-intensive operations.

Well, no, not really. If threads are added, the nature of the language itself will change. It’s not possible to add threads as a new set of available classes or functions. In languages that support multithreading (like Java), keywords such as “synchronized” help to enable multiple threads to sync.

Also, some numeric types are not atomic. If access to them isn’t synchronized, two threads could change the value of a variable at the same time, and after both threads have accessed it, the variable could have a few bytes changed by one thread and a few bytes changed by the other, resulting in no valid value at all. For example, the simple operation 0.1 + 0.2 produces a result with 17 decimals in JavaScript (the maximum number of decimals):

var x = 0.1 + 0.2; // x will be 0.30000000000000004

But floating-point arithmetic is not always 100% accurate, so if access is not synchronized, one decimal may be changed by a Worker, resulting in non-identical numbers.
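The danger of unsynchronized access can be illustrated with Worker Threads, since they can share memory through a SharedArrayBuffer. In the sketch below (an illustration; runCounter is a hypothetical helper), several Workers increment one shared 32-bit integer. A plain counter[0]++ is a separate read, add and write, so increments from different threads can overwrite each other and updates may be lost, by an amount that varies from run to run. Atomics.add performs the read-modify-write as one indivisible step. Note that this shows lost integer updates rather than the partially written floating-point value described above.

```javascript
const { Worker } = require('worker_threads');

// Each Worker adds 1 to the shared counter `iterations` times, either with a
// plain read-modify-write (counter[0]++) or with Atomics.add.
const incrementerSource = `
  const { workerData } = require('worker_threads');
  const { shared, iterations, useAtomics } = workerData;
  const counter = new Int32Array(shared);
  for (let i = 0; i < iterations; i++) {
    if (useAtomics) Atomics.add(counter, 0, 1); // one indivisible step
    else counter[0]++; // read, add, write: other threads can interleave
  }
`;

// Hypothetical helper: starts `workers` Workers on one shared Int32Array and
// resolves with the final counter value once all of them have exited.
function runCounter(workers, iterations, useAtomics) {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);
  const finished = [];
  for (let w = 0; w < workers; w++) {
    finished.push(new Promise((resolve, reject) => {
      // workerData is cloned, but a SharedArrayBuffer is shared, not copied.
      const worker = new Worker(incrementerSource, {
        eval: true,
        workerData: { shared, iterations, useAtomics },
      });
      worker.once('error', reject);
      worker.once('exit', resolve);
    }));
  }
  return Promise.all(finished).then(() => Atomics.load(counter, 0));
}

(async () => {
  console.log('expected:   ', 4 * 1000000);
  console.log('plain ++:   ', await runCounter(4, 1000000, false)); // may be lower
  console.log('Atomics.add:', await runCounter(4, 1000000, true));  // always equal
})();
```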

The best solution:

The best solution for CPU performance is Worker Threads. Browsers have had the concept of Workers for a long time.

Instead of having:

  • One process
  • One thread
  • One event loop
  • One JS Engine Instance
  • One Node.js Instance

Worker threads have:

  • One process
  • Multiple threads
  • One event loop per thread
  • One JS Engine Instance per thread
  • One Node.js Instance per thread

As we can see in the following image:

[Figure: Worker threads diagram]

The worker_threads module enables the use of threads that execute JavaScript in parallel. To access it:

const workerThreads = require('worker_threads');

Worker Threads have been available since Node.js 10, but are still in the experimental phase.
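A minimal sketch of the difference this makes, assuming a busy-wait loop as the CPU-intensive work (timerDelayWhileWorkerBusy is a hypothetical helper name): when the loop runs inside a Worker, a 0 ms timer on the main thread still fires promptly, because the Worker has its own thread and its own event loop.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical helper: keeps a Worker busy for `blockMs` milliseconds and
// measures how late a 0 ms timer on the main thread fires meanwhile.
function timerDelayWhileWorkerBusy(blockMs) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(`
      const { workerData } = require('worker_threads');
      const start = Date.now();
      // CPU-bound busy-wait, running on the Worker's own thread.
      while (Date.now() - start < workerData) {}
    `, { eval: true, workerData: blockMs });

    let timerDelay;
    const scheduled = Date.now();
    setTimeout(() => { timerDelay = Date.now() - scheduled; }, 0);

    worker.once('error', reject);
    worker.once('exit', () => resolve(timerDelay));
  });
}

timerDelayWhileWorkerBusy(500).then((delay) => {
  console.log(`0 ms timer fired after ${delay} ms while the Worker was busy`);
});
```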

Ideally, we want multiple Node.js instances inside the same process. With Worker threads, a thread can end at some point without that necessarily being the end of the parent process. Resources allocated by a Worker should not hang around when the Worker is gone; that’s a memory leak, and we don’t want that. We want to embed Node.js into itself: give Node.js the ability to create a new thread and then create a new Node.js instance inside that thread, essentially running independent threads inside the same process.
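One way to observe this independence, in a sketch with a hypothetical helper runIsolatedWorker: a global set on the main thread is not visible inside a Worker, which runs its own engine instance with its own global object, and calling process.exit() inside a Worker stops only that thread (the value passed to it becomes the Worker’s exit code) while the main thread keeps running.

```javascript
const { Worker } = require('worker_threads');

// A value set on the main thread's global object.
globalThis.setOnMainThread = true;

// Hypothetical helper: starts a Worker that checks whether it can see the
// main thread's global, then calls process.exit() with a code reporting it.
function runIsolatedWorker() {
  return new Promise((resolve, reject) => {
    const worker = new Worker(`
      // Each Worker has its own JS engine instance and global object.
      const sawMainGlobal = typeof globalThis.setOnMainThread !== 'undefined';
      // Inside a Worker, process.exit() stops only this thread.
      process.exit(sawMainGlobal ? 1 : 0);
    `, { eval: true });
    worker.once('error', reject);
    worker.once('exit', resolve); // resolves with the Worker's exit code
  });
}

runIsolatedWorker().then((code) => {
  console.log(`Worker exited with code ${code}; the main thread keeps running`);
});
```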

What makes Worker Threads special:

  • ArrayBuffers, to transfer memory from one thread to another
  • SharedArrayBuffer, which is accessible from either thread. It lets you share memory between threads (limited to binary data).
  • Atomics, which let you perform some operations on shared memory concurrently and more efficiently, and allow you to implement condition variables in JavaScript
  • MessagePort, used for communicating between different threads. It can be used to transfer structured data, memory regions and other MessagePorts between different Workers.
  • MessageChannel, which represents an asynchronous, two-way communication channel used for communicating between different threads.
  • workerData, used to pass startup data: an arbitrary JavaScript value that contains a clone of the data passed to this thread’s Worker constructor. The data is cloned as if using postMessage().
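A short sketch combining MessageChannel, MessagePort and ArrayBuffer transfer (sumBytesInWorker and the byte-summing task are hypothetical): the main thread hands one end of a new channel to the Worker, then transfers an ArrayBuffer through it. Because the buffer is transferred rather than copied, the sender’s ArrayBuffer is detached and its byteLength drops to 0.

```javascript
const { Worker, MessageChannel } = require('worker_threads');

// Worker: waits for a MessagePort on parentPort, then sums the bytes of the
// ArrayBuffer it receives on that port and replies on the same port.
const byteSummerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.once('message', ({ port }) => {
    port.once('message', (buffer) => {
      let sum = 0;
      for (const b of new Uint8Array(buffer)) sum += b;
      port.postMessage(sum);
    });
  });
`;

// Hypothetical helper: resolves with the Worker's sum and the sender's view
// of the buffer's length after the buffer has been transferred.
function sumBytesInWorker(values) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(byteSummerSource, { eval: true });
    worker.once('error', reject);
    const { port1, port2 } = new MessageChannel();
    // MessagePorts must be listed in the transfer list to be sent.
    worker.postMessage({ port: port2 }, [port2]);

    const buffer = Uint8Array.from(values).buffer;
    // Transferring moves the memory instead of copying it.
    port1.postMessage(buffer, [buffer]);
    const lengthAfterTransfer = buffer.byteLength; // detached: 0

    port1.once('message', (sum) => {
      port1.close();
      worker.terminate().then(() => resolve({ sum, lengthAfterTransfer }));
    });
  });
}

sumBytesInWorker([1, 2, 3, 4]).then((result) => console.log(result));
```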

API

  • const { Worker, parentPort } = require('worker_threads') => The Worker class represents an independent JavaScript execution thread, and parentPort is an instance of MessagePort.
  • new Worker(filename) or new Worker(code, { eval: true }) => the two main ways of starting a Worker (passing the filename or the code that you want to execute). It’s advisable to use the filename in production.
  • worker.on('message'), worker.postMessage(data) => for listening to messages and sending them between the different threads.
  • parentPort.on('message'), parentPort.postMessage(data) => Messages sent using parentPort.postMessage() will be available in the parent thread using worker.on('message'), and messages sent from the parent thread using worker.postMessage() will be available in this thread using parentPort.on('message').

EXAMPLE:

    const { Worker } = require('worker_threads');

    // The Worker's code is passed as a string (eval: true).
    const worker = new Worker(`
    const { parentPort } = require('worker_threads');
    parentPort.once('message',
        message => parentPort.postMessage({ pong: message }));
    `, { eval: true });
    worker.on('message', message => console.log(message));
    worker.postMessage('ping');

    $ node --experimental-worker test.js
    { pong: 'ping' }

Example by Anna Henningsen

What this essentially does is create a new thread using a new Worker. The code inside the Worker listens for a message on parentPort and, once it receives the message, posts it back to the main thread wrapped as { pong: message }.

You have to use the --experimental-worker flag (in Node.js 10) because Workers are still experimental.

Another example:

    const {
      Worker, isMainThread, parentPort, workerData
    } = require('worker_threads');

    if (isMainThread) {
      // Main thread: export a function that parses a script in a new Worker.
      module.exports = function parseJSAsync(script) {
        return new Promise((resolve, reject) => {
          // The Worker runs this same file and takes the else branch below.
          const worker = new Worker(__filename, {
            workerData: script
          });
          worker.on('message', resolve);
          worker.on('error', reject);
          worker.on('exit', (code) => {
            if (code !== 0)
              reject(new Error(`Worker stopped with exit code ${code}`));
          });
        });
      };
    } else {
      // Worker thread: 'some-js-parsing-library' is a placeholder for a real parser.
      const { parse } = require('some-js-parsing-library');
      const script = workerData;
      parentPort.postMessage(parse(script));
    }

It requires:

  • Worker: the class that represents an independent JavaScript execution thread.
  • isMainThread: a boolean that is true if the code is not running inside of a Worker thread.
  • parentPort: the MessagePort allowing communication with the parent thread, if this thread was spawned as a Worker.
  • workerData: an arbitrary JavaScript value that contains a clone of the data passed to this thread’s Worker constructor.

In actual practice for these kinds of tasks, use a pool of Workers instead. Otherwise, the overhead of creating Workers would likely exceed their benefit.
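A minimal pool sketch, assuming a trivial CPU-bound task (summing the integers below n) and a fixed pool size; WorkerPool is a hypothetical class, not a Node.js API. Workers are created once and reused, and tasks wait in a first-in, first-out queue until a Worker is free.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical task for the sketch: sum the integers in [0, n).
const taskSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (n) => {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += i;
    parentPort.postMessage(sum);
  });
`;

// Fixed-size pool: each Worker handles one task at a time; extra tasks queue.
class WorkerPool {
  constructor(size) {
    this.workers = Array.from({ length: size }, () => new Worker(taskSource, { eval: true }));
    this.idle = [...this.workers];
    this.queue = []; // pending { task, resolve, reject }
  }

  run(task) {
    return new Promise((resolve, reject) => {
      this.queue.push({ task, resolve, reject });
      this.dispatch();
    });
  }

  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { task, resolve, reject } = this.queue.shift();
      const detach = () => {
        worker.off('message', onMessage);
        worker.off('error', onError);
      };
      const onMessage = (result) => {
        detach();
        this.idle.push(worker); // hand the Worker back for the next task
        resolve(result);
        this.dispatch();
      };
      const onError = (err) => {
        detach(); // a crashed Worker is not reused; a real pool would replace it
        reject(err);
      };
      worker.on('message', onMessage);
      worker.on('error', onError);
      worker.postMessage(task);
    }
  }

  close() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}

const pool = new WorkerPool(2);
Promise.all([1e6, 2e6, 3e6, 4e6].map((n) => pool.run(n)))
  .then((sums) => console.log(sums))
  .finally(() => pool.close());
```

Reusing the same Workers means the startup cost is paid once per Worker rather than once per task. A production pool would also replace crashed Workers, support timeouts, and size itself to the available CPUs.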

What is expected for Workers (hopefully):

  • Passing native handles around (e.g. sockets, http request)
  • Deadlock detection. Deadlock is a situation where a set of processes are blocked because each process is holding a resource and waiting for another resource acquired by some other process. Deadlock detection would be useful for Worker threads in this case.
  • More isolation, so if one process is affected, it won’t affect others.

What NOT to expect for Workers:

  • Don’t think Workers make everything magically faster; in some cases it’s better to use a Worker pool
  • Don’t use Workers for parallelizing I/O operations.
  • Don’t think spawning Workers is cheap

Final notes:

The contributors to Workers in Node.js are looking for feedback. If you have used Workers before and want to contribute, you can leave your feedback here

Workers have Chrome DevTools support for inspecting Workers in Node.js.

And worker_threads is a promising experimental module if you need to do CPU-intensive tasks in your Node.js application. Keep in mind that it’s still experimental, so it is advisable to wait before using it in production. For now, you can use Worker pools instead.

References:

Special thanks to Anna Henningsen and her amazing talk, Node.js: The Road to Workers

Node.js API

Node.js multithreading: What are Worker Threads and why do they matter? - by Alberto Gimeno

Introduction to Javascript Processes - by Nico Valencia

The Node.js Event Loop
