Sysleaf

What is NodeJS Cluster? A way to scale Node.js applications


Threads and processes are generally the two ways to scale the computation and load of an application. A thread is lighter, faster to create, and more memory-efficient than a process. A thread-based application creates multiple threads to handle increased load, while a process-based application creates multiple processes. Scaling with either method comes with its own set of challenges that we should be aware of, e.g.:

  1. Sharing resources: sharing resources among threads is not really a problem because they run in the shared memory space of their process. Sharing resources among processes is much harder because each process runs in its own separate (non-shared) memory address space. Processes generally use some kind of IPC (inter-process communication) API, which is often platform dependent. Some of the IPC options are sockets, shared memory, pipes, etc.
  2. Concurrency: simply means happening at the same time. When two or more processes/threads try to access a shared resource at the same time, the accesses are concurrent. Truly simultaneous execution needs more than one CPU core; on a single core the operating system interleaves processes or threads by switching between them. Concurrency brings challenges such as race conditions, inconsistent state, and deadlock.
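To make the race condition mentioned above concrete, here is a minimal sketch. It does not use real threads or processes: it simulates two tasks that each increment a shared counter in two steps (read, then write) and replays them in a chosen order. The helper names makeIncrementTask and runSchedule are made up for this illustration.

```javascript
// Each "task" increments a shared counter in two separate steps,
// the way a thread would: read the value, then write value + 1.
function makeIncrementTask(state) {
  let snapshot;
  return {
    read: () => { snapshot = state.counter; },
    write: () => { state.counter = snapshot + 1; },
  };
}

// Runs two tasks against one shared counter using the given step order.
function runSchedule(order) {
  const state = { counter: 0 };
  const tasks = { A: makeIncrementTask(state), B: makeIncrementTask(state) };
  for (const [taskName, step] of order) {
    tasks[taskName][step]();
  }
  return state.counter;
}

// One task finishes before the other starts: both increments are kept.
const sequential = runSchedule([['A', 'read'], ['A', 'write'], ['B', 'read'], ['B', 'write']]);

// The steps interleave: both tasks read 0, so one increment is lost.
const interleaved = runSchedule([['A', 'read'], ['B', 'read'], ['A', 'write'], ['B', 'write']]);

console.log({ sequential, interleaved }); // { sequential: 2, interleaved: 1 }
```

The interleaved schedule loses an update because the second write is based on a stale read. This is the kind of inconsistent state that locks or message passing are meant to prevent.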

Cluster Module

Node.js runs JavaScript on a single thread, so the cluster module scales by process rather than by thread. The module that provides the cluster-related API is called cluster, and it is built into Node.js. A Node.js cluster uses a single master (parent) process and many worker (child) processes. See the diagram below to know more…

From the above picture it is clear what the master and the workers are, and what their responsibilities are. But how do we detect in code whether we are in the master or in a worker, so that we can put the corresponding logic in the right place? cluster.isMaster is a boolean flag that tells us whether the current process is the master. See the code below to know more…

const cluster = require('cluster');

if (cluster.isMaster) {
  // put your master process logic here
  // all code contained here becomes part of the master
  // e.g. fork the worker processes
} else {
  // put your worker process logic here
  // all code contained here becomes part of each worker
  // e.g. HTTP server (express, koa, etc.) logic
}
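Node.js 16 and later expose the same flag under the name cluster.isPrimary and mark cluster.isMaster as deprecated. A small sketch of a check that works with either name follows; the local variable isPrimary is just a name chosen for this example.

```javascript
const cluster = require('cluster');

// Prefer the newer flag when it exists, otherwise fall back to the older one.
const isPrimary = typeof cluster.isPrimary === 'boolean'
  ? cluster.isPrimary
  : cluster.isMaster;

if (isPrimary) {
  console.log(`master process, pid ${process.pid}`);
} else {
  console.log(`worker process, pid ${process.pid}`);
}
```

The rest of this article keeps using cluster.isMaster, which still works but may print a deprecation warning depending on how Node.js is started.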

Master Process in Cluster

Now let’s look at the responsibilities of the master process in detail. Below is a list of the important work that the master does or should do.

  1. Fork/Spawn Worker: The master is responsible for creating as many worker processes as you need, but you should not create more workers than the number of CPU cores available on that particular system.

    In other words, the maximum number of workers should equal the number of CPU cores, e.g. on a dual-core CPU you should fork at most 2 worker processes. Use the cluster.fork() API to create a new worker process.

    Whenever a new worker is ready, cluster emits an online event, so we can use cluster.on('online', () => {}) to know when the worker is ready for work.
  2. Re-fork/Re-spawn Worker: A worker process can be killed at any time for any reason, so you should put logic inside the master to re-spawn workers as needed. Whenever a worker dies, cluster emits an exit event, so we can use cluster.on('exit', () => {}) to spawn a new worker on the death of an old one.
  3. Load Balancing: This is done automatically by Node.js, so we don’t have to write any logic for it. The master process is responsible for distributing incoming connections/requests among its worker processes. Currently, Node.js uses two types of scheduling policy:

    1. Round-robin, defined by the constant cluster.SCHED_RR. It is the default on all operating systems except Windows.
    2. Leave it to the OS, defined by the constant cluster.SCHED_NONE. It is the default on Windows.

    Use cluster.schedulingPolicy to read the current value.
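As a rough sketch of the first and third points, the snippet below picks a worker count that never exceeds the CPU count and prints the active scheduling policy. The helpers pickWorkerCount and policyName are hypothetical names for this example, not part of the cluster API.

```javascript
const cluster = require('cluster');
const os = require('os');

// Hypothetical helper: clamp a requested worker count to [1, cpuCount].
// Falls back to cpuCount when no valid count is requested.
function pickWorkerCount(requested, cpuCount = os.cpus().length) {
  const wanted = Number.isInteger(requested) && requested > 0 ? requested : cpuCount;
  return Math.max(1, Math.min(wanted, cpuCount));
}

// Translate the numeric scheduling policy into a readable name.
function policyName(policy = cluster.schedulingPolicy) {
  if (policy === cluster.SCHED_RR) return 'round-robin (SCHED_RR)';
  if (policy === cluster.SCHED_NONE) return 'left to the OS (SCHED_NONE)';
  return 'unknown';
}

console.log(`workers to fork: ${pickWorkerCount(8)}`);
console.log(`scheduling policy: ${policyName()}`);
```

The policy can be changed by assigning cluster.schedulingPolicy before the first cluster.fork() call, or by setting the NODE_CLUSTER_SCHED_POLICY environment variable to rr or none.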

The code below shows how to implement these responsibilities of the master process.

const cluster = require('cluster');
const os = require('os');

const MAX_CLUSTER_PROCESS = os.cpus().length;

if(cluster.isMaster){
  console.log(`Master process with pid ${process.pid} starting...`);

  for(let i=0; i<MAX_CLUSTER_PROCESS; i++){
    cluster.fork();
  }

  // emitted once a forked worker is running
  cluster.on('online', (worker) => {
    console.log(`worker with id:${worker.id} & pid:${worker.process.pid} is online`);
  });

  // emitted when a worker dies; replace it so the pool stays at full size
  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker with id:${worker.id} & pid:${worker.process.pid} died with code:${code} and signal:${signal}`);
    console.log('forking new worker because one of my workers died');
    cluster.fork();
  });

}else{
  //put worker logic here
}
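One caveat with re-forking on every exit: a worker that crashes immediately at startup would be re-spawned in a tight loop. The sketch below shows one possible guard, assuming a simple limit of N re-forks per time window. The helpers createRespawnLimiter and shouldRespawn are hypothetical; worker.exitedAfterDisconnect is the cluster property that is true when the exit was requested through worker.disconnect() or worker.kill().

```javascript
// Hypothetical helper: allow at most `maxRestarts` re-forks per `windowMs`.
function createRespawnLimiter(maxRestarts, windowMs) {
  let recent = []; // timestamps of re-forks still inside the window
  return function allowRespawn(now = Date.now()) {
    recent = recent.filter((t) => now - t < windowMs);
    if (recent.length >= maxRestarts) return false;
    recent.push(now);
    return true;
  };
}

// Decide whether an exited worker should be replaced.
// Intentional exits (disconnect/kill from the master) are not replaced.
function shouldRespawn(worker, allowRespawn, now = Date.now()) {
  if (worker.exitedAfterDisconnect) return false;
  return allowRespawn(now);
}

// Simulated timestamps in ms: the third crash inside one second is refused.
const allow = createRespawnLimiter(2, 1000);
console.log([0, 100, 200, 1500].map((t) => allow(t))); // [ true, true, false, true ]

// Wiring inside the master (not executed here):
// const allowRespawn = createRespawnLimiter(5, 60000);
// cluster.on('exit', (worker) => {
//   if (shouldRespawn(worker, allowRespawn)) cluster.fork();
// });
```

The limits (5 re-forks per minute in the wiring comment) are arbitrary example values and should be tuned to the application.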

Worker Process in Cluster

As we said, the worker processes are responsible for sending responses to client requests, so a worker process is the actual server. You have probably used Node.js mostly to write HTTP servers using a framework like express.js or koa.js, or even the built-in http module.

A worker process is just like such a normal Node.js web server. Whatever code we would use to write a normal Node.js server goes inside the worker process. It is as simple as that. See the code below for a simple HTTP server using the http module…

const cluster = require('cluster');
const http = require('http');

if(cluster.isMaster){
  //put master logic here
}else{
  console.log(`Worker/Child process with pid ${process.pid} starting...`);
  const httpServer = http.createServer((req, res) => {
    res.writeHead(200);
    // res.end() writes the body and finishes the response, so call it only once
    res.end(`hello world from worker/child process with id:${cluster.worker.id} & pid:${process.pid}`);
  });

  httpServer.on('listening', () => {
    console.log('server is listening on port 3000');
  });

  // listen() may be called only once per server; all workers share port 3000
  httpServer.listen(3000);
}

Communication within Cluster

The Node.js cluster module provides an IPC API through which the master process can talk to the worker processes and vice versa. The master sends a message to a worker with worker.send(), and a worker sends a message to the master with process.send(). The master receives messages with cluster.on('message', (worker, msg) => {}), and a worker receives them with process.on('message', (msg) => {}).

const cluster = require('cluster');

if (cluster.isMaster) {
  // receive messages from any worker
  cluster.on('message', (worker, msg) => {
    console.log(`message ${msg} received from worker`);
  });
  const worker = cluster.fork();
  worker.send('hello worker');
} else if (cluster.isWorker) {
  // receive messages from the master and reply
  process.on('message', (msg) => {
    console.log(`message ${msg} received from master`);
    process.send('hello master');
  });
}
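Messages do not have to be plain strings; objects are serialized when sent over the IPC channel. The following sketch assumes a made-up message shape, { type: 'request-count', count }, that workers could use to report how many requests they have served. The helper names are hypothetical as well. The handling logic is kept in plain functions so it can be run without forking anything.

```javascript
// Hypothetical message shape: { type: 'request-count', count: <number> }.
function makeCountMessage(count) {
  return { type: 'request-count', count };
}

// Master-side handler: keeps the latest count per worker id and ignores
// messages it does not recognise.
function recordMessage(totals, workerId, msg) {
  if (msg && msg.type === 'request-count' && Number.isFinite(msg.count)) {
    totals[workerId] = msg.count;
  }
  return totals;
}

// Adds up the latest counts reported by all workers.
function sumTotals(totals) {
  return Object.values(totals).reduce((sum, n) => sum + n, 0);
}

// Wiring (not executed here):
//   worker: process.send(makeCountMessage(requestsServed));
//   master: cluster.on('message', (worker, msg) => recordMessage(totals, worker.id, msg));
const totals = {};
recordMessage(totals, 1, makeCountMessage(10));
recordMessage(totals, 2, makeCountMessage(5));
recordMessage(totals, 2, 'hello master'); // plain strings are ignored
console.log(sumTotals(totals)); // 15
```

Tagging each message with a type field lets one message listener serve several purposes without guessing what a given payload means.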

Some APIs can be used only in the master or only in a worker. The table below shows which API is available in which process. These are only the basic but important APIs.

| API available only in Master | API available only in Worker |
| --- | --- |
| cluster.workers: returns the active worker objects, keyed by worker id, e.g. cluster.workers[1] | cluster.worker: returns a reference to the current worker object |
| cluster.workers[i].send(): sends a message to the worker whose id is i. It is similar to ChildProcess.send() | cluster.worker.send(): sends a message to the master. It is similar to process.send() |
| cluster.workers[i].on('message', (msg) => {}): listens for messages from the worker whose id is i | cluster.worker.on('message', (msg) => {}): listens for messages from the master |
| cluster.on('message', (worker, msg) => {}): listens for messages from all workers | process.on('message', (msg) => {}): listens for messages from the master |
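Because cluster.workers is an object keyed by worker id, sending the same message to every worker is a short loop. A minimal sketch follows; broadcast is a hypothetical helper name, and the { type: 'ping' } message is only an example payload.

```javascript
const cluster = require('cluster');

// Sends `msg` to every worker in `workers` (an object keyed by worker id,
// like cluster.workers) and returns how many workers were messaged.
function broadcast(workers, msg) {
  let sent = 0;
  for (const worker of Object.values(workers || {})) {
    worker.send(msg);
    sent += 1;
  }
  return sent;
}

// In the master, cluster.workers stays empty until cluster.fork() is called,
// so this reports 0 when no workers have been forked yet.
if (!cluster.isWorker) {
  console.log(`messaged ${broadcast(cluster.workers, { type: 'ping' })} worker(s)`);
}
```

Inside a worker, cluster.workers is not available, which is why the helper accepts the collection as an argument and tolerates a missing value.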

Reference

  1. See the full code and instructions for a quick demo on your local machine
  2. See the cluster-related API for reference