Sizing Hadoop Worker Nodes

Sizing Hadoop Worker Nodes

This is a brief note on doing some rough estimates for sizing Hadoop worker nodes.

This post is in no way exhaustive and there is much more to sizing a Hadoop cluster, but in my line of work I often have to do such calculations so I wanted to put my notes in order and have a quick and easy access to them. I am also planning to do a follow up post for sizing the Hadoop management nodes.

Disk sizing
There is a difference between advertised and actual disk capacity. Manufacturers use base 10 numbers when stating the size of the HDD so in their book a Megabyte is actually 1'000'000 Bytes instead of 1'048'576 Bytes (in base 2).

To compute the actual capacity provided by a disk we can use the following formula:

\text{Actual capacity} = (\text{Advertised capacity in GB} \times 10^9)/1024^3

To do the calculation using TB the formula can be modified like this:

\text{Actual capacity} = (\text{Advertised capacity in TB} \times 10^{12}) / 1024^4

For example, if our worker nodes come with 12x4 TB disks for storing HDFS data, the actual size provided by each node will be:
12 \times (4 * 10 ^ {12}) / 1024^4 = 43.66 \text{TB}

We also have to reserve about 25% of the total space for temporary operations (e.g. sort), which leaves us with 43.66 TB * 0.75 = 32.75 TB of raw disk capacity per worker node.

To compute the usable capacity provided by the nodes we have to divide the raw capacity by 3 as HDFS performs 3-way mirroring for each data block by default. In our case the 32.75 TB of raw disk capacity translates to 32.75 / 3 = 10.92 TB of space for storing data in HDFS.

When deciding the number of disks per worker node, it is also a good idea to include a couple of disks dedicated to the operating system and worker node Hadoop binaries. A 2x1.2 TB in RAID 1 is typically a good choice.

The data disks should be configured as JBOD. Using RAID configuration for the data disks is to be avoided as it limits the total I/O performance of the node to its slowest disk due to disk aging and variance in rotation speed across devices. Striping is also problematic in case of an HDD failure as recovery of a single disk in a JBOD configuration is much easier than recovering an entire RAID-0 array.

CPU sizing
When selecting the number of CPUs for the worker nodes a rule of the thumb is to size for at least 1 CPU core for each HDD. In the example above where we use 12 HDDs for HDFS data we can put 2x E5-2630 V3 CPUs (8 cores each), which gives us 16 cores in total – 12 cores dedicated to the data disks and 4 cores for additional tasks (e.g. OS activities).

Memory sizing
The recommendation for the memory of the worker nodes is to reserve 4 to 8 GB of memory per processing core.
If we average the above values to 6 GB and do the computation for 16 cores, we will have to provide 16 x 6 = 96 GB of RAM.

We also have to factor in some memory for the operating system, for the DataNode deamons, and TaskTracker operations. The recommendation here is to go again for 4 to 8 GB, so if we go on the conservative side we can add another 3 x 8 = 24 GB.
This gives us a total of 96 + 24 = 120 GB, which quite likely will be 128 GB (8 x 16 GB) in the actual physical servers.

There is not much to consider when it comes to networking. Most reference architectures recommend a couple of 10GbE "top of rack" switches (for redundancy) and a dedicated 1GbE switch for management. The two 10GbE switches run the data network and also provide uplink connectivity to the core switches (in a multi-rack environment).