2GB/core is the HPC Gold Standard … But I Know I Need 48GB/node.

I got some e-mail after the previous blog (http://dell.to/144sqai) on 2GB/core recommendations for HPC compute nodes. It turns out that some of you know the memory capacity requirement of your workloads, and it is currently 48GB per (2-socket) compute node. Kudos for determining the minimum amount of memory required!

But configuring to the minimum required memory assumes that less memory is “better”: it costs less money and has fewer potential negatives. More on that later.

Continuing, the logic goes that 48GB/node is 24GB/socket on a 2-socket node. And since there are four (4) memory channels per socket on an Intel SandyBridge-EP processor (E5-2600) and one would like to maximize the memory bandwidth, one needs 4 x 6 GB DIMMs to achieve the required 24GB per socket. But, alas, there is no such thing as a 6 GB DIMM.
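The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the socket and channel counts come from the paragraph above, while the list of DIMM sizes is an assumption limited to the sizes named in this post, and the function names are made up for the example.

```python
# Platform figures from the post: 2 sockets, 4 memory channels per socket.
SOCKETS_PER_NODE = 2
CHANNELS_PER_SOCKET = 4

# Assumption: only the DIMM sizes named in this post are considered.
DIMM_SIZES_GB = (2, 4, 8)


def gb_per_channel(node_gb: int) -> float:
    """Capacity each memory channel must supply to reach node_gb."""
    return node_gb / (SOCKETS_PER_NODE * CHANNELS_PER_SOCKET)


def one_dimm_per_channel_possible(node_gb: int) -> bool:
    """True if a single DIMM of a listed size yields exactly node_gb."""
    return gb_per_channel(node_gb) in DIMM_SIZES_GB


if __name__ == "__main__":
    # 48GB/node over 8 channels needs 6GB per channel, which no single DIMM provides.
    print(gb_per_channel(48), one_dimm_per_channel_possible(48))
```

Asking the same question for 64GB/node gives 8GB per channel, which a single DIMM can supply.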

Hence, a 4 GB DIMM and a 2 GB DIMM are used on each memory channel. Several of you shared this configuration data with me. This does many things correctly:

  1. Complies with my previous Rule #1: Always populate all memory channels with the same number of DIMMs. (That is, on all processors use the same DIMMs Per Channel or DPC). Check.
  2. Complies with my previous Rule #2: Always use identical DIMMs across a memory bank. Check.
  3. Does not use 3 DPC, which would negatively affect memory performance. Check.
  4. Meets the known memory capacity requirements. 4 GB plus 2 GB is 6 GB. 6GB per memory channel is 24GB/socket and the required 48GB/node. Check.

Therefore, the memory configuration is balanced and a good one, technically speaking.
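The four checks above can be written down as a small validation routine. This is a minimal sketch, not a vendor tool: the data model (a node as a list of channels, each channel a list of DIMM sizes in slot order) and the reading of a "bank" as the same slot position across all channels are assumptions made for the example.

```python
from typing import Dict, List, Union


def check_layout(channels: List[List[int]], required_gb: int) -> Dict[str, Union[bool, int]]:
    """Check a node's DIMM layout against the four points listed above.

    channels: one list per memory channel, holding DIMM sizes in GB by slot position.
    """
    dimms_per_channel = {len(ch) for ch in channels}
    same_dpc = len(dimms_per_channel) == 1

    # Assumed meaning of "bank": the DIMMs at the same slot position on every channel.
    identical_banks = same_dpc and all(len(set(bank)) == 1 for bank in zip(*channels))

    no_3dpc = max(len(ch) for ch in channels) < 3
    total_gb = sum(sum(ch) for ch in channels)

    return {
        "same_dpc": same_dpc,                 # Rule #1
        "identical_banks": identical_banks,   # Rule #2
        "no_3dpc": no_3dpc,                   # avoids 3 DPC
        "capacity_ok": total_gb >= required_gb,
        "total_gb": total_gb,
    }


if __name__ == "__main__":
    # 2 sockets x 4 channels, each channel holding a 4GB and a 2GB DIMM.
    mixed = [[4, 2] for _ in range(8)]
    print(check_layout(mixed, required_gb=48))
```

Note that this only encodes the four checks listed above; it says nothing about Rule #3 (1 DPC where possible) or about price.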

However, let’s dig deeper and take into account a few other things. One is my previous Rule #3: Always use 1 DPC (if possible to meet the required memory capacity). The others are to consider today’s price and tomorrow’s requirements.

As stated in the previous blog, I like to create the “best” memory configuration for a given compute node and then see if the memory/core capacity is sufficient. In other words, in high performance computing take memory performance into account (first) in addition to the age-old capacity requirements. And as usual, price comes into play. In this 48GB/node case, the price is indeed a driving factor.

To be consistent with the previous blog, we’ll use the same memory sizes and prices, based upon the Dell R620, a general purpose, workhorse, 1U, rack-mounted, 2-socket, Intel SandyBridge-EP (E5-2600) compute node platform. Below is that same snapshot of the memory options and their prices taken on 12-July-201

Here’s the layout of a 48GB/node configuration using 4 GB DIMMs and 2GB DIMMs. Also, in the figure is the total memory price for that configuration.

Here’s an alternate layout using 8 GB DIMMs. Also, in the figure is the total memory price for this configuration.

Here are the key features of the second configuration:

  • More than the 48GB capacity required
  • Less $$$ (per node; consider this ~$300 savings times the total number of nodes)
  • Fewer parts to potentially fail (in fact, half the parts to fail)
  • Fewer types of spare DIMM parts to stock
  • Easier correct replacement of failed DIMMs
  • More available memory slots for future expansion
  • “Future proof”

“Future proof? What does he mean by that?” Did you notice the memory per core in the figures above? The 48GB/node configuration using 4 GB DIMMs and 2GB DIMMs is 3GB/core for today’s mainstream 8-core processor. The 48GB/node specification may in fact be tied to the GB/core and the core count per processor. Today’s node may need 48GB, but a node with more cores may need more memory.
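To make the GB/core point concrete, here is a short sketch of the conversion in both directions. It assumes the workload's need scales linearly with core count, which is the possibility raised above and not a measured fact; the function names and the 10-core figure in the usage lines are purely illustrative.

```python
def gb_per_core(node_gb: float, sockets: int, cores_per_socket: int) -> float:
    """Memory per core for a node with the given capacity and core count."""
    return node_gb / (sockets * cores_per_socket)


def node_gb_needed(target_gb_per_core: float, sockets: int, cores_per_socket: int) -> float:
    """Node capacity needed to hold a per-core target (assumes linear scaling)."""
    return target_gb_per_core * sockets * cores_per_socket


if __name__ == "__main__":
    today = gb_per_core(48, sockets=2, cores_per_socket=8)
    print(today)  # 3.0 GB/core on a 2-socket, 8-core-per-socket node
    # Hypothetical: the same per-core need on a node with 10 cores per socket.
    print(node_gb_needed(today, sockets=2, cores_per_socket=10))
```

Under that linear-scaling assumption, a 2-socket node with 10 cores per socket would need 60GB to keep 3GB/core, which is more than 48GB.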

We know from several public places (e.g., http://www.sqlskills.com/blogs/glenn/intel-xeon-e5-2600-v2-series-processors-ivy-bridge-ep-in-q3-2013/ ) that the follow-on to the Intel SandyBridge-EP processor (E5-2600), codenamed Ivy Bridge-EP, will officially be called the Intel Xeon E5-2600 v2. The mainstream v2 processor will feature ten (10) cores, compared to today’s 8 cores. With this future processor, the alternate memory configuration above using 8 x 8GB provides a total of 64GB/node. This 64GB/node on a 2-socket node with 20 cores is 3.2GB/core, still exceeding the 3GB/core of the 48GB node today.
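The comparison in the paragraph above can be reproduced with a short sketch. The two capacities and the 8-core and 10-core counts come from this post; the layout labels, the function name, and the 3GB/core target used in the final check are assumptions for illustration only.

```python
# Node capacities of the two layouts discussed in this post (GB per 2-socket node).
LAYOUTS_GB = {
    "4GB + 2GB per channel": 48,
    "8 x 8GB": 64,
}

SOCKETS = 2


def gb_per_core_by_layout(cores_per_socket: int) -> dict:
    """GB/core for each layout at a given core count per socket."""
    total_cores = SOCKETS * cores_per_socket
    return {name: gb / total_cores for name, gb in LAYOUTS_GB.items()}


if __name__ == "__main__":
    for cores in (8, 10):
        print(cores, gb_per_core_by_layout(cores))

    # Assumed target: keep today's 3GB/core on the 10-core part.
    target = 3.0
    for name, value in gb_per_core_by_layout(10).items():
        print(name, "meets target" if value >= target else "falls short")
```

With 10 cores per socket, the 48GB layout drops to 2.4GB/core, while the 8 x 8GB layout stays at 3.2GB/core.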

If you have comments or can contribute additional information, please feel free to do so. Thanks.

–Mark R. Fernandez, Ph.D.


(original posting: http://dell.to/16EjfPl )

