I got some e-mail after the previous blog (http://dell.to/144sqai) on 2GB/core recommendations for HPC compute nodes. It turns out that some of you know the memory capacity requirement of your workloads: currently 48GB per (2-socket) compute node. Kudos for determining the minimum amount of memory required!
But configuring to the minimum required memory assumes that less memory is “better”: it costs less money and has fewer potential downsides. More on that later.
Continuing, the logic goes that 48GB/node is 24GB/socket on a 2-socket node. And since there are four (4) memory channels per socket on an Intel SandyBridge-EP processor (E5-2600) and one would like to maximize the memory bandwidth, one needs 4 x 6 GB DIMMs to achieve the required 24GB per socket. But, alas, there is no such thing as a 6 GB DIMM.
Hence, a 4 GB DIMM and a 2 GB DIMM are used on each memory channel. Several of you shared this configuration data with me. This does many things correctly:
- Complies with my previous Rule #1: Always populate all memory channels with the same number of DIMMs (that is, use the same number of DIMMs Per Channel, or DPC, on all processors). Check.
- Complies with my previous Rule #2: Always use identical DIMMs across a memory bank. Check.
- Does not use 3 DPC, which would negatively affect memory performance. Check.
- Meets the known memory capacity requirement: 4 GB plus 2 GB is 6 GB per memory channel, which is 24GB/socket and the required 48GB/node. Check.
Therefore, the memory configuration is balanced and a good one, technically speaking.
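The channel arithmetic above can be double-checked in a few lines (a minimal Python sketch; the constants are the ones cited in this post):

```python
# Sanity-check the mixed-DIMM layout: one 4 GB + one 2 GB DIMM
# on each channel (2 DPC), 4 channels per socket, 2 sockets.
SOCKETS = 2
CHANNELS_PER_SOCKET = 4          # Intel SandyBridge-EP (E5-2600)
dimms_per_channel_gb = (4, 2)    # 4 GB DIMM + 2 GB DIMM per channel

gb_per_channel = sum(dimms_per_channel_gb)
gb_per_socket = gb_per_channel * CHANNELS_PER_SOCKET
gb_per_node = gb_per_socket * SOCKETS

print(gb_per_channel, gb_per_socket, gb_per_node)  # 6 24 48
```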
However, let’s dig deeper and take into account a few other things. One is my previous Rule #3: Always use 1 DPC (if possible to meet the required memory capacity). The others are to consider today’s price and tomorrow’s requirements.
As stated in the previous blog, I like to create the “best” memory configuration for a given compute node and then see if the memory/core capacity is sufficient. In other words, in high-performance computing, take memory performance into account first, in addition to the age-old capacity requirements. And, as usual, price comes into play. In this 48GB/node case, price is indeed a driving factor.
To be consistent with the previous blog, we’ll use the same memory sizes and prices, based upon the Dell R620: a general-purpose, workhorse, 1U, rack-mounted, 2-socket, Intel SandyBridge-EP (E5-2600) compute node platform. Below is that same snapshot of the memory options and their prices, taken on 12-July-201.
Here’s the layout of a 48GB/node configuration using 4 GB and 2 GB DIMMs. The figure also shows the total memory price for that configuration.
Here’s an alternate layout using 8 GB DIMMs. Again, the figure also shows the total memory price for this configuration.
Here are the key features of the second configuration:
- More than the 48GB capacity required
- Less $$$ (per node; consider this ~$300 savings times the total number of nodes)
- Fewer parts to potentially fail (in fact, half the parts)
- Fewer types of spare DIMM parts to stock
- Easier correct replacement of failed DIMMs
- More available memory slots for future expansion
- “Future proof”
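The parts-count claims in that list can be illustrated with a quick tally (a sketch; the 3-slots-per-channel figure is an assumption based on the R620’s DIMM slot layout):

```python
# DIMM counts for the two layouts on a 2-socket, 4-channel-per-socket node.
SOCKETS, CHANNELS_PER_SOCKET, SLOTS_PER_CHANNEL = 2, 4, 3  # assumed R620 layout
total_slots = SOCKETS * CHANNELS_PER_SOCKET * SLOTS_PER_CHANNEL

mixed_dimms = SOCKETS * CHANNELS_PER_SOCKET * 2   # 4 GB + 2 GB per channel
single_dimms = SOCKETS * CHANNELS_PER_SOCKET * 1  # one 8 GB per channel

print(mixed_dimms, single_dimms)    # 16 8 -> half the parts to fail
print(total_slots - single_dimms)   # slots left free for future expansion
```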
“Future proof? What does he mean by that?” Did you notice the memory per core in the figures above? The 48GB/node configuration using 4 GB and 2 GB DIMMs works out to 3GB/core on today’s mainstream 8-core processors (16 cores per node). The 48GB/node specification may in fact be tied to the GB/core and the core count per processor. Today’s node may need 48GB, but a node with more cores may need more memory.
We know from several public places (e.g., http://www.sqlskills.com/blogs/glenn/intel-xeon-e5-2600-v2-series-processors-ivy-bridge-ep-in-q3-2013/ ) that the follow-on to the Intel SandyBridge-EP processor (E5-2600), codenamed Ivy Bridge-EP, will officially be called the Intel Xeon E5-2600 v2. The mainstream v2 processor will feature ten (10) cores, compared to today’s 8 cores. With this future processor, the alternate memory configuration above using 8 x 8GB DIMMs provides a total of 64GB/node. On a 2-socket node with 20 cores, that 64GB/node is 3.2GB/core, still exceeding the 3GB/core of today’s 48GB node.
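The GB/core comparison works out as follows (a small sketch using the node capacities and core counts cited above):

```python
# Memory per core for today's 8-core E5-2600 and the 10-core v2 follow-on.
def gb_per_core(node_gb, cores_per_socket, sockets=2):
    """Memory per core on a multi-socket compute node."""
    return node_gb / (cores_per_socket * sockets)

today = gb_per_core(48, 8)    # 48GB node, 16 cores total
future = gb_per_core(64, 10)  # 64GB node (8 x 8GB), 20 cores total
print(today, future)  # 3.0 3.2
```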
If you have comments or can contribute additional information, please feel free to do so. Thanks.
–Mark R. Fernandez, Ph.D.
(original posting: http://dell.to/16EjfPl )