Sandy Bridge Running Slow?

This week I was asked if I knew any reason that a Sandy Bridge system would run slower than an approximately equivalent Westmere system.

[I would not normally blog about such a thing, but this is the third time in the last two weeks that this type of question has surfaced!]

The Intel Sandy Bridge processor adds new instructions collectively called Advanced Vector Extensions, or AVX. AVX provides up to double the peak FLOPS of previous processor generations such as Westmere. To take advantage of these AVX instructions, the application *must* be re-compiled with a compiler that supports AVX: for Intel, that is the Intel Compiler Suite starting with version 11.1; for GCC, starting with version 4.6.

If an application has not been re-compiled with an AVX-aware compiler, it will not be able to take advantage of these Sandy Bridge instructions. And it will probably run slower than it previously did on older processors, including Westmere processors with higher clock frequencies.

Let me say this another way: thanks to Intel’s commitment and extensive work to maintain backwards compatibility, a Westmere executable will run fine on a Sandy Bridge system, but it will probably run slower, with no errors or any other indication of why.

Furthermore, re-compiling “on” the Sandy Bridge processor, but using an older compiler (pre-icc 11.1 or pre-GCC 4.6), does not help. Remember: use the latest compiler on those shiny new platforms!

For just one example of Westmere vs. Sandy Bridge performance improvements that are possible, please see our blog at:

HPC performance on the 12th Generation (12G) PowerEdge Servers:

I know there are some codes that, for legal, certification, or other reasons, cannot be “changed.” But I certainly hope that this policy has not bled over into not even being able to re-compile apps to take advantage of new technologies.

For additional information on AVX and re-compiling applications, see:

Intel Advanced Vector Extensions:

How To Compile For Intel AVX:

Optimizing for AVX Using MKL BLAS:

If you have comments or can contribute additional information, please feel free to do so. Thanks.

–Mark R. Fernandez, Ph.D.


(original posting: )


Calling All Mathematicians: Re-Inventing Deterministic Solutions

(AKA: Can you improve the solution to a problem that has already been solved?)


Dell’s Dr. Mark Fernandez

Today, we are using high performance computing (HPC) to solve the hardest problems faced by humanity. In many cases, if any single component fails, the potential solution is lost and must be restarted from the beginning. This leads to an emphasis on HPC system “availability”. If we are to advance our solutions as fast as we are advancing the size of HPC systems, this must be addressed.

I attended the 25th anniversary of the High Performance Computing and Communications Conference in Newport, R.I., last week (March 29-31, 2011). During that conference, a question was posed on this very subject: improving large-scale HPC system availability. As HPC systems, and the problems they solve, grow from hundreds of individual processors working jointly on a single problem to thousands, to millions, and eventually beyond, system availability becomes critical. In many cases, the “hope” of completing a run is becoming as important a factor as the answer itself.

Dr. Eng Lim Goh of SGI introduced, and the conference further discussed, the idea that availability consists of two key components: reliability and resiliency. Reliability at the hardware component level is continually improving, but it will never be 100%. Components will eventually fail, and invariably interrupt a very important calculation.

Numerous efforts are underway to improve resiliency in the presence of component failures. Many involve redundant hardware or other excessive or heroic measures. These themes have been around for a while, and are currently proving to be too costly. At the software level, for example, the concept of Checkpoint-Restart (CPR) has been tried repeatedly and is, in general, too expensive in terms of time, hardware, bandwidth, cost, and/or energy.

It occurred to me that we in computer science, at our core and most fundamental and basic level, rely upon the fruits of mathematical labor. And that this fertile field is where we should look for the characteristics of resiliency. In general, many of the solutions from mathematics are deterministic. A logical, thorough, and provable solution is the goal of all mathematicians. And once solved, they move on. Been there; done that; got the nerdy t-shirt to prove it.

Suppose a solution (based upon that mathematics) is used by computer scientists, engineers and researchers, on a large scale HPC system, to solve a problem of critical importance to humanity. And unfortunately, that system cannot run long enough without interruption to solve that problem. In some sense, the value and elegance of the mathematical solution is lost.

I am wondering if mathematicians would be willing to re-examine that absolutely logical, thorough and proven solution. At each step, would you consider each term and assume it were lost? For each potential lost term, is there an equally elegant, logical, thorough and provable method to regain that term, and/or continue to a correct solution efficiently, without starting over?

Is there a “resilient” FFT method? Can a finite difference technique continue if a term is lost? Or a complete Navier-Stokes solution?

Computer Science and the legacy of HPC today rely heavily upon the past successes of mathematicians and mathematics departments. For the next generation of large-scale computer systems performing large-scale problem solving, I believe the solutions may also come from those same mathematicians and mathematics departments.

Remember that the future of the human race depends on HPC and we depend upon you. No pressure…

–MRF, 02/15/2012.

(original posting: )