Phi …, Nodes, Sockets, Cores and FLOPS, Oh, My!

Intel’s Xeon Phi coprocessor is the subject of this latest post in what has evolved into a bit of a series. In the most recent post, I described how to compute the peak theoretical floating point performance of a system hosting GPUs. That one followed a general purpose description of computing the peak floating point performance of systems containing CPUs, such as Intel’s Xeon processors.



Intel’s Xeon Phi coprocessor, unlike GPUs, is based upon the same x86 architecture as Intel Xeon CPUs, such as Westmere, Nehalem, Sandy Bridge and the upcoming Ivy Bridge. As such, computing its peak theoretical floating point performance is similar to computing it for CPUs, as described in the first blog.

Several public references indicate that the currently shipping Intel Xeon Phi model 5110 contains 60 cores operating at 1.053 GHz. Unlike GPUs, where not all cores are available for double precision floating point math, all 60 cores of Intel’s Xeon Phi are available, so computing the peak double precision floating point performance is as straightforward as it was for regular Intel CPUs.

All 60 of these cores can perform double precision floating point math at a rate of 16 flops/clock. Yes, sixteen (16) flops/clock! The vector unit in Xeon Phi is 512 bits wide, twice the width of the 256-bit AVX in general purpose Intel processors. It holds eight double precision values, and a fused multiply-add counts as two floating point operations on each, giving 8 x 2 = 16. (Comparing and contrasting these 60 cores at 16 flops/clock to a GPU’s 1000-ish cores at 2 flops/clock may be the subject of a future blog.)
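
For readers who want to see where the 16 comes from, here is a minimal sketch. It assumes a 512-bit vector unit and a fused multiply-add counted as two operations; neither number is taken from the references above, and the function name is made up for illustration.

```python
def flops_per_clock(vector_bits: int, element_bits: int = 64, fma: bool = True) -> int:
    """Peak flops per clock for one core (hypothetical helper).

    vector_bits  -- width of the vector unit (assumed 512 for Xeon Phi)
    element_bits -- 64 for double precision, 32 for single precision
    fma          -- True if one instruction does a multiply and an add
    """
    lanes = vector_bits // element_bits      # values processed per instruction
    ops_per_lane = 2 if fma else 1           # multiply + add counts as two
    return lanes * ops_per_lane


print(flops_per_clock(512))  # prints 16
```

Passing element_bits=32 gives the corresponding single precision figure under the same assumptions.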

Here’s the peak theoretical floating point math for an Intel Xeon Phi 5110:

GFLOPS = 60 cores/Phi * 1.053 GHz/core * 16 GFLOPs/GHz

GFLOPS = 1,010.88

I have seen this appear as “over a teraFLOP” and as 1,011 GFLOPS.

Additionally, the TACC Stampede system uses a special edition of the Intel Xeon Phi called the SE10. It features 61 cores operating at 1.1 GHz.

GFLOPS = 61 cores/Phi * 1.1 GHz/core * 16 GFLOPs/GHz

GFLOPS = 1,073.6
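
The two calculations above can be sketched in a few lines of Python. The core counts, clock rates and 16 flops/clock come from this post; the function and dictionary names are only illustrative.

```python
def peak_gflops(cores: int, ghz: float, flops_per_clock: int = 16) -> float:
    """Peak theoretical GFLOPS = cores * GHz * flops per clock."""
    return cores * ghz * flops_per_clock


# (cores, clock in GHz) for the two models discussed in this post
models = {
    "Xeon Phi 5110": (60, 1.053),
    "Xeon Phi SE10": (61, 1.1),
}

for name, (cores, ghz) in models.items():
    print(f"{name}: {peak_gflops(cores, ghz):,.2f} GFLOPS")
```

Adding a future model should only require another entry in the dictionary, provided its flops/clock figure is known.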

For additional information about TACC and the Stampede systems see:

Hope that helps. As future Intel Xeon Phi models are released in the coming months, this same type of computation should be valid for their peak performance. For a system, compute the CPU performance of the host node as described in the first blog. Compute the Intel Xeon Phi performance as described here. The total system performance is the sum of these.
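
As a rough illustration of summing host and coprocessor performance, here is a sketch for a single node. The host configuration (2 sockets, 8 cores per socket, 2.7 GHz, 8 flops/clock) is a hypothetical example chosen for the arithmetic, not a description of any particular system, and the function names are made up.

```python
def cpu_peak_gflops(sockets: int, cores_per_socket: int, ghz: float,
                    flops_per_clock: int) -> float:
    """Peak GFLOPS of the host CPUs in one node."""
    return sockets * cores_per_socket * ghz * flops_per_clock


def phi_peak_gflops(cores: int, ghz: float, flops_per_clock: int = 16) -> float:
    """Peak GFLOPS of one Xeon Phi coprocessor."""
    return cores * ghz * flops_per_clock


# Hypothetical host: 2 sockets x 8 cores at 2.7 GHz, 8 flops/clock (assumed values)
host = cpu_peak_gflops(sockets=2, cores_per_socket=8, ghz=2.7, flops_per_clock=8)

# One Xeon Phi 5110 per node, using the figures from this post
phi = phi_peak_gflops(cores=60, ghz=1.053)

phis_per_node = 1  # assumed; change to match the actual system
total = host + phis_per_node * phi

print(f"Host CPUs: {host:,.2f} GFLOPS")
print(f"Xeon Phi:  {phi:,.2f} GFLOPS")
print(f"Node peak: {total:,.2f} GFLOPS")
```

For a multi-node system, multiply the node peak by the number of identical nodes.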

Remember that this is the peak theoretical floating point performance. This is the theoretical performance you are guaranteed to never see. Expect to see more about real-world performance using Intel Xeon Phi as the Centres of Competence come up to speed:

If you have comments or can contribute additional information, please feel free to do so. Thanks.

–Mark R. Fernandez, Ph.D.


(original posting: )


