Phinally, the Phull Phi Phamily is Announced!

Intel has announced the full line-up of the now officially named “Intel® Xeon® Phi™ coprocessor x100 family”. Whew! What a mouthful. I call it Phi for short. And we in HPC have been waiting a long time for Larrabee, uh, MIC, er, Knights Corner, I mean Phi to be announced and available to help advance our research.

I am very excited to be able to phinally talk more openly about this accelerator for HPC. In a previous blog, I briefly described the already available 5110 model of the Phi coprocessor and how to compute its peak theoretical performance.

Phi…, Nodes, Sockets, Cores and FLOPS, Oh, My!

STAMPEDE, Texas Advanced Computing Center (TACC)

I also shared that the Dell TACC Stampede system used an early-access, special edition Phi called the SE10. Stampede, which was ranked #7 on the November 2012 Top500 list, now moves up to #6 with the release of the June 2013 Top500 list. Congrats to Tommy Minyard and the folks at TACC for the improved number.

The production version of the Phi SE10 used in Stampede is called the 7120 and features a bit more performance than the special edition SE10 version. The 7120 was announced at the 2013 International Supercomputing Conference (ISC’13), along with other details about the rest of the Phi models.

For those who don’t have the time to read another blog or don’t want to spend the effort to do the math, here’s the summary of the peak performance of the three Phi models announced, with the special edition SE10 included for comparison:

  • 3120: 1.00 TFLOPS (57 cores/Phi * 1.1 GHz/core * 16 GFLOPs/GHz = 1,003.2 GFLOPS)
  • 5110: 1.01 TFLOPS (60 cores/Phi * 1.053 GHz/core * 16 GFLOPs/GHz = 1,010.88 GFLOPS)
  • SE10: 1.07 TFLOPS (61 cores/Phi * 1.1 GHz/core * 16 GFLOPs/GHz = 1,073.60 GFLOPS) (Note: not available)
  • 7120: 1.21 TFLOPS (61 cores/Phi * 1.238 GHz/core * 16 GFLOPs/GHz = 1,208.29 GFLOPS)
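
For those who would rather let the computer do the math, here is a minimal Python sketch of the same calculation. The core counts, clock speeds and the factor of 16 are the ones from the list above; the names peak_gflops and PHI_MODELS are just my own for illustration.

```python
# Peak throughput = cores * clock (GHz) * FLOPs per cycle per core.
FLOPS_PER_CYCLE = 16  # the "16 GFLOPs/GHz" factor used in the list above

# model: (cores per Phi, clock in GHz), values taken from the list above
PHI_MODELS = {
    "3120": (57, 1.1),
    "5110": (60, 1.053),
    "SE10": (61, 1.1),
    "7120": (61, 1.238),
}


def peak_gflops(cores, ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Return the peak theoretical performance of one coprocessor in GFLOPS."""
    return cores * ghz * flops_per_cycle


for model, (cores, ghz) in PHI_MODELS.items():
    gflops = peak_gflops(cores, ghz)
    print(f"{model}: {gflops:,.2f} GFLOPS ({gflops / 1000:.2f} TFLOPS)")
```

Remember that this is peak theoretical performance only; what an application actually achieves will be lower.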

So, what does all this mean and how does it help HPC and Research Computing? In short, we now have another 3 arrows in our quiver to attack the wide range of important problems that we face.

How has the presence of Phi already affected HPC and Research Computing? Well, the #1 system on the June 2013 Top500 list is using 48,000 Xeon Phi coprocessors. Yes, 48 thousand. See the list for more details. Of note is the fact that both TACC’s Stampede with 6,400 Phi coprocessors and the #1 system with 48,000 Phi coprocessors are operating at about 60% efficiency. That’s a consistent number over a wide range of coprocessor counts.
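
A quick note on what I mean by efficiency: I am assuming the usual Top500 sense, the measured LINPACK result (Rmax) divided by the peak theoretical performance (Rpeak). A minimal Python sketch follows; the function name and the numbers in the example are placeholders of mine, not figures from the list.

```python
def hpl_efficiency(rmax, rpeak):
    """Fraction of peak theoretical performance (Rpeak) achieved on LINPACK (Rmax).

    Both arguments must use the same unit, for example TFLOPS.
    """
    if rpeak <= 0:
        raise ValueError("rpeak must be positive")
    return rmax / rpeak


# Placeholder values chosen only to illustrate the ratio,
# NOT the actual Rmax/Rpeak of any system on the Top500 list.
example = hpl_efficiency(rmax=6.0, rpeak=10.0)
print(f"Efficiency: {example:.0%}")
```

Plug in the Rmax and Rpeak columns from the Top500 list for any system to check its number.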

If you have not yet had a chance to experiment with Phi, then, as usual, I recommend a platform that is better suited to test-and-development than a production platform such as those deployed at TACC. To that end, Dell also announced at ISC’13 support for Phi in the PowerEdge R720 and T620, both of which are excellent development platforms for GPUs and Phi coprocessors alike. For more information about installing and configuring a Phi, see this posting:

Deploying and Configuring the Intel Xeon Phi Coprocessor in a HPC Solution

When deploying larger quantities of Phi or GPU cards, the production platform used by TACC’s Stampede, the C8220x, is an option.

To get you going on the software side with Phi, be sure to read and bookmark these:

Additionally, on the software side, if you are already using Intel’s Cluster Studio XE, support for Phi is included.

What does the future hold? Personally, comparing and contrasting the performance of Phi coprocessors and GPUs is still on my list for a future blog. Now that Phi is announced, I may be able to get to that sooner!

Secondly, there is an upcoming whitepaper from Saeed Iqbal, Shawn Gao, and Kevin Tubbs from Dell’s HPC Engineering Team. They present a performance analysis of the 7120 Phi in the R720. Preliminary results indicate about a 6X speedup and 2X the energy efficiency compared to Xeon CPUs on LINPACK. I’ll possibly update this blog with that link and definitely tweet about it as soon as it is available.

Finally, Intel also revealed that the next generation of Phi is code-named Knights Landing and will be available not only as a PCIe card, as it is today, but also as a “host processor” installed directly in the motherboard socket. They also shared that the memory bandwidth will be improved. This might help with the efficiency mentioned previously.

CPUs, GPUs, Coprocessors and soon, “host processors”. Interesting times ahead. I’ll be following those developments and sharing critical information as it becomes available.

If you have comments or can contribute additional information, please feel free to do so. Thanks.

–Mark R. Fernandez, Ph.D.



