AMD's EPYC Milan and Nvidia's "Volta-Next" GPUs Combine To Power Shasta Supercomputer

AMD recently announced its EPYC Rome processors, the first 7nm data center chips on the market, but the company is already moving forward with its next-generation products. Here at Supercomputing 2018, the US Department of Energy (DOE) announced that its Perlmutter supercomputer will come armed with AMD's unreleased EPYC Milan processors. The new supercomputer will also use Nvidia's "Volta-Next" GPUs, with the two combining to make one of the fastest supercomputers in the world.

The Perlmutter supercomputer will be built on Cray's Shasta supercomputer platform, which was also on display here at the show. The system will use a mixture of CPU and GPU nodes, with the CPU node pictured above. This watercooled chassis houses eight AMD Milan CPUs. We see four copper waterblocks that cover the Milan processors, while four more processors are mounted inverted on the PCBs between the DIMM slots. The system is designed for maximum performance density, so all the DIMMs are also watercooled.

The DOE presented a slide outlining the Milan processors. But, in a case study of how easily slides can be misinterpreted if you aren't there for the presentation, the speaker specifically stated that the "64 cores" listing refers to AMD's Rome processors, and not the Milan chips. For now, the DOE isn't at liberty to disclose the core counts for the Milan CPUs.

The Rome processors AMD recently announced will come with the 7nm process and the Zen 2 microarchitecture, while the Milan CPUs will come with the Zen 3 microarchitecture built on the 7nm+ process. The slide also lists 8 channels of DDR memory with >=256 GiB of memory per node, but the speaker again specified that the memory capacity figure assumes Milan's specifications will either match or exceed Rome's.

The supercomputer will also have nodes dedicated to GPU compute. Each node will have four Nvidia "Volta-Next" GPUs installed, but again, the specifications listed in the slide merely indicate that the Volta-Next GPUs will exceed the current-gen V100's specifications.

All of that high-powered compute packed into a slim blade server is guaranteed to generate a copious amount of heat, and here we can see the connections at the rear of the blade for the warm-water cooling system.

The CPU and GPU nodes connect to a unique networking attachment, shown here with the connections that mate between the two chassis. You can see the waterblock in the center of the networking chassis. That covers the dedicated networking ASIC, and water also circulates through the metallic blocks over the networking attachments on the rear, which helps deal with the heat generated by optical links. Optical links are supported, but the system is designed with cheaper and more reliable copper connections in mind. Cray isn't sharing networking speeds and feeds, though we expect it to be 100Gb/s or faster.

The two nodes, once mated together, connect to the top-of-rack switch (above) via the networking attachment. The switches are then tied together in a Dragonfly topology that Cray designed to reduce the number of hops for data that traverses the nodes.
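Cray hasn't shared how Perlmutter's Dragonfly network is sized or wired, but the hop-count idea can be illustrated with the textbook form of the topology: routers within a group are fully connected to each other, and every pair of groups shares one global link. The sketch below is a minimal model under those assumptions. The `Dragonfly` class, its parameters, and the rule for which router owns which global link are all hypothetical choices made for illustration and do not describe Cray's implementation.

```python
from dataclasses import dataclass
from typing import Tuple

Router = Tuple[int, int]  # (group index, router index within the group)


@dataclass(frozen=True)
class Dragonfly:
    """Textbook dragonfly model; sizes are illustrative, not Cray's."""
    routers_per_group: int        # routers in each fully connected group
    global_links_per_router: int  # links each router has to other groups

    @property
    def groups(self) -> int:
        # Largest network in which every pair of groups shares one link.
        return self.routers_per_group * self.global_links_per_router + 1

    def gateway(self, src_group: int, dst_group: int) -> int:
        """Router in src_group that owns the global link to dst_group.

        Assumed wiring rule: links are handed out to routers in order
        of the (wrapped) distance between the two group indices.
        """
        offset = (dst_group - src_group) % self.groups
        if offset == 0:
            raise ValueError("no global link from a group to itself")
        return (offset - 1) // self.global_links_per_router

    def hops(self, src: Router, dst: Router) -> int:
        """Router-to-router hops on the minimal route from src to dst."""
        (src_group, src_router), (dst_group, dst_router) = src, dst
        if src_group == dst_group:
            # One local hop at most, since the group is fully connected.
            return 0 if src_router == dst_router else 1
        exit_router = self.gateway(src_group, dst_group)
        entry_router = self.gateway(dst_group, src_group)
        local_out = 0 if src_router == exit_router else 1
        local_in = 0 if dst_router == entry_router else 1
        return local_out + 1 + local_in  # local, global, local


# Hypothetical sizing: 4 routers per group, 2 global links per router.
net = Dragonfly(routers_per_group=4, global_links_per_router=2)
worst_case = max(
    net.hops((g1, r1), (g2, r2))
    for g1 in range(net.groups) for r1 in range(net.routers_per_group)
    for g2 in range(net.groups) for r2 in range(net.routers_per_group)
)
```

In this model, no minimal route between two routers needs more than three hops (one local, one global, one local) no matter how many groups the network has, which is the property that keeps hop counts low as the system scales.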

The new supercomputer will also come with an all-flash storage system, a choice made possible largely by the plummeting cost of flash. That eliminates the need for costly, lower-capacity burst buffers to handle sporadic and intense storage workloads. The new system will come online in 2020, and we expect the DOE to release more information as that date nears.
