CPU Core Roadmap, Sunny Cove Microarchitecture, 10nm Ice Lake
Intel took the wraps off its new CPU core roadmap and Sunny Cove microarchitecture at its Architecture Day event. The company's dominance in the chip market has long been predicated on process and microarchitecture leadership, but Intel's approach to designing new processor cores has been inextricably tied to its onward march to smaller process nodes. That means that its new CPU core designs (microarchitectures) have traditionally required a move to new, smaller manufacturing processes.
That approach became a liability as Intel encountered massive delays with its 10nm process. Instead of bringing out new core designs, the company was mired on the 14nm process for four years as it constantly refined the process through a cadence of "+" iterations. Each new iteration of the 14nm process found the company offering higher frequencies, and thus more performance, as it marched from 4.2 GHz up to 5.1 GHz. These improvements delivered up to 70% more performance since 14nm's debut in 2014, but the lack of a new microarchitecture, which typically improves the processors' instruction per clock (IPC) throughput, slowed its progress.
After learning a hard lesson exacerbated by the resurgent AMD nipping at its heels, Intel tells us that the company will now design new microarchitectures to be portable between nodes. That will allow the company to move forward even if it encounters roadblocks on its path to smaller transistors.
The Sunny Cove microarchitecture is the first new design that can be used on multiple nodes, and even though Intel has stated the new core will debut on the 10nm node, it hasn't confirmed that it will arrive with the Ice Lake chips. In line with its new design ethos, Intel also tells us that it will select different nodes for different products based on the needs of each segment. That's similar to the approach taken by third-party fabs like TSMC and GlobalFoundries, and it means Intel could choose to use Sunny Cove with 14nm processors as well.
Intel's CPU Core Roadmap
Intel has typically used the same naming convention for its microarchitectures as it does for its processors. Hence, Skylake processors came with the Skylake architecture, and Kaby Lake processors came packing the Kaby Lake architecture. That old paradigm changes now that Intel has decoupled its architectures from the end products, so the company debuted a new roadmap specifically for CPU cores.
Intel presented its new roadmap for both its Core and Atom lineups. As usual, the Core series addresses the company's bread-and-butter high-performance chips, while the Atom chips serve the low power segment.
Intel's Sunny Cove will debut in 2019, bringing with it higher performance in single-threaded applications, a new instruction set architecture (ISA), and a design geared for scalability. Willow Cove will follow with an improved cache hierarchy, security features, and transistor optimizations. The Golden Cove microarchitecture will debut in 2021 with a focus on yet more single-threaded performance, AI performance, networking improvements, and new security features. Atom will receive a slower cadence of improvements, with Tremont debuting in 2019 and Gracemont in 2021. 'Next' Mont will arrive before 2023.
Intel plans for general performance improvements through three key design tenets of going deeper, wider, and smarter, but it is also improving what it calls 'special purpose' use cases, like AI, cryptography, and compression/decompression workloads.
A Refresher On Skylake
Intel gave us a quick refresher on the Skylake architecture that underpins its Skylake, Kaby Lake, Coffee Lake, and Cascade Lake processors. The design processes operations through two reservation stations (RS). It can process seven operations simultaneously, propelling them to the integer (INT), vector (VEC), store data, and address generation units (AGU).
The new Sunny Cove design features improvements at every level of the pipeline. Key improvements include larger reorder, load, and store buffers in the out-of-order engine, along with larger reservation stations. These allow the processor to look deeper into the stream of incoming instructions to find operations that are independent of each other and can therefore run simultaneously. Those operations are then executed in parallel to improve IPC.
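As a rough illustration of why a deeper window matters, consider this toy scheduler sketch (a hypothetical model for explanation only, not Intel's implementation): it scans a fixed-size window of pending operations and picks out those whose inputs are already available, so a larger window exposes more independent work.

```python
# Toy model (not Intel's design): a deeper out-of-order window lets the
# core find more independent operations that are ready to execute.

def ready_ops(ops, completed, window):
    """Return destinations of ops within the lookahead `window` whose
    source registers have already been produced (i.e., ops that are
    independent of still-pending work)."""
    found = []
    for dst, srcs in ops[:window]:
        if all(s in completed for s in srcs):
            found.append(dst)
    return found

# A dependent chain (r1 -> r2 -> r3) followed by independent work.
ops = [
    ("r1", ["r0"]),
    ("r2", ["r1"]),   # waits on r1
    ("r3", ["r2"]),   # waits on r2
    ("r4", ["r0"]),   # independent
    ("r5", ["r0"]),   # independent
]
completed = {"r0"}

print(ready_ops(ops, completed, window=3))  # ['r1']
print(ready_ops(ops, completed, window=5))  # ['r1', 'r4', 'r5']
```

With a shallow window the core only sees the stalled dependent chain; the deeper window also finds r4 and r5, which can execute in parallel.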
Intel increased the L1 data cache from 32KB, the same capacity it has used in its chips for a decade, to 48KB (a 50% increase). The L2 cache is also larger, but its capacity depends on the specific type of product, such as chips designed for the desktop or the server market. Intel also expanded the micro-op (uop) cache and the second-level translation lookaside buffer (TLB).
A key facet of improving performance is to increase parallelism. That starts with the deeper buffer and reservation stations we covered above, but it also requires more execution units to process the operations.
Intel moved from a four-wide allocation to five-wide to allow the in-order portion of the pipeline (front end) to feed the out-of-order (back end) portion faster. Intel also increased the number of execution units to handle ten operations per cycle (up from eight with Skylake). The Store Data unit can now process two store data operations for every cycle (up from one). The address generation units (AGU) can now also handle two loads and two stores every cycle. These improvements are necessary to match the increased bandwidth from the larger L1 data cache, which now does two reads and two writes every cycle. Intel also tweaked the design of the sub-blocks in the execution units to enable data shuffles within the registers.
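A back-of-the-envelope model shows what the wider allocation buys: if allocation were the only bottleneck, feeding a batch of micro-ops into the back end takes at least ceil(uops / width) cycles, so going from four-wide to five-wide raises the feed-rate ceiling by up to 25%. This is a deliberately simplified sketch; real throughput depends on the whole pipeline.

```python
import math

def min_alloc_cycles(uops, alloc_width):
    """Lower bound on cycles needed to allocate `uops` micro-ops into
    the out-of-order back end at `alloc_width` micro-ops per cycle."""
    return math.ceil(uops / alloc_width)

uops = 1000
skylake = min_alloc_cycles(uops, 4)      # four-wide allocation
sunny_cove = min_alloc_cycles(uops, 5)   # five-wide allocation
print(skylake, sunny_cove)               # 250 200
print(skylake / sunny_cove)              # 1.25x higher ceiling
```

Actual gains will be smaller, since the front end is only one of several potential bottlenecks, which is why Intel paired the wider allocation with more execution ports and cache bandwidth.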
Intel reiterated that just increasing the size of the buffers and the number of execution units requires smart algorithmic management to strike a balance between performance and the power budget. That revolves around improving branch prediction accuracy and reducing latency under load conditions. The net effect is a 'significant' increase in IPC. Intel didn't provide specific measurements but promises to share more information as products come to market.
Intel also designed Sunny Cove to address specific use cases, like cryptography, AI, and compression/decompression workloads. The company accomplished these goals by creating new instructions and features to improve performance.
Exploding Memory Capacity
Intel also improved the amount of memory the processor can address, which is a key consideration given its goal of boosting memory capacity with Optane DC Persistent Memory DIMMs. The speedy Optane memory modules provide up to 512GB of memory-addressable storage per DIMM, meaning memory capacity is set to explode as more data centers transition to the technology.
Intel's Sunny Cove moves to a five-level paging structure (up from a four-level structure). That expands virtual addresses to 57 bits and physical addresses to 52 bits, meaning the processor can address up to 4 petabytes of physical memory. That's up from 64 TB of addressing capability with Skylake.
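The capacity figures follow directly from the address widths. A quick check in binary units (assuming Skylake's familiar four-level paging with 48-bit virtual and 46-bit physical addresses):

```python
def addressable(bits):
    """Bytes addressable with an address of `bits` bits."""
    return 1 << bits

TB = 1 << 40  # terabyte (binary)
PB = 1 << 50  # petabyte (binary)

# Skylake: four-level paging, 48-bit virtual / 46-bit physical.
print(addressable(46) // TB)   # 64  -> 64 TB of physical memory

# Sunny Cove: five-level paging, 57-bit virtual / 52-bit physical.
print(addressable(52) // PB)   # 4   -> 4 PB of physical memory
print(addressable(57) // PB)   # 128 -> 128 PB of virtual address space
```

Each extra address bit doubles the reachable space, so the six additional physical bits multiply capacity 64-fold, which is what makes room for racks full of 512GB Optane DIMMs.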
Intel's New Course with oneAPI
Intel's Sunny Cove is an innovative design that looks promising, but as with all designs, we won't know the true benefits until we see the silicon in our labs. Intel's new vision to decouple its CPU core designs from its process improvements is the real advance that will help the company remain competitive in the future. Intel surely can't afford another period of stagnation like we've seen during its struggles with the 10nm process.
Third-party fabs have proven to be Intel's greatest competitive threat. TSMC has taken the process lead with its pending 7nm node, and as a result, new 7nm chips will soon flow from the stalwarts of the semiconductor space, like AMD, Apple, Qualcomm, and Nvidia. These companies work with TSMC to bring their new designs to market, meaning that Intel isn't just competing with one company -- it faces the combined might of several behemoths of the chip market.
Intel does have a plan to outmaneuver its rivals by leveraging its wide-ranging product stack, but it surprisingly revolves around software. Intel is working on its new oneAPI software, which is designed to simplify programming across its GPU, CPU, FPGA, and AI accelerators. The software goes by the tagline of "no transistor left behind," and given its goals, that's an apt description. The new software provides unified libraries that will allow applications to move seamlessly between Intel's different types of compute. If successful, this could be a key differentiator that other firms, which don't field as many forms of compute, will struggle to match.
10nm Ice Lake
The company does have plans for a resurgence on the process front, though, as evidenced by a brief display of its 10nm Ice Lake data center chip. Intel didn't share any details about the new processor.