Intel has released details of its Goldmont Plus microarchitecture. The design has several major advancements that boost performance and could lay the groundwork for more core-heavy designs in the future. For now, the revamped low-power cores will help Intel fend off the rise of Qualcomm's ARM-powered Windows 10 devices.
Intel recently released its Gemini Lake architecture with the Pentium Silver and Celeron SoCs. Intel announced the new processors with little fanfare or detail, and follow-up questions helped us ferret out a few of the more obvious enhancements to the design. However, the details of the underlying Goldmont Plus microarchitecture, which has a deceptively similar name to the Goldmont cores, eluded us.
Intel claims to have wrung impressive performance improvements from Gemini Lake, which features the Goldmont Plus cores, while staying on the same 14nm process as its predecessor. That suggests the company made major architectural improvements, and now that Intel has refreshed its Architectures Optimization Reference Manual, it's clear the company has taken a huge step forward with the Goldmont Plus microarchitecture.
Goldmont Plus CPU Core Pipeline
The enhanced Goldmont core comes with plenty of improved features. Intel widened Goldmont's back-end pipeline from 3-wide allocation/retirement to 4-wide, but the design inherits its predecessor's 3-wide fetch and decode pipeline. The microarchitecture also features an enhanced branch prediction unit, and Intel increased the shared second-level pre-decode cache from 16KB to 64KB.
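As a rough illustration of what the wider back end can and cannot do while fetch and decode stay 3-wide, consider a toy model in which sustained throughput is capped by the narrowest stage, while a wider allocate/retire stage drains a backlog of already-decoded operations in fewer cycles. This is a sketch under simplifying assumptions (no stalls, no dependencies), not a description of Intel's implementation. The function names and the 24-operation backlog are hypothetical; only the 3-wide and 4-wide figures come from the description above.

```python
import math

def steady_state_bound(stage_widths):
    """Upper bound on sustained operations per cycle: the narrowest stage."""
    return min(stage_widths.values())

def cycles_to_drain(backlog_ops, width):
    """Cycles a stage needs to process a backlog of ready operations,
    assuming it handles `width` operations per cycle with no stalls."""
    return math.ceil(backlog_ops / width)

# Stage widths taken from the article text.
goldmont = {"fetch_decode": 3, "allocate_retire": 3}
goldmont_plus = {"fetch_decode": 3, "allocate_retire": 4}

BACKLOG_OPS = 24  # hypothetical number of operations waiting to retire

for name, core in (("Goldmont", goldmont), ("Goldmont Plus", goldmont_plus)):
    bound = steady_state_bound(core)
    drain = cycles_to_drain(BACKLOG_OPS, core["allocate_retire"])
    print(f"{name}: sustained bound {bound}/cycle, "
          f"{drain} cycles to drain {BACKLOG_OPS} ops")
```

In this model the sustained bound is 3 operations per cycle for both cores, because the front end is unchanged; the difference shows up only when the back end has a backlog to work through.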
Other improvements include a wider integer execution unit. This includes a load/store scheduler, three ALU schedulers, and a new dedicated JEU (Jump Execution Unit) port that supports faster branch prediction. Intel also expanded the load/store buffers, although the document does not provide specifics. A larger reservation station (scheduler) and more re-order buffer entries also support a larger out-of-order window. Intel also employs a Radix-1024 floating point divider for "fast scalar/packed single, double and extended precision floating point divides."
Intel also added a shared second-level instruction and data TLB (Translation Lookaside Buffer; labeled ITLB and TLB in the diagrams). The previous-gen architecture's second-level TLB only cached data translations. Intel also made paging cache enhancements. The branch mispredict penalty also rises slightly, to 13 cycles.
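To put the 13-cycle mispredict penalty in perspective, a common back-of-the-envelope model estimates the average cycles per instruction lost to mispredictions as the product of the branch fraction, the mispredict rate, and the penalty. The sketch below applies that model. The workload numbers (20 percent branches, 5 percent and 4 percent mispredict rates) are hypothetical values chosen for illustration, not measurements of any Intel part; only the 13-cycle penalty comes from the text above.

```python
def mispredict_cpi_overhead(branch_fraction, mispredict_rate, penalty_cycles):
    """Average extra cycles per instruction lost to branch mispredictions."""
    return branch_fraction * mispredict_rate * penalty_cycles

PENALTY_CYCLES = 13      # Goldmont Plus figure quoted in the article
BRANCH_FRACTION = 0.20   # hypothetical: one in five instructions is a branch

# Hypothetical mispredict rates, before and after a predictor improvement.
for rate in (0.05, 0.04):
    overhead = mispredict_cpi_overhead(BRANCH_FRACTION, rate, PENALTY_CYCLES)
    print(f"mispredict rate {rate:.0%}: {overhead:.3f} extra cycles/instruction")
```

Under these assumed numbers, a modest drop in the mispredict rate saves more cycles than a one-cycle change in the penalty costs, which is one way an enhanced branch prediction unit could offset a slightly longer penalty.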
Perhaps one of the largest changes is a shift to a modular design that features quad-core clusters that share 4MB of L2 cache. That design could allow the company to simply add more clusters to build out larger processors with heftier core counts.
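The scaling argument can be sketched with simple arithmetic: if each cluster carries four cores and 4MB of shared L2, core count and total L2 grow linearly with the number of clusters. The class and field names below are hypothetical, and cluster counts above one are purely illustrative; Intel has not announced such parts.

```python
from dataclasses import dataclass

CORES_PER_CLUSTER = 4   # quad-core cluster, per the article
L2_MB_PER_CLUSTER = 4   # shared L2 per cluster, per the article

@dataclass(frozen=True)
class ChipConfig:
    """Hypothetical chip built from identical Goldmont Plus-style clusters."""
    clusters: int

    @property
    def cores(self) -> int:
        return self.clusters * CORES_PER_CLUSTER

    @property
    def l2_mb(self) -> int:
        return self.clusters * L2_MB_PER_CLUSTER

# Cluster counts beyond 1 are illustrative only.
for n in (1, 2, 4):
    chip = ChipConfig(clusters=n)
    print(f"{n} cluster(s): {chip.cores} cores, {chip.l2_mb}MB L2")
```

Because each cluster brings its own L2, the cache available per core stays constant as clusters are added in this model.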
It might seem counter-intuitive to build large Atom-class processors, but pairing the enhanced performance with a low-power modular chip design could prove useful as Intel seeks to fend off newly resurgent ARM competitors in the data center. The current design could also help Intel grapple with the pending influx of Qualcomm-powered Windows 10 ARM devices, so the improved microarchitecture could serve Intel well on multiple fronts.
Intel is notably late with its 10nm process, and many predict the company will not release a new microarchitecture until Ice Lake, so the newly revamped Goldmont Plus is a positive development. The improvements to the Goldmont Plus microarchitecture are impressive, and curiously understated by Intel; we typically see more fanfare when the company makes big advances. New devices with Gemini Lake have already been spotted in the wild, so we expect to see several new models wielding the Gemini Lake SoC at CES.