Rambus proposes Terabyte per second memory initiative

Los Altos (CA) - Rambus will announce on Wednesday a new Terabyte Bandwidth Initiative (TBI) designed to bring terabyte-per-second memory bandwidth to future many-core architectures. The design calls for 16 DRAM channels, each operating at 16 Gbps and moving 4 bytes of data per clock, or 64 GB/s per channel. In theory, the aggregate memory throughput would be 1,024 GB/s (1 TB/s) across sixteen separate channels, each of which could be piped directly to a group of cores.
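The headline number follows from simple multiplication. The sketch below reproduces it under one assumption that Rambus has not confirmed: "16 Gbps with 4 bytes of data per clock" is read as 16 billion transfers per second, each carrying 4 bytes. The function and constant names are ours, for illustration only.

```python
# Figures quoted in the article; names are illustrative, not Rambus terminology.
CHANNELS = 16
TRANSFERS_PER_SECOND = 16_000_000_000  # "16 Gbps", assumed to mean 16 billion transfers/s
BYTES_PER_TRANSFER = 4                 # "4 bytes of data per clock"


def channel_bandwidth_gb_s(transfers_per_second: int, bytes_per_transfer: int) -> float:
    """Bandwidth of a single channel in GB/s, with 1 GB taken as 10**9 bytes."""
    return transfers_per_second * bytes_per_transfer / 1e9


def aggregate_bandwidth_gb_s(channels: int, per_channel_gb_s: float) -> float:
    """Total bandwidth when every channel runs at the same rate."""
    return channels * per_channel_gb_s


per_channel = channel_bandwidth_gb_s(TRANSFERS_PER_SECOND, BYTES_PER_TRANSFER)
total = aggregate_bandwidth_gb_s(CHANNELS, per_channel)

print(f"per channel: {per_channel:.0f} GB/s")
print(f"aggregate:   {total:.0f} GB/s")
```

Note that the "terabyte" here is 1,024 GB/s, not 1,000 GB/s. The figure is also a theoretical peak and says nothing about sustained throughput.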

Advance information TG Daily obtained ahead of the official November 28 release indicates that Rambus has not yet built a fully functioning prototype. The company does, however, have the underlying technology operating at full speed on a modern process: it has developed a prototype test vehicle on a generic 65nm process that runs a single channel at 64 GB/s. Complete information about the test vehicle and its design, along with actual speeds and full initiative data, will be released by Rambus on November 28 at the Rambus Developer Forum in Tokyo, Japan.

Rambus is using an on-chip, PLL-based FlexLink clocking system to achieve the 16 Gbps data rate per channel. An input clock of 500 MHz is piped through a PLL to multiply its frequency by 32, yielding 16 Gbps, or 64 GB/s per channel. The company is also moving to a Fully Differential Memory Architecture (FDMA) signaling system for Command / Address, data and clock signals. DDR3, for example, uses differential signaling only on the clock, and the first version of XDR uses it only on the clock and data signals, and at only 8x the input clock. Combined with Rambus' FlexPhase technology, which is designed to accurately align high-speed clocks with their data payloads, this should reduce the overall number of errors at these speeds.
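The clock multiplication can be checked the same way. In the sketch below, the 500 MHz input and the 32x and 8x multipliers come from the figures above. Applying the 8x multiplier to the same 500 MHz input is our assumption for comparison only and is not a statement about the clock XDR actually uses.

```python
INPUT_CLOCK_HZ = 500_000_000  # 500 MHz input clock quoted for TBI


def data_rate_gbps(input_clock_hz: int, multiplier: int) -> float:
    """Per-pin data rate in Gbps after the PLL multiplies the input clock."""
    return input_clock_hz * multiplier / 1e9


# 32x multiplier quoted for TBI.
tbi_rate = data_rate_gbps(INPUT_CLOCK_HZ, 32)

# Hypothetical comparison: an 8x multiplier on the same 500 MHz input.
eight_x_rate = data_rate_gbps(INPUT_CLOCK_HZ, 8)

print(f"32x: {tbi_rate:.0f} Gbps")
print(f" 8x: {eight_x_rate:.0f} Gbps")
```

On an identical input clock, moving from 8x to 32x quadruples the per-pin rate. That is why signal integrity measures such as fully differential signaling matter more at these speeds.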

Rambus has also redesigned its Command / Address bus for TBI. Legacy systems use a 12-line connection from memory controller to DRAM. The new TBI design will use a 2-line connection that also runs at 32x the input clock and can be scaled as necessary. According to Rambus, this will allow a much lower-cost SoC solution while providing greater support for different granularities (how much memory is retrieved by each request: as little as 16 bytes, typically 64 or 128 bytes). With the additional bandwidth available via the 32x link, granularity could be reduced without affecting throughput or latency.
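To get a feel for what these granularities mean on a 64 GB/s channel, the rough sketch below computes how long the data portion of a single request would occupy the channel. It is a back-of-the-envelope estimate that ignores command overhead, DRAM access latency and any protocol details, none of which Rambus has disclosed. The names are illustrative.

```python
CHANNEL_BANDWIDTH_GB_S = 64  # per-channel figure quoted by Rambus


def transfer_time_ns(request_bytes: int, bandwidth_gb_s: int = CHANNEL_BANDWIDTH_GB_S) -> float:
    """Time the payload alone occupies the channel, in nanoseconds.

    1 GB/s equals 1 byte per nanosecond, so bytes divided by GB/s gives ns.
    """
    return request_bytes / bandwidth_gb_s


# Granularities mentioned in the article: 16, 64 and 128 bytes.
for size in (16, 64, 128):
    print(f"{size:>3} bytes: {transfer_time_ns(size):.2f} ns")
```

At these rates even a 128-byte request occupies the data lines for only a couple of nanoseconds. Small requests therefore need a correspondingly fast Command / Address path to keep the channel busy.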

Multiple 32x links connect the memory controller with the physical DRAMs. The final system will have 16 complete links. Two are shown here (left and right).

Rambus would like to have TBI memory technology in high-volume products as early as 2010 or 2011, though concerns over significant power consumption and heat generation remain a primary focus. The company told us that a 45nm or smaller process node would likely be required for early adopters of commercial TB/s products. This timeframe would likely coincide with the production availability of many-core products from Intel and AMD. Rambus committed to a low-power signaling technology initiative in February 2007. It is not working with any partners to develop TBI at this time, as this remains an initiative, not a product or even a complete prototype.

Rambus is working internally on additional technologies that could also push toward these high speeds and performance levels. TBI is currently its primary low-cost, high-volume approach. In 2006, the PS3 used approximately 50 GB/s. The historical trend for memory bandwidth increases would place 500 GB/s as standard fare in the 2010/2011 timeframe. However, recent advances in semiconductor technology could bring the typical 10x speed increase seen every five years or so more in line with the 20x increase Rambus is targeting with TBI, even without this specific technology being employed.
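The 10x and 20x figures can be checked against the numbers quoted in this article. The sketch below uses the approximate 50 GB/s PS3 figure as the baseline and the 1,024 GB/s TBI target. It is arithmetic on the article's own numbers, not a forecast.

```python
BASELINE_GB_S = 50      # approximate PS3 memory bandwidth in 2006, as quoted
TBI_TARGET_GB_S = 1024  # TBI aggregate target
HISTORICAL_FACTOR = 10  # typical increase over roughly five years, as quoted

# Where the historical trend alone would land around 2010/2011.
historical_projection = BASELINE_GB_S * HISTORICAL_FACTOR

# How far the TBI target sits above the 2006 baseline.
tbi_factor = TBI_TARGET_GB_S / BASELINE_GB_S

print(f"historical projection: {historical_projection} GB/s")
print(f"TBI vs. 2006 baseline: {tbi_factor:.2f}x")
```

The TBI target works out to slightly more than 20x the 2006 baseline, roughly double what the historical trend would predict.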