Large Caches: Performance Or A Business Decision?
Processor caches exist to reduce accesses to the much slower main memory by buffering frequently used data. While main memory capacities range from 512 MB to 4 GB today, cache sizes range from 256 kB to 8 MB, depending on the processor model. Yet even a small 256-kB or 512-kB cache is enough to deliver substantial performance gains that most of us take for granted today.
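To see why even a small cache pays off, it helps to run the standard average-access-time estimate: hit time plus miss rate times miss penalty. The sketch below is only an illustration; the function name and every latency and miss-rate figure in it are assumed values, not measurements of any processor discussed here.

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average time per memory access: hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical numbers, chosen only to illustrate the effect.
MEMORY_LATENCY_NS = 100.0  # assumed main memory latency
CACHE_HIT_NS = 2.0         # assumed cache hit time
MISS_RATE = 0.05           # assumed: 5% of accesses miss the cache

# Without a cache, every access pays the full memory latency.
no_cache = average_access_time(0.0, 1.0, MEMORY_LATENCY_NS)
with_cache = average_access_time(CACHE_HIT_NS, MISS_RATE, MEMORY_LATENCY_NS)

print(f"no cache:   {no_cache:.1f} ns per access")
print(f"with cache: {with_cache:.1f} ns per access")
```

The point of the exercise is that the result depends far more on the miss rate than on the cache's raw size, which is why a few hundred kilobytes can already capture most of the benefit.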
There are various ways of implementing cache hierarchies. Most PC systems have processors with a small first-level cache (L1, up to 128 kB), which is often divided into a data cache and an instruction cache. The larger L2 cache usually holds both data and instructions. It is shared by both processor cores on Intel Core 2 Duo CPUs, while an Athlon 64 X2 or a Pentium D has a dedicated L2 cache per core. L2 caches can work inclusively or exclusively, which means that they either store a copy of the L1 contents - or they don’t. AMD will soon offer a third cache level, which will serve as a shared cache for the AMD Phenom processors with up to four cores. The same is anticipated for Intel’s 2008 Nehalem processor architecture, which will replace Core 2.
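The practical difference between inclusive and exclusive designs shows up in how much unique data the two levels can hold together. The following is a simplified sketch: the function name is hypothetical, the sizes are example values, and it assumes a strictly inclusive L2 that duplicates the entire L1 and is at least as large as the L1.

```python
def effective_capacity_kb(l1_kb, l2_kb, exclusive):
    """Unique data (in kB) that L1 and L2 can hold together.

    Simplified model: an exclusive L2 never duplicates L1 contents,
    so the capacities add up. An inclusive L2 mirrors the L1, so the
    L1 adds no unique capacity (assumes l2_kb >= l1_kb).
    """
    if exclusive:
        return l1_kb + l2_kb
    return l2_kb


# Example sizes only: 128 kB of L1 and 512 kB of L2 per core.
print(effective_capacity_kb(128, 512, exclusive=True))   # capacities add up
print(effective_capacity_kb(128, 512, exclusive=False))  # L1 is duplicated in L2
```

Real designs are less clear-cut, and some caches are neither strictly inclusive nor strictly exclusive, so this should be read as a rule of thumb rather than a description of any particular processor.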
L1 cache has always been on the processor, while the first L2 caches were implemented on motherboards, as was the case with many 486DX computers and Pentium machines. Simple SRAM (static RAM) chips were used as the first cache memory; pipelined burst cache soon took over (Pentium) until on-chip and on-die caches became possible. The Pentium Pro at 150 to 200 MHz was the first processor to host 256 kB of L2 cache inside the CPU package, making it the largest ceramic package ever on desktops or workstations. The Pentium III for Socket 370, running at 500 MHz to 1.13 GHz, was the first processor model to carry 256 kB of L2 cache on the die, which has the advantage of much lower latency, as the cache operates at CPU speed.
Integrated L2 cache resulted in considerably improved performance across virtually all applications. The impact is significant enough to say that L2 cache is the most important performance factor on an x86 microprocessor: disabling the L2 cache will reduce system performance more than disabling the second core of a dual-core processor.
However, cache memory isn’t only a performance factor. It has become a powerful tool for creating different processor models for the low-end, mainstream and high-end segments, as it enables a processor manufacturer to play with defect rates as well as with clock speeds. Defect-free silicon allows the entire L2 cache to be used, and ideally it also runs at high clock speeds. If it does not reach the target clock speed, the die may still become an entry-level model of a high-end processor line, e.g. a Core 2 Duo 6000 with 4 MB cache and a low clock speed. Should parts of the L2 cache be defective, the manufacturer has the option to shut them down and create a lower-end model with less cache memory, e.g. a Core 2 Duo E4000 model with 2 MB cache, or even a Pentium Dual Core with only 1 MB cache. All of this makes sense, but the question remains: how much of a difference does the cache memory really make?
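The sorting logic described above can be sketched as a small decision function. This is purely illustrative: the function name, the 2.4 GHz target clock and the tier labels are assumptions made for the example, and an actual manufacturer's binning process is certainly more involved.

```python
def bin_die(working_l2_mb, max_stable_ghz, target_ghz=2.4):
    """Assign a die to a product tier from its usable L2 cache and clock.

    All thresholds and labels are hypothetical; they only mirror the
    4 MB / 2 MB / 1 MB tiers mentioned in the text.
    """
    if working_l2_mb >= 4:
        # Full cache works: the clock speed decides the position in the line.
        if max_stable_ghz >= target_ghz:
            return "high-end model, 4 MB L2"
        return "entry-level model of the high-end line, 4 MB L2"
    if working_l2_mb >= 2:
        # Part of the cache is defective and gets disabled.
        return "lower-end model, 2 MB L2"
    if working_l2_mb >= 1:
        return "budget model, 1 MB L2"
    return "reject"


print(bin_die(4, 2.66))  # fully working die at a high clock
print(bin_die(4, 1.86))  # full cache, but misses the target clock
print(bin_die(3, 2.4))   # partly defective cache, sold with 2 MB enabled
```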