Is Cache Size Really The Key To Boosting Performance?

Large Caches: Performance Or A Business Decision?

Processor caches exist to reduce accesses to main memory by buffering frequently used data. While main memory capacities are somewhere between 512 MB and 4 GB today, cache sizes range from 256 kB to 8 MB, depending on the processor model. Yet even a small 256-kB or 512-kB cache is enough to deliver substantial performance gains that most of us take for granted today.
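
To make the buffering idea concrete, here is a minimal sketch in Python of a least-recently-used (LRU) cache fed with a skewed access pattern. Everything in it is an assumption made for illustration: the function names, the LRU policy, the trace parameters and the capacities are not taken from any real processor, and real caches work on cache lines and sets rather than single addresses.

```python
from collections import OrderedDict
import random


def hit_rate(accesses, capacity):
    """Return the fraction of accesses served by an LRU cache of `capacity` entries."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True             # miss: fetch from "main memory"
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(accesses)


def skewed_trace(n, hot=64, cold=100_000, hot_share=0.9, seed=1):
    """Hypothetical workload: most accesses go to a small set of hot addresses."""
    rng = random.Random(seed)
    return [
        rng.randrange(hot) if rng.random() < hot_share else hot + rng.randrange(cold)
        for _ in range(n)
    ]


trace = skewed_trace(50_000)
for capacity in (16, 64, 256):
    print(capacity, round(hit_rate(trace, capacity), 3))
```

With a trace like this, most of the benefit arrives once the cache is large enough to hold the hot set. That is one plausible reading of why even a comparatively small cache pays off.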

There are various ways of implementing cache hierarchies. Most PC systems have processors with a small first-level cache (L1, up to 128 kB), which is often divided into a data cache and an instruction cache. The larger L2 cache is usually unified, holding both data and instructions. It is shared by both processor cores on Intel Core 2 Duo CPUs, while an Athlon 64 X2 or a Pentium D has a dedicated L2 cache per core. L2 caches can work inclusively or exclusively, which means that they either store a copy of the L1 contents or they don’t. AMD will soon offer a third cache level, which will be used as a shared cache memory for the AMD Phenom processors with up to four cores. The same is anticipated for Intel’s 2008 Nehalem processor architecture, which will replace Core 2.
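
The practical difference between the two policies can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function name and the 128 kB / 512 kB sizes are illustrative only, and real designs can fall between strictly inclusive and strictly exclusive.

```python
def effective_capacity_kb(l1_kb, l2_kb, policy):
    """Unique data that one core's L1 + L2 pair can hold, in kB (simplified model)."""
    if policy == "inclusive":
        # L2 keeps a copy of everything in L1, so L1 adds no unique capacity.
        return l2_kb
    if policy == "exclusive":
        # A cache line lives in L1 or in L2, never both, so the sizes add up.
        return l1_kb + l2_kb
    raise ValueError(f"unknown policy: {policy!r}")


# Illustrative sizes, not the specification of any particular CPU.
for policy in ("inclusive", "exclusive"):
    print(policy, effective_capacity_kb(128, 512, policy), "kB")
```

The trade-off is that an exclusive design makes better use of a small L2, while an inclusive design duplicates data in exchange for simpler bookkeeping between the levels.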

L1 cache has always been on the processor, while the first L2 caches sat on the motherboard, as was the case with many 486DX computers and Pentium machines. Simple SRAM (static RAM) chips served as the earliest cache memory; pipelined burst cache soon took over (Pentium) until on-chip and on-die caches became possible. The Pentium Pro at 150 to 200 MHz was the first processor to host 256 kB of L2 cache memory inside the CPU package, making it the largest ceramic package ever on desktops or workstations. The Pentium III for socket 370, running at 500 MHz to 1.13 GHz clock speeds, was the first processor model to carry 256 kB of L2 cache on the die, which has the advantage that latency is far lower and the cache operates at CPU speed.

Integrated L2 cache resulted in considerably improved performance across virtually all applications. The impact is significant enough to say that L2 cache is the most important performance factor on an x86 microprocessor. Disabling the L2 cache will reduce system performance more than disabling the second core of a dual-core processor.

However, cache memory isn’t only a performance factor. It has become a powerful tool for creating different processor models for the low-end, mainstream and high-end segments, as it enables a processor manufacturer to play with defect rates as well as with clock speeds. Defect-free silicon allows the entire L2 cache to be used, and it runs at wonderfully high clock speeds. If a die does not reach the target clock speed, it may still become an entry-level model for a high-end processor line, e.g. a Core 2 Duo 6000 with 4 MB cache and a low clock speed. Should parts of the L2 cache be defective, the manufacturer has the option to shut them down and create a lower-end model with less cache memory, e.g. a Core 2 Duo E4000 model with 2-MB cache, or even a Pentium Dual Core with only 1-MB cache. All of this makes sense, but the question remains: how much of a difference does the cache memory really make?
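
The sorting process described above can be sketched as a small decision function in Python. This is only an illustration of the idea: the `Die` type, the `bin_die` function, the 2667 MHz target clock and the tier labels are hypothetical and do not describe any manufacturer's actual test flow.

```python
from dataclasses import dataclass


@dataclass
class Die:
    working_cache_mb: int  # L2 cache that passed testing (hypothetical field)
    max_stable_mhz: int    # highest clock the die runs at reliably (hypothetical field)


def bin_die(die, target_mhz=2667):
    """Assign a tested die to a product tier; thresholds are assumptions."""
    if die.working_cache_mb >= 4:
        if die.max_stable_mhz >= target_mhz:
            return "high-end, 4 MB cache, full clock"
        # Full cache works, but the die misses the target clock speed.
        return "entry model of high-end line, 4 MB cache, reduced clock"
    if die.working_cache_mb >= 2:
        # Defective cache blocks are disabled and the part is sold with less cache.
        return "mid-range, 2 MB cache"
    if die.working_cache_mb >= 1:
        return "low-end, 1 MB cache"
    return "reject"


for die in (Die(4, 2933), Die(4, 1866), Die(2, 2200), Die(1, 1800)):
    print(die, "->", bin_die(die))
```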

Comment from the forums
  • nick001
    If possible, I'd like to see comparisons between the AMD and Intel CPUs with the cache disabled. Would the performance loss on the AMD processors be smaller, as they have 512 kB/1 MB in comparison to Intel's 1/2/4 MB, and due to the integrated memory controller?
    Or is the Core 2 Duo's architecture so superior that it will still beat the "more elegant solution"?
  • Geffen
    Is the question not whether cache is beneficial but whether it is a good use of the silicon real estate? When the Athlon upgraded from the Thunderbird core to the Palomino core with no increase in cache size, it got about 5% faster for a minor increase in the transistor count (37 million to 37.2 million). The upgrade from Thoroughbred to Barton, which doubled the cache from 256 kB to 512 kB, resulted in a huge transistor count increase from 37.6 million to 54 million for a 5% speed increase. Based on this, it seems to me that adding cache is a lazy but expensive way for the chip manufacturers to increase performance, and it would be better if they spent more time looking at other ways to improve their chips.
  • Allubz
    Just a bit of a shame you didn't add to the conclusion the PRICE difference between the processors compared to the PERFORMANCE difference between them.
    In short: price-performance

    Because PP-wise:
    E2160 $72.00
    E4400 $129.99
    X6800 $985.00
    Prices from Newegg (in most countries the differences are even bigger)

    So the average difference at the same clock speed between the E2160 and the X6800 is about 10% and the price difference is nearly a horrible 1400%!!

    Like most reviews IF you add anything like this, the conclusion will probably be:

    If you've got a budget then consider taking the cheapest E2100 series. If you want to build your average PC take an E4000 series and well, if you've got a wallet you found to empty then hit it with a grand to get rid of it before the cops find out.

    Anyway, my point is that I think Tom's should inform people about reasonable price-performance differences. If more review sites do this then manufacturers will of course keep higher prices, but will see a drop in sales of these products and see their mid-range products being bought and used very well. (Or they'll start producing low- and mid-range products that are very limited so they can't compete at any rate with the high-end parts.)

    Just my two cents...
  • jamesalexw
    Actually the first processor to have on-die full-speed 256 kB L2 cache was the AMD K6-III, not the Intel Pentium III Coppermine.

    The K6-III was released in February 1999; Coppermine Pentium IIIs didn't appear until late October.