
Athlon II Or Phenom II: Does Your CPU Need L3 Cache?


It makes sense to equip multi-core processors with dedicated memory shared by all available cores. In this role, a fast third-level cache (L3) can accelerate access to frequently needed data. Wherever possible, cores should not have to fall back on the slower main memory (RAM).

That’s the theory, at least. AMD’s recent launch of the Athlon II X4, which is fundamentally a Phenom II X4 without the L3, implies that the tertiary cache may not always be necessary. We decided to run an apples-to-apples comparison of both options to find out.

How Cache Works

Before diving deeper into our tests, it’s important to understand some basics. The principle of caches is rather simple. They buffer data as close as possible to the processing core(s) so that the CPU does not have to fetch it from more distant, slower memory. Today’s desktop cache hierarchies consist of three cache levels in front of system memory. The second and especially the third levels aren’t just for data buffering. Their purpose is also to avoid choking the CPU bus with unnecessary data exchange traffic between cores.

Cache Hit/Miss

The effectiveness of a cache architecture is measured by its hit rate. Data requests that can be answered within a given cache are referred to as hits. If that cache doesn’t contain the sought data and must pass the request on to subsequent memory structures, this is a miss. Obviously, misses are slow. They lead to stalls in the execution pipeline and introduce wait periods. Hits, on the other hand, help sustain maximum performance.
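To put rough numbers on the hit/miss trade-off, here is a minimal Python sketch of an average access time estimate. The function name, latencies, and hit rates are illustrative assumptions, not measurements of any CPU discussed in this article. It assumes every request that reaches a level pays that level's lookup latency and that misses are passed on to the next level.

```python
def average_access_time(levels, memory_latency):
    """Estimate average access latency in cycles.

    levels: list of (hit_latency_cycles, hit_rate) pairs, ordered from L1 outward.
    memory_latency: cycles for an access that misses in every cache level.
    """
    total = 0.0
    reach_probability = 1.0  # fraction of requests that get as far as this level
    for hit_latency, hit_rate in levels:
        total += reach_probability * hit_latency   # lookup cost at this level
        reach_probability *= (1.0 - hit_rate)      # only misses travel further out
    total += reach_probability * memory_latency    # whatever is left goes to RAM
    return total


# Hypothetical two-level and three-level hierarchies (made-up figures).
without_l3 = average_access_time([(3, 0.95), (15, 0.80)], memory_latency=200)
with_l3 = average_access_time([(3, 0.95), (15, 0.80), (45, 0.50)], memory_latency=200)
print(f"without L3: {without_l3:.2f} cycles, with L3: {with_l3:.2f} cycles")
```

With these made-up figures the third level helps only modestly, because few requests ever travel that far. How often they do in practice depends entirely on the workload.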

Cache Writes, Exclusivity, Coherency

Replacement policies dictate how room is created in a full cache for new entries. Write policies determine when modified data reaches main memory: since data written into a cache eventually has to be available there, systems can either update main memory at the same time (write-through) or mark the modified locations as “dirty” (write-back) and perform the write only once the data is evicted from the cache.
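The difference between the two write strategies can be sketched with a toy model. The TinyCache class, its least-recently-used eviction, and the address trace below are assumptions made for illustration only; real caches track dirty state per cache line in hardware.

```python
from collections import OrderedDict


class TinyCache:
    """Toy cache that only counts how many writes reach main memory."""

    def __init__(self, capacity, write_back):
        self.capacity = capacity
        self.write_back = write_back
        self.lines = OrderedDict()  # address -> dirty flag, least recently used first
        self.memory_writes = 0

    def write(self, address):
        if address in self.lines:
            self.lines.pop(address)  # hit: re-inserted below as most recently used
        elif len(self.lines) >= self.capacity:
            _, victim_dirty = self.lines.popitem(last=False)  # evict the LRU entry
            if victim_dirty:
                self.memory_writes += 1  # write-back happens on eviction
        if self.write_back:
            self.lines[address] = True   # mark dirty, defer the memory write
        else:
            self.memory_writes += 1      # write-through: memory updated right away
            self.lines[address] = False

    def flush(self):
        """Write all remaining dirty entries back to memory."""
        for address in list(self.lines):
            if self.lines[address]:
                self.memory_writes += 1
                self.lines[address] = False


def memory_writes_for(trace, write_back, capacity=2):
    cache = TinyCache(capacity, write_back)
    for address in trace:
        cache.write(address)
    cache.flush()
    return cache.memory_writes


trace = [0] * 10 + [1, 2, 3]  # one hot address, then three others
print("write-through:", memory_writes_for(trace, write_back=False))
print("write-back:   ", memory_writes_for(trace, write_back=True))
```

In this sketch, repeated writes to the same address cost one memory write each under write-through, while write-back collapses them into a single write at eviction time.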

Data on several levels of cache can be stored exclusively, meaning that no redundancy exists: you won’t find the same piece of data in two different cache structures. Alternatively, caches can operate in an inclusive manner, with lower levels (farther from the processor) guaranteed to hold the data found in the higher levels (closer to the processor). AMD’s Phenom works with an exclusive L3 cache, while Intel follows the inclusive cache strategy. Coherency protocols take care of keeping data consistent across multiple levels, cores, and even processors.
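One practical consequence of the two strategies is how much distinct data a pair of cache levels can hold. The helper below is a hypothetical sketch, and the sizes are placeholders rather than the specifications of any particular processor.

```python
def unique_capacity_kb(upper_kb, lower_kb, policy):
    """Distinct data (in KB) that two adjacent cache levels can hold together."""
    if policy == "exclusive":
        # No duplication: the two levels hold different data.
        return upper_kb + lower_kb
    if policy == "inclusive":
        # The lower level duplicates everything held in the upper level.
        return lower_kb
    raise ValueError(f"unknown policy: {policy}")


# Placeholder sizes: a 512 KB upper level in front of a 2048 KB lower level.
print(unique_capacity_kb(512, 2048, "exclusive"))  # 2560
print(unique_capacity_kb(512, 2048, "inclusive"))  # 2048
```

The inclusive design gives up some effective capacity. In exchange, a lookup in the lower level is enough to tell whether the data can be present in the levels above it.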

Cache Capacity

Larger caches can buffer more data, but they also tend to introduce higher latency. Since cache also consumes a large share of a processor’s transistor budget, it is important to balance transistor cost and die size against power consumption and performance/latency.

Associativity

Cache entries can either be direct-mapped, meaning that each main memory location can be copied to only one position in the cache, or n-way set-associative, meaning that there are n possible positions in the cache for it. Higher associativity (up to fully associative caches, where data can be placed anywhere) provides the most caching flexibility because existing cache data is less likely to be overwritten. In other words, higher associativity generally yields higher hit rates, but it introduces more latency, since it takes more time to compare all of those candidate positions for hits. Ultimately, it makes sense to implement many-way associativity for the last cache level because it has the most capacity available, and a miss there sends the processor out to slower system memory.
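The effect of associativity on conflicts can be illustrated with a small simulation. This is a sketch under stated assumptions: the count_hits function, the 64-byte line size, the least-recently-used replacement, and the eight-line cache are all chosen for illustration and do not model any specific processor.

```python
from collections import OrderedDict


def count_hits(addresses, num_lines, ways, line_size=64):
    """Count hits in a set-associative cache with LRU replacement.

    ways=1 gives a direct-mapped cache; ways=num_lines gives a fully
    associative one.
    """
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # per set: tag -> True, LRU first
    hits = 0
    for address in addresses:
        block = address // line_size   # which memory block the address belongs to
        index = block % num_sets       # the only set this block may live in
        tag = block // num_sets        # identifies the block within that set
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)     # mark as most recently used
        else:
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return hits


# Two addresses that collide in an 8-line direct-mapped cache, accessed alternately.
trace = [0, 8 * 64] * 5
print("direct-mapped:", count_hits(trace, num_lines=8, ways=1))  # 0 hits
print("2-way:        ", count_hits(trace, num_lines=8, ways=2))  # 8 hits
```

In the direct-mapped case the two addresses keep evicting each other, so every access misses. With two ways per set, both fit side by side and only the first two accesses miss.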

Here are some examples: The Core i5 and i7 work with 32KB of 8-way associative L1 data cache and 32KB of 4-way associative L1 instruction cache. Clearly, Intel wants instructions to be available more quickly while also maximizing hits on the L1 data cache. Its L2 cache is also 8-way set-associative, while Intel’s L3 cache is even smarter, implementing 16-way associativity to maximize cache hits.

AMD, however, follows a different strategy on the Phenom II X4, with a 2-way set-associative L1 cache that offers lower latencies. To compensate for possible misses, it features twice the memory capacity: 64KB data and 64KB instruction cache. The L2 cache is 8-way set-associative, like Intel's design, but AMD’s L3 cache works at 48-way set associativity. None of this can be judged without looking at the entire CPU architecture. Naturally, only the benchmark results really count, but the whole purpose of this technical excursion is to provide a look into the complexity behind multi-level caching.
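To relate the capacity and associativity figures above to each other, the number of sets in a cache is its capacity divided by the product of ways and line size. The sketch below assumes 64-byte cache lines, a figure that is not stated in this article, and the helper name is hypothetical.

```python
def number_of_sets(capacity_kb, ways, line_size=64):
    """Sets in a set-associative cache: capacity / (ways * line size)."""
    return (capacity_kb * 1024) // (ways * line_size)


# L1 figures quoted above, with the assumed 64-byte line size.
print("Intel L1 data, 32KB 8-way:       ", number_of_sets(32, 8))   # 64 sets
print("Intel L1 instruction, 32KB 4-way:", number_of_sets(32, 4))   # 128 sets
print("AMD L1, 64KB 2-way:              ", number_of_sets(64, 2))   # 512 sets
```

Fewer ways means more sets and fewer tag comparisons per lookup, which fits the lower-latency rationale given for the 2-way design.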

Comments
Herr_Koos 06/10/2009 10:37

OK, good start. Now maybe you can overclock the 620 to 3.0GHz+ and compare it to the 945 or 955, so we can see the other end of the scale.

meodowla 06/10/2009 13:30

Need to check with Athlon64 X2 6000+ and Phenom II X2 550 B.E

Herr_Koos 06/10/2009 13:35

@meodowla: Not a fair comparison. The Athlon X2 is an older architecture, thus the L2 vs L3 argument is meaningless in this case.

aje21 06/10/2009 15:20

What about the AMD K6-III, that had L3 cache (on the motherboard) in the x86 desktop space back in 1999!

wild9 07/10/2009 08:00

What a close race that was..perhaps some server benchmarks could widen the gap, or heavy multi-tasking?

Anonymous 07/10/2009 11:11

Would love to see power consumption benchmarks too.
How having L3 cache would impact that is an important consideration, especially in hot countries like Singapore!

MasterDOOM 08/10/2009 09:54

Nice Review

wild9 08/10/2009 14:19

wild9 :
What a close race that was..perhaps some server benchmarks could widen the gap, or heavy multi-tasking?



Aah, the K6 days..happy memories :) I initially saw that kind of scenario when this 'lite' quad appeared, but the performance difference is marginal compared to say, the K6-2 and K6-III/III+. In fact I think it's going to have a detrimental effect on Phenom II x2 sales as well as Phenom II x4 as what incentive is there if you can get a slower CPU and just clock it up a bit, even if it hasn't got an unlocked multiplier?

I'm not complaining about the performance gap, though..it's nice to have these options - and at these prices, don't you think? It's also really, really cool to be able to take some older hardware and still retain drop-in compatibility with the latest core revision. That's what I like about AMD parts even if it might take a bit longer to get them (thanks to AMD being stung by the original Phenom issues).

But those K6-III days..I remember racing home with a K6-III+ (mobile) chip and overclocking a Socket 7 board to 550MHz, then watching it blow away a Pentium MMX on hard drive throughput thanks to that tri-level cache design (L1+L2 = CPU; L3 = 1MB or 2MB on the motherboard, previously referred to as L2 by K6-2 users). To think how we valued 50MHz increments back then, same for the 97MHz vs 100 and 112MHz FSB variations, and the lack of PCI locks causing hard drive interfaces to overclock hard drives to their death lol. I read rumours of some folks being able to clock those + chips up to and beyond 700MHz! But like today with the Athlon II x4, the core was refined rather than re-designed so the synthetic performance improvement wasn't huge. Happy days are here again.

Solitaire 08/10/2009 14:28

Page 4... "Since the slowest Phenom II X4 starts at 3.0 GHz"... No... That's just plain wrong. The slowest P2 is the X4-810, running at 2.6GHz. And I'd imagine that you'd want to test L3 cache at the highest speeds possible anyway, as that's when you'd see the biggest improvement over a (OCd) 3.4GHz A2 - the much higher difference in cache and system RAM speeds makes the larger cache more pivotal...

DAVID GREGORY KERR 08/10/2009 17:34

This CPU should be faster than all Intel processors, bar none. I suspect that a personal computer with two CPU sockets would make for a system that would put Intel's SKULLTRAIL system to shame. Imagine a system using 8 AMD ATHLON II X4s: that would be a very nice system to have. Think your very own CRAY.

Anonymous 13/10/2009 05:54

I wonder what the dif would have been had the reviewer used a CFD run as a bench? It seems to me that a nice transonic airflow through two pipes into a mixing chamber in Ansys CFX or Fluent would REALLY bring out the dif between L3 cache or not...

leexgx 14/10/2009 15:34

You should have done a clock-for-clock comparison and then the Phenom II at its default clock speed as well, as you're not likely going to be downclocking a Phenom II. I know it was a clock-for-clock review, but you could have put default clock speeds in as well.

Anonymous 14/11/2009 23:14

It is astonishing that the level 3 cache makes so little difference for AMD CPUs. While Intel CPUs get an enormous boost from any increase in cache size, the effect on AMD CPUs is nearly absent. Thus I start to wonder if this is due to technology (hardware) or a kind of 'bug' in the compilers that cripples the benefit of a large level 3 cache in AMD's range of CPUs. Does AMD have any idea why their CPUs respond so poorly to an increased level 3 cache size?

arakrazy 05/12/2009 05:11

Rab1d-BDGR :
Wouldn't a clock-for-clock comparison - i.e. underclocking the phenom to the same clock speed be more useful for answering the L3 cache question?...



http://www.tomshardware.co.uk/athl [...] 697-3.html

This *is* a clock-for-clock comparison...

