A corrected and re-examined memory hierarchy (continued)

In any four-core architecture, the memory hierarchy becomes particularly important. The cores must be able to exchange data efficiently, while at the same time avoiding polluting the caches of the other cores, which are working on separate tasks.

The three-level architecture is the solution to this problem: each of the four cores has a private space of 512 KB, and all four share a unified space through which they can communicate efficiently.

Figure: AMD K10 quad-core architecture (Barcelona, Opteron, Phenom)

The way this new hierarchy works doesn’t change anything we know about AMD processors since the Thunderbird. The L2 cache is still exclusive, which means that data held in the L1 cache is not duplicated in it; similarly, the data in the L3 cache is also exclusive.

We often talk about victim caches: when data is about to be evicted from the L1 cache because of a conflict, instead of being written back to main memory it is placed in the L2 cache. Similarly, when data must be evicted from the L2, it is written to the L3 cache.
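The victim chain described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: each level is treated as fully associative with least-recently-inserted replacement, and the capacities are tiny made-up numbers. The class name `VictimLevel` and its fields are hypothetical, and the real replacement policy of the K10 is not described in this article.

```python
from collections import OrderedDict


class VictimLevel:
    """One cache level with a fixed number of lines (hypothetical model).

    Assumption: fully associative, oldest-inserted line is evicted first.
    """

    def __init__(self, name, capacity, next_level=None):
        self.name = name
        self.capacity = capacity
        self.next_level = next_level  # where victims go; None means main memory
        self.lines = OrderedDict()    # address -> data, oldest entry first

    def insert(self, addr, data):
        """Install a line; if the level overflows, push the oldest line down."""
        self.lines[addr] = data
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            victim_addr, victim_data = self.lines.popitem(last=False)
            if self.next_level is not None:
                # The victim moves to the next level instead of being dropped.
                self.next_level.insert(victim_addr, victim_data)
            # Otherwise the victim would go back to main memory (not modeled).


# Toy capacities, chosen only to make the evictions visible.
l3 = VictimLevel("L3", capacity=4)
l2 = VictimLevel("L2", capacity=2, next_level=l3)
l1 = VictimLevel("L1", capacity=1, next_level=l2)

for addr in range(4):
    l1.insert(addr, f"line{addr}")

for level in (l1, l2, l3):
    print(level.name, list(level.lines))
```

After the four insertions, address 3 sits in the L1, addresses 1 and 2 in the L2, and address 0 has been pushed down to the L3. No address appears in two levels at once, which is what makes the hierarchy exclusive.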

The L3 cache also includes algorithms designed to make it more efficient for multithreaded applications. When data is requested from the L3 cache and is likely to be accessed by the other cores (because it is code, or because the data was previously shared), it is copied into the L1 cache and also remains in the L3 cache.

If the data is needed by only one core, it is placed directly in the L1 cache and removed from the L3. You may also have heard the term non-inclusive victim cache: the L3 cache contains the “victims” evicted from the L2 cache. This contrasts with the L2 cache, which only contains data that was once in the L1 cache.
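The L3 read policy described in the last two paragraphs can be summarized as a small decision function. This is a sketch, not AMD's actual logic: the caches are modeled as plain dictionaries, and the function name `read_from_l3` and the flags `is_code` and `was_shared` are hypothetical stand-ins for whatever heuristics the hardware really uses.

```python
def read_from_l3(l3, l1, addr, is_code=False, was_shared=False):
    """Fill the L1 from the L3, following the policy described in the text.

    l3 and l1 are dicts mapping address -> data (a stand-in for real caches).
    Returns True if the line stays in the L3, False if it was moved out.
    """
    data = l3[addr]
    l1[addr] = data  # the requesting core always gets the line in its L1

    # Lines likely to be used by other cores are kept in the shared L3.
    likely_shared = is_code or was_shared
    if not likely_shared:
        # Private data: the L3 gives up its copy (exclusive behavior).
        del l3[addr]
    return likely_shared


l3 = {0x100: "code line", 0x200: "private data"}
l1 = {}

kept_code = read_from_l3(l3, l1, 0x100, is_code=True)
kept_private = read_from_l3(l3, l1, 0x200)
```

In this example the code line at `0x100` ends up in both the L1 and the L3, while the private line at `0x200` ends up in the L1 only.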

As for the characteristics of these cache levels, little changes with Barcelona: the L1 cache is still 2-way set associative and has a latency of 3 cycles. On the other hand, its interface has been widened to 256 bits, versus 128 bits on the K8. The L2 cache is 16-way set associative and has a latency of 9 cycles on top of the 3 cycles of the L1 cache. Finally, the L3 cache is 32-way set associative. AMD doesn’t give its latency, which makes sense given that it varies with the bandwidth demanded by the different cores.
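To make the associativity and latency figures concrete, here is a short worked calculation. The 512 KB size, the 16 ways and the 3 and 9 cycle latencies come from the text; the 64-byte line size is an assumption that the article does not state, and the helper names `num_sets` and `set_index` are hypothetical.

```python
def num_sets(cache_bytes, ways, line_bytes=64):
    """Number of sets in a set-associative cache.

    line_bytes=64 is an assumption; the article does not give the line size.
    """
    return cache_bytes // (ways * line_bytes)


def set_index(addr, cache_bytes, ways, line_bytes=64):
    """Set that a byte address maps to (simple modulo indexing assumed)."""
    return (addr // line_bytes) % num_sets(cache_bytes, ways, line_bytes)


L1_LATENCY = 3        # cycles, from the text
L2_EXTRA_LATENCY = 9  # cycles on top of the L1 latency, from the text

l2_sets = num_sets(512 * 1024, ways=16)
l2_hit_latency = L1_LATENCY + L2_EXTRA_LATENCY

print("L2 sets:", l2_sets)
print("L2 hit latency (cycles):", l2_hit_latency)
```

Under the 64-byte line assumption, the 512 KB L2 has 512 sets of 16 lines each, and a load that misses the L1 but hits the L2 costs 3 + 9 = 12 cycles in total.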


Talkback
JeanLuc 18/09/2007 03:00

Why is the wording under the pictures in French?

MrRimmer 19/09/2007 11:31

It looks like the editor either hasn't been doing his/her job properly or is not a fluent English speaker. There are at least half a dozen spelling errors in this article, and the grammar is somewhat less than perfect!
Apart from that, an interesting read.

Fragula 19/09/2007 12:00

Re: "AMD K10: The Architecture of the Revival?"

Article compares apples and oranges. :-(

i.e. It would be fair to compare the memory architecture of Coppermine vs. Thunderbird, as an example of where AMD /romped/ ahead.

Go back to Tom's Hardware's own archives and compare those memory architectures.

Or as another example, compare Katmai with the original Slot-A Athlon K75.

Where's the definitive great chart of all (x86) CPUs gone? Where are the archives?? What happened to the once-great tomshardware.com????

Cheers!

Fragz.
