Source: Tom's Hardware UK – Keywords: AMD, K10, architecture
Categories: Hardware
Reading and decoding of instructions
The first part of running a program (for a processor at least) consists of reading the instructions from memory, in practice from the level 1 instruction cache. Since 1995 and the introduction of the Pentium Pro, modern processors no longer execute these instructions natively; they must first pass through a decoding phase.
During this phase, specialised units take the complex x86 instructions (their size varies and the position of the operation code is not fixed) and supply the core with RISC-like instructions for execution. These are far simpler and all the same size, which makes them much easier to handle and allows execution to take place out of order.
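To see why variable-length instructions are awkward to handle, here is a minimal sketch of a length decoder. It is a toy, not how any real decoder is built: the `TOY_LENGTHS` table and the `split_instructions` helper are hypothetical names, and the table covers only the four opcodes used in the sample byte stream.

```python
# Toy illustration: the start of each x86 instruction is only known once the
# previous instruction's length has been worked out.
# TOY_LENGTHS is a hypothetical table covering only the opcodes used below.
TOY_LENGTHS = {
    0x90: 1,  # nop
    0xC3: 1,  # ret
    0xB8: 5,  # mov eax, imm32 (opcode + 4-byte immediate)
    0x01: 2,  # add r/m32, r32 (opcode + ModRM, register-direct form only)
}

def split_instructions(code):
    """Cut a byte stream into instructions by walking it from the start."""
    out = []
    pos = 0
    while pos < len(code):
        length = TOY_LENGTHS[code[pos]]
        out.append(code[pos:pos + length])
        pos += length
    return out

stream = bytes([
    0xB8, 0x01, 0x00, 0x00, 0x00,  # mov eax, 1
    0x01, 0xD8,                    # add eax, ebx
    0x90,                          # nop
    0xC3,                          # ret
])

instructions = split_instructions(stream)
lengths = [len(i) for i in instructions]
print(lengths)
```

Note that the byte 0x01 appears twice in the stream, once as an immediate value and once as an opcode. Only the sequential walk tells them apart, which is exactly the work that fixed-size internal instructions spare the execution core.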
This phase reveals something of a philosophical difference between AMD and Intel. With the Core 2, Intel set out to improve decoding performance.
To this end, its engineers added a new decoder, bringing the total from 3 to 4. They also added a fusion technique which, as its name suggests, allows the processor to fuse two instructions (a test and a branch) into one. So in an ideal situation the Core 2 can read and decode up to 5 x86 instructions per cycle, versus a maximum of 3 for any processor derived from the P6.
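The "up to 5" figure can be reproduced with a small counting model. This is only a sketch under stated assumptions: the function name, the instruction labels and the sample stream are invented for illustration, every instruction is assumed to fit any decoder, and the model allows at most one fused pair per cycle, which is what makes 4 decoders yield 5 instructions.

```python
def instructions_decoded_per_cycle(stream, width, fusion):
    """Return how many x86 instructions are consumed in each cycle.

    Toy model: `width` decoder slots per cycle; with `fusion` enabled, one
    test/cmp immediately followed by a conditional jump ("jcc") may share
    a single slot per cycle.
    """
    per_cycle = []
    pos = 0
    while pos < len(stream):
        slots = width
        fused_this_cycle = False
        start = pos
        while slots > 0 and pos < len(stream):
            is_pair = (
                fusion
                and not fused_this_cycle
                and stream[pos] in ("cmp", "test")
                and pos + 1 < len(stream)
                and stream[pos + 1] == "jcc"
            )
            if is_pair:
                pos += 2  # two x86 instructions leave in one slot
                fused_this_cycle = True
            else:
                pos += 1
            slots -= 1
        per_cycle.append(pos - start)
    return per_cycle

# Hypothetical instruction stream with two test-and-branch pairs.
stream = ["add", "test", "jcc", "mov", "sub",
          "mov", "cmp", "jcc", "add", "add"]

p6_like = instructions_decoded_per_cycle(stream, width=3, fusion=False)
core2_like = instructions_decoded_per_cycle(stream, width=4, fusion=True)
print(p6_like, core2_like)
```

The 5-per-cycle peak is only reached when a fusible pair happens to fall inside each group of instructions; without one, the 4-wide model decodes 4 per cycle.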
Since the Athlon, AMD processors have also been able to read a maximum of 3 instructions per cycle, and this limit hasn’t changed with the introduction of Barcelona. This first impression is deceptive, however, and doesn’t do the processor justice.
Firstly, the decoders used by AMD are more “powerful” than those used by Intel: they are symmetrical, and each one can decode every instruction. Intel's processors, the Core 2 included, have only one “complete” decoder; the others (3 of them on the Core 2) are simpler and can only decode certain instructions.
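The trade-off between the two decoder layouts can be sketched with a toy cycle count. The assumptions are loud ones: instructions are labelled only "simple" or "complex", decoding is strictly in order, fetch bandwidth is ignored, and in the asymmetric layout a complex instruction can only go to the first (complete) decoder. The function names are hypothetical, and real processors handle their most complex instructions differently, so the numbers show the trend, not measured behaviour.

```python
def cycles_symmetric(stream, width):
    """All decoders accept every instruction, so only the count matters."""
    return -(-len(stream) // width)  # ceiling division

def cycles_asymmetric(stream, simple_decoders):
    """One complete decoder followed by `simple_decoders` simple ones."""
    cycles = 0
    pos = 0
    while pos < len(stream):
        cycles += 1
        pos += 1  # the complete decoder takes whatever comes next
        used = 0
        # Simple decoders take the following instructions only while they
        # are simple; a complex one has to wait for the next cycle.
        while (used < simple_decoders and pos < len(stream)
               and stream[pos] == "simple"):
            pos += 1
            used += 1
    return cycles

all_simple = ["simple"] * 12
all_complex = ["complex"] * 12
mixed = ["simple", "complex", "simple"] * 4

for name, s in [("simple", all_simple), ("complex", all_complex), ("mixed", mixed)]:
    print(name, cycles_symmetric(s, 3), cycles_asymmetric(s, 3))
```

In this model the 1+3 layout wins on a stream of simple instructions and loses badly on a stream of complex ones, while the three symmetric decoders take the same number of cycles whatever the mix.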
Then there are the modifications introduced with Barcelona itself: the CPU now reads 32 bytes from the instruction cache instead of 16 (as on the K8 and the Core 2 Duo). The reason for this increase in fetch bandwidth is simple: SSE instructions in 64-bit mode are larger (they can carry a one-byte prefix called REX), and the more widespread their use became, the more this bandwidth became a bottleneck in the decoding phase. On top of this, since the throughput of the SSE units has increased (we’ll come back to that later), the decoders had to be able to keep pace.
If this improvement was judged necessary on an architecture with 3 decoders, what is there to say about the Core 2, which has one more decoder and half the fetch bandwidth?
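A rough way to put numbers on this comparison is to divide the bytes fetched by the number of decoders: the result is the largest average instruction length at which the fetch stage can still feed every decoder. The sketch below uses only the figures quoted in the text (16 or 32 bytes, 3 or 4 decoders); the function and variable names are illustrative, and the model ignores alignment, branches and everything else that affects real fetch behaviour.

```python
def max_avg_length(fetch_bytes, decoders):
    """Largest average instruction length (in bytes) at which one fetch
    window still holds one instruction for every decoder."""
    return fetch_bytes / decoders

# (bytes fetched, number of decoders), as quoted in the article.
configs = {
    "K8": (16, 3),
    "Barcelona": (32, 3),
    "Core 2": (16, 4),
}
budget = {name: max_avg_length(*cfg) for name, cfg in configs.items()}

# The same SSE addition, without and with a REX prefix.
addps_legacy = bytes([0x0F, 0x58, 0xC1])     # addps xmm0, xmm1
addps_rex = bytes([0x44, 0x0F, 0x58, 0xC1])  # addps xmm8, xmm1 (REX.R set)

for name, limit in budget.items():
    print(f"{name}: up to {limit:.2f} bytes per instruction on average")
print(len(addps_legacy), "bytes without REX,", len(addps_rex), "bytes with REX")
```

Under this simple model the Core 2 is the tightest of the three, with an average budget of 4 bytes per instruction, which is the length of the REX-prefixed `addps` shown above.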
This is clearly a point Intel's engineers will have to work on to improve the architecture. We should also note that macro-op fusion is not active in 64-bit mode on processors derived from the Core 2 architecture. This fits the target markets of the two processors: where Barcelona seems to be aimed at servers, the Core 2 appears better suited to the consumer market and, more specifically, to laptops.
From the outset, Barcelona and the Core 2 have diverged: the latter focuses on peak performance, whereas the new AMD architecture tries to guarantee performance under any conditions.

Why is the wording under the pictures in French?
It looks like the editor either hasn't been doing his/her job properly or is not a fluent English speaker. There are at least half a dozen spelling errors in this article, and the grammar is somewhat less than perfect!
Apart from that, an interesting read.
Re: "AMD K10: The Architecture of the Revival?"
Article compares apples and oranges. :-(
i.e. It would be fair to compare the memory architecture of Coppermine vs. Thunderbird, as an example of where AMD /romped/ ahead.
Go back to Tom's Hardware's own archives and compare those memory architectures.
Or as another example, compare Katmai with the original Slot-A Athlon K75.
Where's the definitive great chart of all (x86) CPUs gone? Where are the archives?? What happened to the once-great tomshardware.com????
Cheers!
Fragz.