Technology I - Advanced Memory Prefetcher, SSE4a

07:38 - Monday 19 November 2007 by Bert Töpelt, Daniel Schuhmann
Source: Tom's Hardware UK – Keywords: Phenom_9700, Spider_Platform, 790FX
Categories: Hardware


As our avid readers will undoubtedly remember, Intel introduced the first SIMD extensions to the x86 ISA in the shape of the MMX instruction set 10 years ago. As a countermove, AMD implemented 3DNow! in its own processors. As a result, software did not get the same performance boost on both companies’ processors, since it had to be optimized separately for each extension. Thankfully, this kind of competition and incompatibility died down, and the SSE, SSE2 and SSE3 extensions used by AMD and Intel were identical. However, the two chipmakers are now parting ways once again, to the detriment of users and programmers. With the launch of the Penryn core, Intel introduced the SSE4.1 instruction set. AMD, meanwhile, is implementing SSE4a (formerly known as SSE128) in the new Stars core microarchitecture.

The Phenom’s SSE unit is being widened to 128 bits, up from the Athlon 64’s 64-bit unit. Additionally, AMD is adding four new instructions, namely EXTRQ/INSERTQ and MOVNTSD/MOVNTSS. Two more instructions, LZCNT and POPCNT, which count leading zero bits and set bits respectively and are primarily used in bit manipulation functions, are included as well.
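To make the two bit-counting instructions more concrete, here is a minimal software sketch in C of what POPCNT and LZCNT compute for a 32-bit operand. The function names are our own, chosen for illustration. The loops only model the results and say nothing about how the hardware arrives at them.

```c
#include <stdint.h>

/* Reference model of POPCNT: number of bits set to 1 in a 32-bit value.
   (popcount32_ref is a hypothetical name used for this sketch.) */
unsigned popcount32_ref(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Reference model of LZCNT: number of zero bits above the highest set bit.
   For an input of 0 the result is the operand width, 32 in this case.
   (lzcnt32_ref is a hypothetical name used for this sketch.) */
unsigned lzcnt32_ref(uint32_t x)
{
    unsigned count = 0;
    uint32_t mask = UINT32_C(1) << 31;   /* start at the top bit */
    while (mask != 0 && (x & mask) == 0) {
        count++;
        mask >>= 1;
    }
    return count;
}
```

On a processor that supports the instructions, each of these loops collapses into a single operation, which is where the benefit for bit manipulation code comes from.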

Sadly, Intel’s SSE4.1 and AMD’s SSE4a are incompatible with one another – a fact that may soon cause problems for programmers and users alike.
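In practice, programmers usually work around this kind of split by checking the CPUID feature flags at run time and selecting a code path accordingly. The following C sketch shows only the decision logic. It takes the raw ECX values of CPUID leaf 1 and extended leaf 0x80000001 as parameters, so it stays portable. The function and enum names are hypothetical, and preferring SSE4.1 when both flags are set is an arbitrary choice for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical code paths an application might ship. */
enum simd_path { PATH_BASELINE, PATH_SSE41, PATH_SSE4A };

/* SSE4.1 is reported in CPUID leaf 1, ECX bit 19. */
bool has_sse41(uint32_t leaf1_ecx)
{
    return (leaf1_ecx >> 19) & 1u;
}

/* SSE4a is reported in CPUID extended leaf 0x80000001, ECX bit 6. */
bool has_sse4a(uint32_t ext_ecx)
{
    return (ext_ecx >> 6) & 1u;
}

/* Pick a code path from the two raw ECX register values.
   The order of the checks is an assumption made for this sketch. */
enum simd_path choose_path(uint32_t leaf1_ecx, uint32_t ext_ecx)
{
    if (has_sse41(leaf1_ecx))
        return PATH_SSE41;
    if (has_sse4a(ext_ecx))
        return PATH_SSE4A;
    return PATH_BASELINE;   /* fall back to code that uses neither extension */
}
```

With GCC or Clang on x86, the register values could be obtained with `__get_cpuid` from `<cpuid.h>`. Other compilers offer their own intrinsics. The point is that every optimized routine now needs a fallback and at least two specialized variants.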

The advanced memory prefetcher can load data directly from RAM into the core’s L1 cache without taking a detour through the L2 cache first. Thus, the data reaches the processor with much lower latency. At the same time, this reduces the load on the L2 cache, which can instead buffer other data more efficiently, in turn translating into an overall performance boost.

Furthermore, the prefetcher identifies recurring data patterns and can pre-fetch them even before they are requested.
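AMD has not published the exact detection logic, so the following C sketch is only a generic illustration of pattern-based prefetching, not a description of the Phenom's implementation. It models a simple stride detector: once consecutive accesses have repeated the same address distance several times, it predicts the next address. All names and the confidence threshold are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* State of a hypothetical single-stream stride detector.
   Zero-initialize it before first use. */
struct stride_detector {
    uint64_t last_addr;    /* most recent address seen */
    int64_t  last_stride;  /* distance between the last two addresses */
    unsigned confidence;   /* how many times in a row the stride repeated */
    bool     primed;       /* true once at least one address has been seen */
};

/* Feed one accessed address into the detector. Returns true and writes the
   predicted next address to *predicted once the same non-zero stride has
   repeated twice in a row (threshold chosen arbitrarily for this sketch). */
bool stride_observe(struct stride_detector *d, uint64_t addr, uint64_t *predicted)
{
    assert(d != NULL && predicted != NULL);
    bool predict = false;

    if (d->primed) {
        int64_t stride = (int64_t)(addr - d->last_addr);
        if (stride != 0 && stride == d->last_stride) {
            if (d->confidence < 3)
                d->confidence++;       /* pattern confirmed once more */
        } else {
            d->confidence = 0;         /* pattern broken, start over */
        }
        d->last_stride = stride;
        if (d->confidence >= 2) {
            *predicted = addr + (uint64_t)stride;
            predict = true;
        }
    }
    d->last_addr = addr;
    d->primed = true;
    return predict;
}
```

Feeding the addresses 1000, 1064, 1128 and 1192 into a zero-initialized detector yields a prediction of 1256 on the fourth call. A jump to an unrelated address resets the confidence counter.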

x86 instructions are between 1 and 15 bytes long. Compared to the Athlon 64 core, the data buffer for fetching instructions was increased to 32 bytes, allowing the core to process more instructions simultaneously. Thus, as you can see in our diagram, up to three instructions can be processed at the same time, depending on the length of the instructions.
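The relationship between instruction length and fetch buffer size can be sketched with a little arithmetic. The C function below is a deliberately simplified model, not the actual decode logic: it counts how many whole instructions from a list of byte lengths fit into a fetch window, capped at the three instructions mentioned above. The function name and the `MAX_DECODE` constant are our own.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DECODE 3   /* "up to three instructions" per the text */

/* Count how many complete instructions, taken in order from lengths[0..n-1],
   fit into a fetch window of window_bytes. Simplified model: it ignores
   alignment and every other constraint of a real front end. */
size_t instrs_in_window(const unsigned char *lengths, size_t n, size_t window_bytes)
{
    assert(lengths != NULL || n == 0);
    size_t used = 0;
    size_t count = 0;

    while (count < n && count < MAX_DECODE &&
           used + lengths[count] <= window_bytes) {
        used += lengths[count];
        count++;
    }
    return count;
}
```

In this model, three 6-byte instructions all fit into a 32-byte window, whereas a 16-byte window (a size picked here purely for comparison) would hold only two of them. Three maximum-length 15-byte instructions exceed even the 32-byte window, which is why the achievable count depends on instruction length.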


Talkback

tstebbens 19/11/2007 12:40
"Intel, since the chip giant has already announced that its current high-end platform X38 will be incompatible to the next generation of high-end CPUs at the beginning of next year."

I'd rather have a slower processor but not have to rebuy three of the most expensive components (CPU, mobo, and RAM) every time I want to upgrade something. That's why I've stuck with AMD for the last few years. Can't wait to drop a couple of Phenom FXs into my 4x4 platform and have 8 processing cores.
technogiant 19/11/2007 01:32
Not so sure about forthcoming compatibility.

What is going to happen when AMD shortly moves to 45nm processors with DDR3 memory controllers?

Unless AMD is going to put both DDR2 and DDR3 memory controllers on its 45nm processors, or make them in both DDR2 and DDR3 versions, you will have to change your RAM, motherboard and processor to go 45nm (that didn't happen with Intel).

That is, unless motherboard makers put both DDR2 and DDR3 slots on current boards, although the latter wouldn't be supported until 45nm comes in (can't see that happening though).
spuddyt 19/11/2007 06:39
Why do the companies have to be so selfish with this incompatible SSE thingy...
spuddyt 19/11/2007 07:00
P21, that table isn't really taking the quad core part into account... because look, in things like Supreme Commander it's quite a bit faster... (which is important to me...)
spuddyt 19/11/2007 07:03
LOL at the .1% extra value for money compared to the Q6600, AMD clearly did that to annoy Intel...
jamesalexw 19/11/2007 09:47
Well, seeing as that's a brand new chipset as well, wouldn't it be a little foolish to expect full performance straight out of the box?

I think these chips have more to come, that's an engineering sample and the mainboard's got a chipset with undeveloped drivers.

I say give it a month for the Nvidia chipset... and retest.
BobWya 19/11/2007 11:23
Intel's @4GHz OC on air... AMD is on @3GHz OC on air... Oh dear...

I can see why they need 4x Crossfire boards given the lackluster performance of the new ATI cards... ATI seem to be struggling since their coup de grâce with the X19xx series...

AMD clearly has a good design (architecture) but the process technology is their Achilles heel. They need 4MB+ L3 cache, a high-k transistor process, and 45nm like yesterday!!

As a bit of an AMD fanboy (I'm on a dual Opteron rig just now) I hate to see what's happening to them now!!

Bob
crackez 20/11/2007 03:03
Bob, if AMD had 4MB+ L3 cache, a high-k transistor process and 45nm, it would be twice as expensive as Intel's solutions.

P.S. But it would be better :P.. I'm currently on an AMD Athlon 64 X2 5000+.. and it's a bit overclocked (from 2.6 GHz to 3.01 GHz).. I'm on water, though. Wish AMD would be back on top!
Wild9 22/11/2007 03:18
I think for server setups this HT 3.0 is gonna shine. More so if you want to build a super-computer system. The bandwidth gains are enormous. But the actual core performance..sorry AMD, but this is just a little disappointing. Intel's current stock is already ahead of you. Perhaps 45nm products will address this (and allow for more L3 cache).

Overall, I like the way AMD has gone for compatibility and performance. The price is phenomenal and the ease of implementation will ensure downtime during upgrades is kept to a minimum. It's easy to under-estimate just how hard that is to pull off.
Wild9 22/11/2007 03:28
One of the CPU-Z pics shows a higher Vcore? Come on lads.. who's been naughty! ;) p.s. I would.
