Source: Tom's Hardware UK – Keywords: Phenom_9700, Spider_Platform, 790FX
Categories: Hardware
Technology I - Advanced Memory Prefetcher, SSE4a
As our avid readers will undoubtedly remember, Intel introduced the first SIMD extensions to the x86 ISA in the shape of the MMX instruction set 10 years ago. As a countermove, AMD implemented the 3DNow! extensions in its own processors. As a result, software did not get the same performance boost on both companies’ processors, since it had to be optimized separately for each set of extensions. Thankfully, this kind of competition and incompatibility died down, and the SSE, SSE2 and SSE3 extensions used by AMD and Intel were identical. However, the two chipmakers are now parting ways once again, to the detriment of users and programmers. With the launch of the Penryn core, Intel introduced the SSE4.1 instruction set. AMD, meanwhile, is implementing SSE4a (formerly known as SSE128) in the new Stars core microarchitecture.

The Phenom’s SSE unit is being widened to 128 bits, up from the Athlon 64’s 64-bit unit. Additionally, AMD is adding four new instructions, namely EXTRQ/INSERTQ and MOVNTSD/MOVNTSS. Two more instructions, LZCNT and POPCNT, which count leading zero bits and set bits respectively and are used in bit-manipulation functions, are included as well.
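To make the two bit-counting instructions more concrete, here is a minimal sketch of what they compute for a 64-bit operand. The function names `popcnt64` and `lzcnt64` are our own, chosen for illustration. This models the results only, not how the hardware arrives at them.

```python
MASK64 = (1 << 64) - 1  # keep only the low 64 bits, like a 64-bit register


def popcnt64(x: int) -> int:
    """Number of set bits in the low 64 bits of x (the result POPCNT produces)."""
    return bin(x & MASK64).count("1")


def lzcnt64(x: int) -> int:
    """Number of leading zero bits in a 64-bit value (the result LZCNT produces).

    For an input of zero, every bit is a leading zero, so the result is 64.
    """
    x &= MASK64
    return 64 - x.bit_length()
```

For example, `popcnt64(0b1011)` counts three set bits, and `lzcnt64(1)` reports 63 leading zeros.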
Sadly, Intel’s SSE4.1 and AMD’s SSE4a are incompatible with one another – a fact that may soon cause problems for programmers and users alike.
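In practice, software that wants to use either extension has to check at run time what the processor supports and fall back to a common code path otherwise. The sketch below shows one way this could look on Linux, where `/proc/cpuinfo` lists the flags `sse4_1`, `sse4a` and `pni` (the kernel's name for SSE3). The function names and the returned path labels are hypothetical and purely illustrative.

```python
def read_linux_cpu_flags(path="/proc/cpuinfo"):
    """Return the space-separated CPU flag string of the first core, or ''.

    Assumes an x86 Linux system; other platforms may not expose a 'flags' line.
    """
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return line.split(":", 1)[1].strip()
    return ""


def pick_simd_path(cpu_flags: str) -> str:
    """Choose a code path label from a flag string (labels are illustrative)."""
    flags = set(cpu_flags.split())
    if "sse4_1" in flags:   # Intel's SSE4.1
        return "sse4.1"
    if "sse4a" in flags:    # AMD's SSE4a
        return "sse4a"
    if "pni" in flags:      # SSE3, shared by both vendors
        return "sse3"
    return "scalar"
```

A program would call `pick_simd_path(read_linux_cpu_flags())` once at startup and select its optimized routines accordingly. The point is that two vendor-specific branches are needed where a single shared one used to suffice.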

The advanced memory prefetcher can load data directly from RAM into the core’s L1 cache without needing to take a detour through the L2 cache first. Thus, the data can be loaded into the processor with a much lower latency. At the same time, this reduces the load on the L2 cache, which can instead buffer other data more efficiently, in turn translating into an overall performance boost.
Furthermore, the prefetcher identifies recurring access patterns and can prefetch the corresponding data even before it is requested.
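To illustrate the general idea of pattern-based prefetching, here is a toy stride detector. It is not AMD's actual algorithm, which is not described here. It assumes the simplest possible pattern: once the same address step has been seen twice in a row, the next address in the sequence is predicted. The class name `StridePrefetcher` is our own.

```python
class StridePrefetcher:
    """Toy model: predict the next address once a constant stride repeats."""

    def __init__(self):
        self.last_addr = None    # address of the previous access
        self.last_stride = None  # step between the two previous accesses

    def access(self, addr):
        """Record an access and return an address to prefetch, or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Same non-zero step twice in a row: assume the pattern continues.
            if stride != 0 and stride == self.last_stride:
                prediction = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return prediction
```

Feeding it the addresses 0, 64, 128 yields a prediction of 192 on the third access, before the program has asked for that address. A break in the pattern resets the detector.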
x86 instructions are between 1 and 15 bytes long. Compared to the Athlon 64 core, the data buffer for fetching instructions was increased to 32 bytes, allowing the core to process more instructions simultaneously. Thus, as you can see in our diagram, up to three instructions can be processed at the same time, depending on the length of the instructions.
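The effect of the larger fetch buffer can be shown with some simple arithmetic. The sketch below counts how many whole instructions fit into one fetch window, capped at the three instructions mentioned above. It assumes the window starts exactly at the first instruction and that an instruction is never split across windows, which simplifies the real hardware. The function name and the 16-byte comparison value are our own assumptions.

```python
def instructions_per_fetch(lengths, window_bytes=32, max_decode=3):
    """Count whole instructions that fit in one fetch window (simplified model)."""
    used = 0
    count = 0
    for n in lengths:
        if not 1 <= n <= 15:  # x86 caps instruction length at 15 bytes
            raise ValueError("instruction length must be 1..15 bytes")
        if count == max_decode or used + n > window_bytes:
            break
        used += n
        count += 1
    return count
```

With three 7-byte instructions, a hypothetical 16-byte window holds only two of them, while the 32-byte window holds all three. With three maximum-length 15-byte instructions, even 32 bytes are only enough for two.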

I'd rather have a slower processor but not have to rebuy three of the most expensive components (CPU, mobo, and RAM) every time I want to upgrade something. That's why I've stuck with AMD for the last few years. Can't wait to drop a couple of Phenom FXs into my 4x4 platform and have 8 processing cores.
What is going to happen when AMD shortly moves to 45nm processors with DDR3 memory controllers?
Unless AMD are going to put both DDR2 and DDR3 memory controllers on their 45nm processors, or make them in both DDR2 and DDR3 versions, you will have to change your RAM, motherboard and processor to go 45nm (that didn't happen with Intel).
Unless motherboard makers put both DDR2 and DDR3 slots on current boards, although the latter wouldn't be supported until 45nm comes in (can't see that happening though).
I think these chips have more to come, that's an engineering sample and the mainboard's got a chipset with undeveloped drivers.
I say give it a month for the Nvidia chipset... and retest.
I can see why they need 4x Crossfire boards given the lackluster performance of the new ATI cards... ATI seem to be struggling since their coup de grâce with the X19xx series...
AMD clearly has a good design (architecture) but the process technology is their Achilles' heel. They need 4MB+ L3 cache, high-k transistor process, and 45nm like yesterday!!
As a bit of an AMD fanboy (I'm on a dual Opteron rig just now) I hate to see what's happening to them now!!
Bob
P.S. But it will be better
Overall, I like the way AMD has gone for compatibility and performance. The price is phenomenal and the ease of implementation will ensure downtime during upgrades is kept to a minimum. It's easy to under-estimate just how hard that is to pull off.