AMD FX-8150 Review: From Bulldozer To Zambezi To FX

Single Floating-Point Unit, AVX Performance, And L2

Two Cores, One FPU

The shared floating-point unit is separate from the integer pipelines. So, when operations hit the dispatch interface at the end of the decode stage on their way to the integer units, any floating-point operations in that stream go to the floating-point scheduler instead. There, they compete for resources and bandwidth regardless of the thread to which they belong.

As you can see in the diagram below, AMD’s floating-point logic is different from the integer side. Its purpose is execution-only; it reports its completion and exception information back to the parent integer core, which is responsible for instruction retirement.

The floating-point unit features two MMX pipelines and a pair of 128-bit fused multiply-accumulate (FMAC) units. Those FMAC pipes support four-operand instructions (FMA4), which give you a non-destructive result. Intel plans to incorporate the three-operand format (FMA3) in its Haswell micro-architecture (the one to follow Ivy Bridge). AMD says it’ll also support FMA3 in the successor to Bulldozer, called Piledriver, expected in 2012.
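To make the operand-count difference concrete, here is a minimal sketch in Python. It is a toy register-file model, not real instruction encoding: the function names, register names, and values are all made up for illustration, and the three-operand form shown is the variant where the destination doubles as the addend. Python also rounds the multiply and the add separately, whereas a real fused operation rounds once.

```python
# Toy register-file model. Register names and values are hypothetical.
# A real FMA rounds once; a * b + c in Python rounds twice.

def fma4(regs, dst, a, b, c):
    """Four-operand form: dst = a * b + c, with dst separate from every source."""
    regs[dst] = regs[a] * regs[b] + regs[c]

def fma3(regs, dst, a, b):
    """Three-operand form: dst = a * b + dst, so the old dst value is overwritten."""
    regs[dst] = regs[a] * regs[b] + regs[dst]

# Four operands: all three inputs survive the operation.
regs4 = {"x0": 0.0, "x1": 2.0, "x2": 3.0, "x3": 4.0}
fma4(regs4, "x0", "x1", "x2", "x3")

# Three operands: the addend in x3 is replaced by the result.
regs3 = {"x1": 2.0, "x2": 3.0, "x3": 4.0}
fma3(regs3, "x3", "x1", "x2")
```

In the four-operand case the addend is still available afterward. In the three-operand case, code that needs the old value has to copy it somewhere first.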

Any time we see vendors take divergent paths like this, we have to wonder how it’ll affect developers. So, we asked Adrian Silasi of SiSoftware what he expected to happen, and he pointed out that most developers won’t want to implement three code paths (one for AVX-only, one for AVX plus FMA3, and one for AVX plus FMA4). This makes good sense. And when you consider that few applications exploit AVX today and none of them utilize FMA, AMD should be in a better position to support all three paths when Piledriver becomes a reality.
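The maintenance burden of those three paths is easier to see in code. The following is only a sketch of runtime dispatch under stated assumptions: the feature strings, the function name, and the extra non-AVX "baseline" fallback are hypothetical, and the preference for FMA4 over FMA3 when both are present is an arbitrary choice, not anything a vendor prescribes.

```python
def pick_code_path(features):
    """Return the most specialized path the given CPU feature set allows.

    `features` is a set of hypothetical feature strings such as "avx" or "fma4".
    """
    if "avx" in features:
        if "fma4" in features:
            return "avx+fma4"
        if "fma3" in features:
            return "avx+fma3"
        return "avx"
    # Assumed fallback for processors without AVX (not one of the three paths).
    return "baseline"

# Feature sets based on the article's description of each architecture.
bulldozer = {"avx", "fma4"}      # AVX plus four-operand FMA
sandy_bridge = {"avx"}           # AVX without FMA
haswell_plan = {"avx", "fma3"}   # planned three-operand FMA

paths = [pick_code_path(f) for f in (bulldozer, sandy_bridge, haswell_plan)]
```

Each string returned here stands in for a separately written, tested, and tuned routine, which is why developers are reluctant to carry all three.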

The more pressing question today is how Bulldozer’s AVX support will size up to Intel’s. Sandy Bridge gives you two 256-bit AVX operations per clock, while Bulldozer facilitates one.

Leading up to this launch, I started talking to Noel Borthwick, a talented musician and CTO of Cakewalk, Inc., about his company’s work optimizing Sonar X1 for AVX. According to a whitepaper that Noel co-authored, AVX instruction support helps reduce the processing load of performing audio bit depth conversions while streaming audio buffers through the playback graph, rendering, and mixing. Common conversions include 24-bit integer to 32-bit floating-point or 64-bit double-precision, and 32-bit float to 64-bit double-precision.
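For readers unfamiliar with bit depth conversion, here is a scalar sketch in Python of the 24-bit integer to 64-bit float case. It is not Cakewalk's routine and uses no AVX. It assumes packed little-endian signed samples and a scale into the range [-1.0, 1.0), both of which are assumptions chosen for illustration, and the function name is hypothetical.

```python
def int24_to_float64(raw):
    """Convert packed 24-bit signed little-endian samples to Python floats.

    Python floats are 64-bit doubles. Scaling by 2**23 maps the integer
    range onto [-1.0, 1.0); that convention is an assumption of this sketch.
    """
    if len(raw) % 3 != 0:
        raise ValueError("input length must be a multiple of 3 bytes")
    scale = float(1 << 23)
    return [
        int.from_bytes(raw[i:i + 3], "little", signed=True) / scale
        for i in range(0, len(raw), 3)
    ]

# Three samples: half scale, most negative value, most positive value.
samples = int24_to_float64(bytes([0x00, 0x00, 0x40,
                                  0x00, 0x00, 0x80,
                                  0xFF, 0xFF, 0x7F]))
```

An audio engine performs this kind of per-sample work on every buffer, which is why handling several samples per instruction can pay off.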

To that end, Noel sent over the binary for a test application that compares two of Cakewalk’s AVX-optimized routines to the unoptimized version. Both AMD and Intel have access to this very same metric, so its results shouldn’t come as a surprise to either company.

Architecture       | Operation             | Result (CPU Cycles Gained/Lost)
AMD Bulldozer      | Copy Int24toFloat64   | 61% Gain
AMD Bulldozer      | Copy Float32toFloat64 | 77% Loss
Intel Sandy Bridge | Copy Int24toFloat64   | 69% Gain
Intel Sandy Bridge | Copy Float32toFloat64 | 14% Gain


In the Copy Int24toFloat64 operation, Intel’s Core i7-2600K sees a 69% gain, while AMD’s FX-8150 realizes an also-impressive 61% gain. What does “a gain” actually constitute? We’re talking about the number of CPU cycles that AVX helps reduce, yielding an increase in potential processor bandwidth. Phrased differently, Sandy Bridge cuts the number of cycles by 1.69x, while Bulldozer reduces them by 1.61x.
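As a quick check on that arithmetic, here is a small Python sketch. It assumes, as the 1.69x phrasing suggests, that the reported gain is a ratio of unoptimized to optimized cycle counts. The test application's exact definition of "gain" is not spelled out here, so treat this reading as an assumption. The function names are hypothetical.

```python
def speedup_factor(gain_percent):
    """Read an N% gain as a (1 + N/100)x ratio of old cycles to new cycles."""
    return 1.0 + gain_percent / 100.0

def cycles_remaining(gain_percent):
    """Fraction of the original cycle count left under that reading."""
    return 1.0 / speedup_factor(gain_percent)

sandy_bridge = cycles_remaining(69)   # roughly 0.59 of the original cycles
bulldozer = cycles_remaining(61)      # roughly 0.62 of the original cycles
```

Under this reading, both processors finish the conversion in about three-fifths of the cycles the unoptimized routine needs.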

On the other hand, in the Copy Float32toFloat64 operation, Core i7-2600K realizes a 14% gain as FX-8150 suffers a 77% loss. In trying to explain that loss, it seems that Cakewalk’s vectorization intrinsics (or, less likely, Microsoft’s) aren’t optimized for AMD’s architecture. In either case, an application patch or Visual Studio service pack could be needed.

If you flip to the Sandra 2011 results, you’ll see that AVX support does help FX-8150’s integer and floating-point performance. Sandy Bridge simply realizes a much larger floating-point gain in this synthetic metric.

Just before we wrapped up testing, AMD forwarded along two versions of x264, the software library behind front-ends like HandBrake (you’ll see us test the latest version of HandBrake shortly). These builds incorporate support for AVX and XOP instructions, the latter of which is exclusive to AMD’s architecture.

I modified Tech ARP’s x264 HD Benchmark 4.0 to utilize each of the new code paths, plus CPU-Z 1.58 for system information, and ran FX-8150 through the pair, along with Core i5-2500K through the AVX-optimized build.

The results from AMD’s AVX and XOP code paths are pretty similar. Intel manages to finish the first pass faster, but AMD delivers better performance on the second pass.

Now, bear in mind that the number of AVX-optimized tests is small. It’s going to take a lot of software development work before we get a clearer picture of how AVX instruction support affects each of these architectures.

Sharing The L2

We already mentioned the shared L2 TLB responsible for servicing instruction- (front-end) and data-side (integer core) requests. However, there’s also a unified L2 cache shared between the two cores. This repository is 2 MB per module, giving you 8 MB of total L2 on a four-module FX-8000-series processor.

AMD says the Bulldozer module’s data prefetcher is also the product of significant power and silicon investment, which it gets away with by amortizing the cost across both cores.

16 comments
  • jaksun5
    Fuck this, I'm over your comments sections. They either don't work, are flooded by spam or I lose everything I've written and have to rewrite
  • jaksun5
    OK, here it goes again... :-)

    Unfortunate that there wasn't a more competitive showing by AMD. Up until recently we could still say that performance per dollar was still with them in a lot of cases. Now it seems even that point is going to Intel for some time to come.

    On the bright side, it appears that here in Oz a new segment in the full size (14-15") notebook market has been created in the last few months by the release of the AMD Radeon on die processor powered notebooks in the $330-$450 space, where previously new notebooks could barely be had under $500, and even then they were powered by awful Celeron processors with even worse graphics. If AMD can move enough of these low end units then maybe they'll have a chance to improve their line up, if the talk of scaling isn't just hot air.
  • bobbyp86
    Looks like I've saved myself a load of money upgrading my x4 955 this year, Bring on the 7000 series GPUs :D
  • technogiant
    AMD is becoming a "promising pete", it's always jam tomorrow but NEVER performance delivered today.
    Be that with their roadmap of promised performance increases or the promise of increased performance on apu's via gpgpu applications.
    I will believe it when I see it if ever.

    I don't think they even plan that effectively, I mean their proposed utilization of core/module parking in win8, great for power efficiency, but what about performance? For that you would need to spread the threads evenly across the single cores of each module so they don't share resources and only start using the second core in each module when the first core approaches max load.

    The implementation in win8 will only reduce performance and enhance power efficiency.
  • doive1231
    I feel like the hyper-intelligent pan-dimensional beings who have just been given the answer of 42 to their question ie. disappointed and fed-up I have to wait for something better. Perhaps we should leave it to Intel to build a computer capable of finding the question.
  • blubbey
    I would say I'm disappointed but it's not like we didn't know this already - surely there'd have been some 'leaked' benchmarks on the internet to promote it more if it was as good, if not better than SB.
  • codefuapprentice
    I'm actually disappointed in bulldozer, i was hoping it would give intel a massive shake up like the athlon series did for a few years, as it stands i'm not gonna be upgrading from my Phenom II 955 any time soon
  • das_stig
    Not the best review for AMD but look on the bright side, the prices will drop like a stone and as long as it can play all your games at the highest resolution and all the eye candy on, without needing its own power plant and pipeline to the south pole for cooling, then why worry.

    Can we all afford these super computers sucking 1000 watts from the socket, no, I would rather wait a fraction of a second and save a few quid each month.

    Future chips may just come with a few surprises, once AMD wake up and smell the coffee.
  • Anonymous
    Well common boys don't expect AMD to come out of the blue and own SB. AMD is in a very different situation, they went the GFX route awhile back and hence much of potential RnD money was taken away. Intel simply spends huge amounts of cash on their manufacturing process and micro-architecture development, which is why it's leading atm. IMSO (in my subjective opinion) Bulldozer was a strategic move intended to compete in the long run, so perhaps we will see what comes of it.
  • wild9
    I really don't know what to say about Bulldozer, I've got very mixed feelings. In the meantime thank you Chris Angelini, for the in-depth analysis.
  • jrtolson
    to all those expecting the bulldozer to be a "holographic chip from the future, running at 200 ghz" to have turned itself on and browsed porn for u before u got home from work? then u are fools lol (no offence)

    im pretty sure the current business models for both amd and intel are not "spend 200 squillin dollars" in r&d on making processor chips that can run the main computer of a galaxy class starship, using exotic materials (other than silicon) etc etc

    im running an amd64 3200+ single core (venice) in my rig, and it does everything and more than i want it to do.. i can play all the latest games run the most demanding software.. my point is my pc is 7 years old im running windows 7 on it and it does me fine.. the market does not need nor are consumers ready for a leap in processor tech so for a business model why not release minor improved chips and keep the dollars rolling in? than gamble everything on something that might break your company before it is even ever released?
  • theFatHobbit
    I was hoping this would make intel nervous and lower their prices to compete with bulldozers price/performance. but no luck.
  • miklatov
    These results are a real shame. I'm neither AMD nor Intel inclined, preferring to stay agnostic, but I do like healthy competition (It works well for us buyers, right? :D) and this offering just doesn't really cut it on performance or price.
  • dillyflump
    Have to say i'm a little disappointed at the raw power per core of these FX chips in games, but i'm pretty sure the intel sandybridge and other core i7's are out in front due to hyperthreading on each core. World of Warcraft is programmed to only use two physical cores, but the intels get around it with hyper threadings 2 extra logical cores to process on. If game engines were better programmed to actually work on a cpu's physical cores and not logical ones i'm pretty sure the FX chips would beat the sandybridge processors. Perhaps the tested could look it up, but last year I was reading an article on how to force the warcraft engine to use multiple cores not just the 2. Looked complex to do but having ordered a bulldozer FX 8-core and a new 990FX board i think i'll try and get this work around to use all the chips power and see what results i get teamed with crossfire 6870's
  • HEXiT
    lolz... seriously m8 try to at least understand... the 2500K doesn't support hyper threading so how can it be out in front because of it... AMD promised the world a cpu that could compete with intel's latest and they delivered 1 that can compete with their last gen only. as for you being pretty sure... well im pretty sure you think you know a lot more than you actually do, and its gonna cost you a fair bit of cash...

    not only does the part not perform consistently and never will in a gaming environment. its power inefficient to the tune of 180+watts. seriously guy rethink your choice... you would be no worse off performance wise buying a P'II 970 and waiting for the next iteration that will still underperform against intels ivy bridge...

    as for your theory on how WOW is processed you're off the mark there too.. intel only use hyper threading when a game/application asks for it. on a single core wow will use hyper threading (if available) as it needs 2 cores to work best, on a dual core it will use 2 cores without hyper threading and on a quad it will use 2 cores without hyper threading. just because a core shows 75 percent usage doesn't mean its using 25 percent hyper threading.
    case in point wow performs no better on an intel 2500k than it does on the intel 2600k 1 has hyper threading the other doesnt.

    seriously m8 i aint trying to be a jerk, but it definitely looks like you have a case of "thinks he knows"... you seem to be operating on assumptions about intel rather than fact... use places like wiki, toms, hardware secrets and other places to get the right info b4 you make a misinformed choice.
  • Anonymous
    But between the i5 and i7 which is best on cost vs performance? To be honest i have not looked at an AMD chip based PC in years, why would you? and based on the excellent review / bench mark i will not be changing my mind for some years to come.