GeForce GTX 480 And 470: From Fermi And GF100 To Actual Cards!

Additional Reading: SMs, Scheduler, And Texturing

Whereas AMD was able to limit itself to a few minor revisions with Cypress, Nvidia needed to radically change the G80 architecture it introduced over three years ago. The organization of the various units has been completely reworked. GT200 employed what Nvidia called TPCs (Texture Processor Clusters), each consisting of one texture block shared by three Streaming Multiprocessors (SMs). Nvidia keeps the term Streaming Multiprocessor with GF100, but these units are now much more powerful.

Streaming Multiprocessors

With GT200, a Streaming Multiprocessor was made up of one 8-way SIMD unit, two special function units, and one double-precision unit. With GF100, each SM now has two 16-way SIMD units and four special function units. There is no longer an ALU specifically dedicated to double-precision; FP64 calculation is now carried out by the same units at half the rate.
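As a rough illustration of the per-SM figures above, the following Python sketch tallies FP32 lanes and FP64 results per clock for each design. It uses only the unit counts quoted in the text. The dictionary layout and the function names are assumptions made for this example, and treating GT200's single double-precision unit as one result per clock is a simplification, not a measured value.

```python
# Per-SM unit counts as quoted in the text (not measured values).
SM_CONFIGS = {
    "GT200": {"simd_units": 1, "simd_width": 8, "sfus": 2, "dedicated_fp64_units": 1},
    "GF100": {"simd_units": 2, "simd_width": 16, "sfus": 4, "dedicated_fp64_units": 0},
}


def fp32_lanes(cfg):
    """FP32 lanes per SM: number of SIMD units times their width."""
    return cfg["simd_units"] * cfg["simd_width"]


def fp64_per_clock(cfg):
    """FP64 results per clock per SM under the assumptions stated above.

    With a dedicated unit, assume one result per clock per unit (simplification).
    Without one, FP64 runs on the FP32 lanes at half rate, as the text describes.
    """
    if cfg["dedicated_fp64_units"]:
        return cfg["dedicated_fp64_units"]
    return fp32_lanes(cfg) // 2


if __name__ == "__main__":
    for name, cfg in SM_CONFIGS.items():
        print(name, "FP32 lanes:", fp32_lanes(cfg), "FP64 per clock:", fp64_per_clock(cfg))
```

These are per-SM, per-clock counts only; they say nothing about clock speeds or the number of SMs enabled on a given card.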

The increase in processing power is not the most notable change; the texture units are now directly implemented in the SM, whereas beforehand they were decoupled (three SMs shared eight texture units). On GF100, each SM has four texture units of its own, which also explains why their overall number has decreased compared to the preceding architecture (eight units per TPC on GT200 [or a total of 80], compared to four units per SM on GF100 [or a total of 60 for the GTX 480]). Another new feature is the 16 load/store units, enabling addresses to be calculated in cache memory or in RAM for 16 threads per cycle.
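The texture unit totals quoted above follow from simple multiplication, sketched here in Python. The group counts (10 TPCs on GT200, 15 active SMs on the GTX 480) are not stated directly in the text; they are derived from the quoted totals of 80 and 60, and the function name is chosen for this example only.

```python
def total_texture_units(groups, units_per_group):
    """Total texture units: number of groups times units in each group."""
    return groups * units_per_group


# GT200: eight texture units per TPC; 10 TPCs is derived from the quoted total of 80.
gt200_total = total_texture_units(groups=10, units_per_group=8)

# GTX 480: four texture units per SM; 15 SMs is derived from the quoted total of 60.
gtx480_total = total_texture_units(groups=15, units_per_group=4)

# On GT200, three SMs shared the eight units of their TPC.
gt200_units_per_sm = 8 / 3
gf100_units_per_sm = 4

if __name__ == "__main__":
    print("GT200 total:", gt200_total)
    print("GTX 480 total:", gtx480_total)
    print("Units per SM:", round(gt200_units_per_sm, 2), "vs", gf100_units_per_sm)
```

Seen this way, the overall count drops while the number of texture units available per SM rises, since each SM no longer shares its units with two neighbors.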

Concretely, this reorganization allows Nvidia to offer a much more elegant and more effective architecture than the previous one. The SMs are really independent processors, whereas beforehand they relied on a memory subsystem shared by groups of three SMs.

The size of the register file has also increased. Instead of 16,384 32-bit registers per multiprocessor, there are now 32,768. At the same time, the number of active threads per multiprocessor has increased compared to the GT200, from 1,024 threads (32 warps of 32 threads) to 1,536 (48 warps of 32 threads). Thus, the number of registers available per thread is 21, compared to 16 beforehand (and 10 on G80). Now let’s test the increase in processing power using a few shaders.
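Before turning to the shader results, the registers-per-thread figures quoted above can be checked with a short Python sketch. It assumes every active thread gets an equal share of the register file, which is the simplification the text itself makes; the function names are illustrative.

```python
WARP_SIZE = 32  # threads per warp, as stated in the text


def registers_per_thread(registers, max_threads):
    """Average 32-bit registers per thread when the SM is fully occupied."""
    return registers / max_threads


def warps_for(max_threads):
    """Number of full warps needed to hold max_threads threads."""
    return max_threads // WARP_SIZE


gt200_regs = registers_per_thread(16_384, 1_024)
gf100_regs = registers_per_thread(32_768, 1_536)
gf100_warps = warps_for(1_536)

if __name__ == "__main__":
    print("GT200 registers per thread:", gt200_regs)
    print("GF100 registers per thread:", round(gf100_regs, 2))
    print("GF100 warps per SM:", gf100_warps)
```

The GF100 result is a little over 21, which the text rounds down.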

In a shader test that is very simple by today's standards (per-pixel lighting in DirectX 9), the GeForce GTX 480 is 55% faster than the GeForce GTX 280. The gap widens to 82% with a heavier shader using procedural textures.

Dual-Scheduler

Since the appearance of its G80 architecture, Nvidia has been saying that its multiprocessors are capable of executing two instructions per cycle in certain circumstances: one MUL and one MAD. We weren’t able to confirm that with the early versions of the design, and even on the GT200 it was especially difficult to demonstrate. In practice, the famous dual-issue Nvidia was talking up wasn’t observable; the chip could execute only one MAD per cycle.

That’s not so important anymore because Nvidia’s GF100 has two schedulers that enable execution of two instructions per cycle without the constraints that limited previous architectures. As we saw earlier, the 32 stream processors in each multiprocessor are, in fact, arranged in two groups of 16 units, and each of these groups can execute one independent instruction. The recent history of CPUs has proven that superscalar execution is very complex to implement. But Nvidia has a major advantage in this area: GF100 doesn’t attempt to extract parallelism from a single instruction flow, with all the possibilities for error that implies. In fact, the multiprocessor selects two warps and launches execution of one instruction from each of them on the SIMD units. Since the warps are totally independent, they can be executed in parallel without any risk.

Most instructions, whether FP32, integer, or load/store, can be executed simultaneously. The only exception to that rule is double-precision instructions, which use all 32 stream processors and can’t be executed simultaneously with any other type of instruction.
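The issue rules described above can be sketched as a toy model in Python. This is not Nvidia's scheduler: the round-robin warp selection, the op names, and the `simulate_issue` function are assumptions made for illustration, and the model ignores latencies and the fact that a 32-thread warp takes more than one clock on a 16-lane group. It captures only two rules from the text: up to two instructions issue together, each from a different warp, and a double-precision instruction issues alone.

```python
from collections import deque


def simulate_issue(warps):
    """Return the list of issue steps for a set of independent warps.

    `warps` maps a warp id to its ordered list of ops ("fp32", "int",
    "ldst", or "fp64"). Each step is a list of (warp_id, op) pairs that
    issue together under the simplified rules described above.
    """
    queues = deque((wid, deque(ops)) for wid, ops in warps.items() if ops)
    steps = []
    while queues:
        # First scheduler: take the next warp in round-robin order.
        wid, ops = queues.popleft()
        issued = [(wid, ops.popleft())]
        leftover = [(wid, ops)] if ops else []

        # Second scheduler: only if the first op is not FP64, pick another
        # warp whose next op is not FP64 either.
        if issued[0][1] != "fp64":
            partner = next(
                (i for i, (_, q) in enumerate(queues) if q[0] != "fp64"), None
            )
            if partner is not None:
                wid2, ops2 = queues[partner]
                del queues[partner]
                issued.append((wid2, ops2.popleft()))
                if ops2:
                    leftover.append((wid2, ops2))

        # Warps with remaining work go to the back of the queue.
        queues.extend(leftover)
        steps.append(issued)
    return steps


if __name__ == "__main__":
    example = {0: ["fp32", "fp64"], 1: ["int", "fp32"], 2: ["ldst"]}
    for n, step in enumerate(simulate_issue(example), start=1):
        print("step", n, step)
```

Because the two instructions in a step always come from different warps, the model never has to check for dependencies between them, which is the advantage the text attributes to GF100 over a superscalar CPU.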

Texture Units

Though the number of texture units has decreased, Nvidia has completely redesigned them in its current-generation architecture to improve performance. As we said, they are now built into the multiprocessors, which avoids having to share them among several multiprocessors, with all the loss of efficiency that involves. The L1 cache dedicated to texture units has also been redesigned, and though its size is unchanged from GT200 (12KB), Nvidia says that it’s much more efficient. Finally, the texture units are now clocked at the GPU frequency, whereas previously they operated at a lower frequency.

The result: on this test, which measures texture access performance (useful for displacement mapping, for example), the GeForce GTX 480 was 75% faster than the GTX 280.

Obviously, the new units support the new BC6H and BC7 compression formats and the Gather instructions required by Direct3D 11. And performance with more contemporary pixel shaders has indeed increased, though that’s really a matter of catching up with the competition.

27 comments
Comment from the forums
  • infra
    Great review guys! As for GTX 470/480 - It's not as bad as I expected. The cards show some pretty decent numbers compared to 5870 even without its tessellation power used to its best. Perhaps next-gen Fermi will be a true champion - power and heat will be optimized and games will use the architecture of the GPU to its full potential. All in all it's a great architecture, maybe a bit ahead of its time if you ask me.
  • Anonymous
    Power hungry, noisy, the fight is on. Glad I got the 5870. The driver-updates will see us through.
  • N19h7M4r3
    Power consumption is really high, but i think that efficiency is actually pretty good, but in the end what will matter is $$$ and not everyone will pay to have the best card on the block.
  • Dandalf
    Quote:
    Do we expect AMD to drop its prices in response? Don’t count on it.


    Dammit I was waiting for these cards SOLELY so ATI drop their prices! Aaaarrgghhh
  • Anonymous
    5000 series will keep their prices for a long time
  • mapleo
    Fermi could be a tragedy in NV's history.
    It seems I have to use HD5870 until HD6870 or GTX580 is released.
  • memeroot
    looks good if it came out 6 months back.... as a 3d vision fan though it looks like another wait for the right card
  • Dandalf
    Thanks for translation Rabid, wish i saw it before I started rating him down as a bot :| oops
  • Anonymous
    GTX480 buy it!!! Send stove!!! sorry my english is poor!!!
  • TIMELESS52
    Wow!!!!!! It's the fastest single GPU card on the planet. And it's a toaster oven and space heater too. What will Nvidia think of next?

    I wonder if it will qualify for any exemptions under forthcoming "cap and trade" regulations?
  • FanterA
    it should also be noted that for UK customers (like myself) that a 5870 can be had for less than the asking price of a 470, and for the prices on the 480, you could have a pair of 5850s in crossfire. Add to this the heat and power concerns, and i think I'll forgo Thermi and get another 5850 when I deem it necessary. so glad i didn't wait :D
  • mapleo
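    qinmo Fermi draws a lot of power and delivers poor performance, its image quality is bad and its value for money is low; it is a complete tragedy of a card! Image quality alone is reason enough for me to abandon Nvidia cards. In the same games and films, the picture on ATI cards is always better than on Nvidia cards: finer, more vivid, and easier on the eyes. On top of that, ATI cards use less power and offer better value! NV only knows how to advertise relentlessly and brag, then folds its huge advertising costs into the product price and makes consumers pay, so Nvidia cards never offer good value. NV puts all its energy into advertising, bragging, and fooling people, so poor products are inevitable, and anyone who buys an Nvidia card is a sucker. AMD-ATI is more down to earth and concentrates its energy on product R&D, so its cards are not only better than Nvidia's but also more affordable. I like low-key, honest companies like AMD-ATI!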



    I'm not a fan for any brand. I only choose products base on my needs. That's my point.
  • Anonymous
    haha Fermi you are out!!
  • carlos0248
    I think this product is just as the editor said: very strong performance, high power consumption, and a fairly high price. Don't expect a product like this to make ATI drop its graphics card prices.
  • Anonymous
    It's a true fact that NV is always good at Games because of its "way" plan. Video card is often used to play video games after all.
  • goozaymunanos
    sod this..i'm gonna buy a 5850..

    the GTX470 should be retailing at £250.

    cheers,
    gooz


    p.s. stuff & nonsense: http://eupeople.net/forum
  • marney_5
    How much are the Fermi cards in the US again? On overclockers UK the 480 prices around £450! Where the 5870 is around £320! Is this correct? Because Fermi is sh*t value if its only slightly faster and £100 extra!

    I only waited for this card so the ATI prices would go down!!! Dammit!
  • my_jacks
    Sparkle GeForce GTX 480 1536MB GDDR5 PCI-Express Graphics Card
    £445.99 (inc VAT)

    Sparkle GeForce GTX 470 1280MB GDDR5 PCI-Express Graphics Card
    £309.99 (inc VAT)


    Powercolor ATI Radeon HD 5970 2048MB GDDR5 PCI-Express Graphics Card
    £499.99 (inc VAT)

    Sapphire ATI Radeon HD 5870 1024MB GDDR5 PCI-Express Graphics Card
    £299.99 (inc VAT)

    Sapphire ATI Radeon HD 5850 1024MB GDDR5 PCI-Express Graphics Card
    £220.99 (inc VAT)

    - Overclockers UK (29/3/10)
  • 13thmonkey
    what happens to power and heat if v-sync is on, i.e. if the card can do 120+ fps on a game but is limited to 60fps by v-sync, does that reduce the power and thermals as it is only calculating 50% of the frames.

    I assume it calculates a frame, waits for 60hz refresh (idles) displays it, calcs another one waits (idles), calcs another one, etc.

    or does it just calc and calc and calc and then show the one frame that was most recently completed on the refresh, then calc calc calc and show the most recent on the refresh, ignoring the results of the nondisplayed calcs.
  • damian86
    ATI is still being your 'daddy'