The Architecture in Detail
Source: Tom's Hardware – Keywords: nvidia, gtx, 280
A SIMT Architecture?
You’re probably familiar with the terms SIMD and MIMD, but with the GT200, Nvidia describes its Streaming Multiprocessors as "SIMT units." So what are they? The acronym stands for Single Instruction, Multiple Threads, and the main difference from SIMD is that the vectors being processed have no predefined width. Concretely, given enough threads, the processor behaves like a scalar processor. To grasp the difference, recall how pixel shader units operated in previous architectures.
The rasterizer generated quads: squares of 2x2 pixels, where each pixel is a vector of four single-precision floating-point values, (R, G, B, A) or (X, Y, Z, W), the formats most often used in 3D calculations. These quads were then sent to an ALU operating in 16-way SIMD mode, which applied the same instruction to all 16 floating-point numbers (4 pixels × 4 components). This is a simplification to illustrate the principle; in practice, the GeForce 6 and 7 also had a mode called co-issue for executing two instructions per vector.
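As a rough illustration of the quad-based scheme just described, the Python sketch below flattens one quad into a 16-float AoS vector and applies a single multiply to all 16 lanes. It is a hypothetical software model, not Nvidia's hardware or any real API: the names `quad_to_aos` and `simd_mul` are illustrative, and co-issue is ignored.

```python
# Hypothetical model of a pre-G80 pixel-shader ALU working on one quad.
# Assumption: a quad is 2x2 pixels, each an (R, G, B, A) float vector,
# stored back to back (AoS), so one 16-wide instruction covers the quad.
QUAD_PIXELS = 4                         # 2x2 pixels
COMPONENTS = 4                          # R, G, B, A (or X, Y, Z, W)
SIMD_WIDTH = QUAD_PIXELS * COMPONENTS   # 16 lanes

def quad_to_aos(pixels):
    """Flatten four (R, G, B, A) tuples into one 16-float AoS vector."""
    assert len(pixels) == QUAD_PIXELS
    return [c for px in pixels for c in px]

def simd_mul(vector, scalar):
    """Model of one SIMD instruction: the same multiply on every lane."""
    assert len(vector) == SIMD_WIDTH
    return [x * scalar for x in vector]

quad = [(1.0, 0.5, 0.25, 1.0),
        (0.0, 1.0, 0.5, 1.0),
        (0.5, 0.5, 0.5, 1.0),
        (1.0, 1.0, 1.0, 0.5)]
aos = quad_to_aos(quad)       # R0 G0 B0 A0 R1 G1 B1 A1 ... A3
halved = simd_mul(aos, 0.5)   # one instruction, all 16 lanes busy
print(halved[:4])             # [0.5, 0.25, 0.125, 0.5]
```

All 16 lanes are busy here only because the multiply applies to every component; an operation on a single component would leave most lanes idle in this layout.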
Since the G80, this mode of operation has been reworked. The rasterizer still generates quads, but they are placed in a buffer. Once 8 quads (32 pixels, a "warp" in CUDA terminology) are in the buffer, they can be executed by a multiprocessor in SIMD mode. So what’s the difference? It lies in how the data are organized. Instead of working on four vectors of four floating-point numbers, organized like this: (R, G, B, A, R, G, B, A, R, G, B, A, R, G, B, A), the multiprocessors work on vectors of 32 floating-point numbers, each made up of the same component taken from each of the 32 threads:
(R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R) then (G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G, G) etc.
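The regrouping described above amounts to a transpose of the warp's data. This minimal sketch, assuming a warp of 32 pixels each stored as an (R, G, B, A) tuple, converts that AoS data into four 32-element SoA vectors; the values are labeled strings so the layout is visible. The helper `aos_to_soa` is hypothetical and only models the data organization, not how the GPU actually buffers quads.

```python
# Hypothetical sketch: regroup a warp of 32 pixels (8 quads) from
# AoS (R, G, B, A per pixel) into SoA (one 32-wide vector per component).
WARP_SIZE = 32
QUADS_PER_WARP = WARP_SIZE // 4   # 8 quads of 2x2 pixels
COMPONENTS = ("R", "G", "B", "A")

def aos_to_soa(pixels):
    """Turn 32 (R, G, B, A) tuples into a dict of four 32-element lists."""
    assert len(pixels) == WARP_SIZE
    return {name: [px[i] for px in pixels] for i, name in enumerate(COMPONENTS)}

# Label each value by component and thread index to make the layout visible.
warp = [(f"R{t}", f"G{t}", f"B{t}", f"A{t}") for t in range(WARP_SIZE)]
soa = aos_to_soa(warp)
print(soa["R"][:4], len(soa["R"]))   # ['R0', 'R1', 'R2', 'R3'] 32
```

Each list in `soa` corresponds to one of the 32-wide vectors shown above: (R, R, ..., R), then (G, G, ..., G), and so on.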
In SIMD programming, the first data layout is called AoS (Array of Structures) and the second SoA (Structure of Arrays). The second layout makes better use of the hardware: provided there’s enough data to fill a vector, the processor behaves, from the programmer’s point of view, like a scalar processor, since the SIMD units are always used at 100% regardless of the width of the data being processed. Conversely, AoS reaches peak performance only when the same instruction is applied to all four components of each vector.
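To make the utilization argument concrete, the sketch below computes the fraction of SIMD lanes doing useful work under a simplified model (an assumption, not a measured result). In AoS, an instruction that touches only k of a pixel's four components leaves the other lanes idle; in SoA, each instruction covers one component of every thread in a warp, so only a partially filled last warp wastes lanes. Co-issue and other hardware details are ignored, and the function names are hypothetical.

```python
import math

WARP_SIZE = 32  # SoA: one component from each of 32 threads per vector

def aos_utilization(components_used):
    """Fraction of lanes doing useful work in AoS (no co-issue) when an
    instruction touches only `components_used` of each pixel's 4 components."""
    return components_used / 4

def soa_utilization(num_threads):
    """Fraction of lanes busy in SoA. It does not depend on how many
    components an instruction touches; only a partial last warp wastes lanes."""
    warps = math.ceil(num_threads / WARP_SIZE)
    return num_threads / (warps * WARP_SIZE)

for k in (1, 3, 4):
    print(f"AoS, {k} of 4 components: {aos_utilization(k):.0%}")
for n in (32, 1024, 40):
    print(f"SoA, {n} threads: {soa_utilization(n):.1%}")
```

The 40-thread case (62.5%) shows the caveat in the text: SoA only reaches 100% when there are enough threads to fill each warp.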
Hmm, not as big an improvement as I thought. We'll have to wait and see whether the drivers improve the cards, but the GTX 260 seems to be the much better option given the price. Still, we'll have to see what ATI bring to the fray first. Patience will be reflected in price, I have no doubt.
Frankly depressing. Me WANTS MRAW POWER!!!!
I am so disappointed. Now if AMD delivers on the dual-GPU, single-memory rumour (two GPUs on a single card but without the CrossFire problems), Nvidia could have a serious problem.
Why have they tested this system with only 2GB of RAM? If you're testing a GPU with 1GB of VRAM, surely you'd have more installed?
They also have two conflicting prices on page 28:
For the GTX 280: $846 and $650;
For the GTX 260: $450 and $400.
Wouldn't it have been more prudent to test against an 8800 Ultra, as this is still the single most powerful card?
It might just be me, but 66.5 dBA is unbearable unless you have your PC locked away in a cupboard somewhere. This business of supplying substandard fans on very expensive cards is intolerable. Why don't they strike a deal with Zalman or Thermalright, for example, and ship cards that are quiet or silent? I'm sure that people who have the money to buy a £500 GPU could afford £10 more for a better bundled cooling solution.
Where is that 20W to 30W idle figure you are talking about? The lowest value in the graph is 199W!
mi1ez: The reason for just 2GB of RAM was probably that it allowed Tom's to stick with a 32-bit OS. If they had used more RAM, they'd be stuck with 64-bit Bindows, which would not be pretty. Aside from really needing 8GB to make a big difference over 2GB in 32-bit Vista, there's the slight issue of stable signed drivers, which these cards probably won't have for a while. Good luck trying to get Vista 64 to even "see" the cards! XD
jhoravi: That idle power figure would only apply on newer Nvidia motherboards, since the card would be shut down entirely when idle and hand over to the integrated graphics chip.
And was it just me, or was the noise text copy-pasted over the temperature text on the next page? Oops.
Let's try again, Mr THG (uhhhm, try getting your fraking website working plz)...
Now lets see this puppy in action:
http://www.evga.com/products/pdf/01G-P3-1289-AR.pdf
!!
Bob