Page 2:Test System
Page 3:CPU-Based PhysX: Relevance
Page 4:CPU PhysX: The x87 Story
Page 5:CPU PhysX: Multi-Threading?
Page 6:How To: His Majesty, Radeon The Fifth, And The PhysX Squire, GeForce
Page 7:GPU PhysX: Roundup: Free For All
Page 8:GPU PhysX: What Card Is Best?
Page 9:Summary And Conclusion
CPU PhysX: The x87 Story
CPU PhysX and Old Commands
In an interesting article by David Kanter at Real World Technologies, he explored using Intel’s VTune to analyze CPU-based PhysX. Looking at the results, he found loads of x87 instructions and x87 micro operations.
- Explanation: x87 is a small part of the x86 architecture’s instruction set used for floating point calculations. It is a so-called instruction set extension, a hardware implementation providing essential elements for solving common numerical tasks faster (sine and cosine calculations, for example). Since the introduction of the SSE2 instruction set, the x87 extension has lost much of its former importance. However, for calculations requiring a mantissa of 64 bits, only possible with the 80-bit wide x87 registers, x87 remains important.
David speculated that optimizing PhysX code using the more modern and faster SSE2 instruction set extension instead of x87 might make it run more efficiently. His assessment hinted at 1.3 to 2 times better performance. He also carefully noted that Nvidia would have nothing to gain from such optimizations, considering the company’s focus on people using its GPUs.
We reconstructed these findings using Mafia II instead of Cryostasis, and switching back to our old Intel-based test rig, since VTune unfortunately could/would not work with our AMD CPU.
Our own measurements fully confirm Kanter's results. However, the predicted performance increase from merely changing the compiler options is smaller than the headlines from SemiAccurate might indicate. Testing with the Bullet Benchmark only showed a difference of 10% to 20% between the x87- and SSE2-compiled files. This might seem like a big increase on paper, but in practice it’s rather marginal, especially if PhysX only runs on one CPU core. If the game wasn’t playable before, this little performance boost isn’t going to change much.
Nvidia wants to give a certain impression by enabling the SSE2 setting by default in its SDK 3.0. But ultimately it’s still up to developers to decide how and to what extent SSE2 will be used. The story above shows that there’s still potential for performance improvements, but also that some news headlines are a bit sensationalistic. Still, even after putting things in perspective, it’s obvious that Nvidia is making a business decision here, rather than doing what would be best for performance overall.