Follow-up: Has Intel found the key to unlock supercomputing powers on the desktop?
Source: Tom's Hardware
So, what problem does this solution really solve?
Some posters asked what problem this solution really solves, and how its availability in software would provide real speedups for common operations like encoding and decoding. I believe the MIMD (Multiple Instruction, Multiple Data) model addresses that. Accelerator hardware could run simultaneously with the CPU-based portion of a codec, allowing workloads to be “shunted” to the accelerator hardware so that its high-speed resources handle the computation while CPU requirements are kept to a minimum.
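A minimal sketch of the MIMD split described above, assuming a codec loop with two stages: a CPU-side parsing stage and a compute-heavy stage that gets shunted to the accelerator. A worker thread stands in for the accelerator hardware, and `parse_block` and `transform_block` are hypothetical placeholders, not EXOCHI functions. The CPU thread and the worker each run their own instruction stream on their own data, so while the worker transforms one block the CPU is already parsing the next.

```python
import queue
import threading


def parse_block(raw: bytes) -> list[int]:
    # Hypothetical CPU-side stage: turn raw bytes into coefficients.
    return list(raw)


def transform_block(coeffs: list[int]) -> list[int]:
    # Hypothetical offloaded stage: a data-parallel op on every coefficient.
    return [c * 2 for c in coeffs]


def accelerator_worker(inbox: queue.Queue, results: dict) -> None:
    # Stand-in for the accelerator: runs its own instruction stream
    # concurrently with the CPU thread until it receives a sentinel.
    while True:
        item = inbox.get()
        if item is None:
            break
        index, coeffs = item
        results[index] = transform_block(coeffs)


def decode(blocks: list[bytes]) -> list[list[int]]:
    inbox: queue.Queue = queue.Queue()
    results: dict[int, list[int]] = {}
    worker = threading.Thread(target=accelerator_worker, args=(inbox, results))
    worker.start()
    for i, raw in enumerate(blocks):
        # The CPU parses block i, shunts the heavy stage to the worker,
        # and moves straight on to the next block without waiting.
        inbox.put((i, parse_block(raw)))
    inbox.put(None)  # tell the worker no more blocks are coming
    worker.join()    # wait for all offloaded work to finish
    return [results[i] for i in range(len(blocks))]


if __name__ == "__main__":
    print(decode([b"\x01\x02", b"\x03"]))  # [[2, 4], [6]]
```

CPython threads share a global interpreter lock, so this shows the structure of the offload rather than a real speedup; on actual hardware the two instruction streams would run on physically separate cores.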
Intel’s EXOCHI paper by Wang et al. reported speedups of 141% to 1097% (1.41x to 10.97x the CPU-only performance) in video and image processing tools. While this kind of integration is possible today using current software models, its limitations again stem from driver support. With Intel’s proposed solution, all required technology can be included directly within x86-based binary executables, resulting in a single-source solution that should operate on any supporting platform.
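Speedup figures are easiest to interpret as ratios of execution time. Reading the reported 141% and 1097% figures as performance relative to a CPU-only baseline (that is, speedups of 1.41x and 10.97x), a short worked example follows; the symbols are ours, not the paper's.

```latex
\begin{aligned}
S &= \frac{T_{\text{CPU}}}{T_{\text{accel}}} \;\Longrightarrow\; T_{\text{accel}} = \frac{T_{\text{CPU}}}{S} \\
S = 1.41 &\;\Rightarrow\; T_{\text{accel}} \approx 0.709\,T_{\text{CPU}} \quad (\text{about } 29\%\ \text{less time}) \\
S = 10.97 &\;\Rightarrow\; T_{\text{accel}} \approx 0.091\,T_{\text{CPU}} \quad (\text{about } 91\%\ \text{less time})
\end{aligned}
```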
Intel’s EXOCHI PDF lays out a solid explanation of the benefits and pitfalls of the EXOCHI model. It introduces or extends several basic concepts that allow disparate, heterogeneous cores to work easily within the x86 ISA with minimal OS intrusion. These are briefly explained here; for a full understanding, refer to section 3.0 of the 10-page PDF.
First is the Exoskeleton. The Exoskeleton is a type of hardware wrapper that enables an x86 CPU to work with an accelerator that uses a different internal architecture or ISA. It lets the accelerator and the x86 CPU communicate back and forth through various instructions. The advantage is that this is done directly by the application, not through OS service requests.
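The Python model below is only an analogy for the dispatch path just described. `Sequencer` and its `signal` method are hypothetical names standing in for the user-level instructions the exoskeleton provides; they are not a real EXOCHI interface. The point it illustrates is that the application places work on the accelerator directly and collects the results itself, with no system call or driver in between.

```python
from collections import deque
from typing import Callable


class Sequencer:
    """Hypothetical model of an accelerator exposed to the application
    as a user-managed sequencer (no OS involvement)."""

    def __init__(self, isa: str) -> None:
        self.isa = isa                  # label for the accelerator's native ISA
        self.pending: deque = deque()   # work posted directly by the application
        self.completed: list = []

    def signal(self, task: Callable[[], int]) -> None:
        # Stand-in for a user-level "send work" instruction: the task is
        # queued directly, with no system call or driver round trip.
        self.pending.append(task)

    def run(self) -> None:
        # The accelerator drains its queue in its own instruction stream.
        while self.pending:
            self.completed.append(self.pending.popleft()())


accel = Sequencer(isa="accelerator-native")
for n in range(4):
    accel.signal(lambda n=n: n * n)  # the application dispatches directly
accel.run()
print(accel.completed)  # [0, 1, 4, 9]
```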
Next is a capability that lets the accelerators process the same blocks of data the CPU itself might be working on simultaneously: the Address Translation Remapping (ATR) mechanism. ATR allows shared virtual memory to be mapped correctly to physical memory via a mechanism that translates between the x86 CPU's address translations (as held in its Translation Lookaside Buffers, or TLBs) and those used by the accelerator. The mechanisms that keep the virtual addresses in sync are designed to work correctly; however, there is no mechanism for cache coherency between the accelerator and the x86 CPU. It remains the application developer's responsibility to maintain cache coherency around critical sections, or wherever coherency might become an issue. I believe mechanisms addressing this will eventually appear in the architecture, though not initially, primarily because of development and testing time.
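Both halves of the ATR paragraph can be modeled in a few lines. The sketch assumes two page-table-entry formats: an x86-style entry (present bit 0, writable bit 1, frame number from bit 12 up, as on real x86) and a made-up accelerator format (valid bit 31, write-enable bit 30, frame number in the low 20 bits). On an accelerator TLB miss, the entry is rewritten from the CPU's format into the accelerator's, which is the idea behind ATR. The second half shows why coherency is left to the developer: the CPU's write sits in its cache until it is explicitly flushed, so the accelerator can read stale data. The accelerator layout and all function names are hypothetical, not taken from the EXOCHI paper.

```python
PAGE_SIZE = 4096

# x86-style page-table entry fields (these three match the real x86 layout).
X86_PRESENT = 1 << 0
X86_WRITABLE = 1 << 1
X86_FRAME_SHIFT = 12

# Hypothetical accelerator entry layout, invented for this sketch.
ACC_VALID = 1 << 31
ACC_WRITE = 1 << 30
ACC_FRAME_MASK = (1 << 20) - 1


def x86_to_accel(pte: int) -> int:
    """Rewrite an x86-style entry in the accelerator's format (the ATR idea)."""
    if not pte & X86_PRESENT:
        raise LookupError("page not present; the CPU must fault it in first")
    entry = ACC_VALID | ((pte >> X86_FRAME_SHIFT) & ACC_FRAME_MASK)
    if pte & X86_WRITABLE:
        entry |= ACC_WRITE
    return entry


class AcceleratorTLB:
    def __init__(self, cpu_page_table: dict[int, int]) -> None:
        self.cpu_page_table = cpu_page_table  # virtual page -> x86 entry
        self.entries: dict[int, int] = {}     # virtual page -> accel entry

    def translate(self, vaddr: int) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.entries:  # TLB miss: remap from the CPU's entry
            self.entries[vpn] = x86_to_accel(self.cpu_page_table[vpn])
        frame = self.entries[vpn] & ACC_FRAME_MASK
        return frame * PAGE_SIZE + offset


# No hardware coherency: the CPU's writes sit in its cache until flushed.
memory = bytearray(2 * PAGE_SIZE)  # shared physical memory the accelerator reads
cpu_cache: dict[int, int] = {}     # CPU write-back cache: physical addr -> byte


def cpu_write(paddr: int, value: int) -> None:
    cpu_cache[paddr] = value  # stays in the cache, not yet in memory


def cpu_flush() -> None:
    for paddr, value in cpu_cache.items():
        memory[paddr] = value  # explicit write-back, done by the developer
    cpu_cache.clear()


page_table = {0x10: (1 << X86_FRAME_SHIFT) | X86_WRITABLE | X86_PRESENT}
tlb = AcceleratorTLB(page_table)
paddr = tlb.translate(0x10 * PAGE_SIZE + 5)  # virtual page 0x10 -> frame 1
cpu_write(paddr, 42)
print(memory[paddr])  # 0: the accelerator would read stale data here
cpu_flush()
print(memory[paddr])  # 42: visible to the accelerator after the flush
```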
Lastly, we have Collaborative Exception Handling (CEH). With CEH, the main x86 CPU receives and processes all exceptions caused by the accelerator hardware. This allows any fault occurring on the accelerator to be directed back to the x86 CPU for proper handling. Mechanically, this is similar in scope to today's exception-handling models. The primary difference is that any replay of the faulting instruction is handled by proxy through the CEH module, using the x86 CPU's exception handlers.
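A toy model of proxy exception handling, under stated assumptions: the accelerator stops at an instruction it cannot complete, a handler on the x86 side services the fault, and the faulting instruction is then replayed. Here the fault is a load from a page that is not yet resident, and the hypothetical CPU-side handler simply makes that page resident with made-up contents. The names and the fault policy are invented for illustration and are not taken from the EXOCHI paper.

```python
class AcceleratorFault(Exception):
    """Hypothetical fault raised when the accelerator touches a non-resident page."""

    def __init__(self, page: int) -> None:
        super().__init__(f"page {page} not resident")
        self.page = page


def accel_load(instr: tuple[int, int], pages: dict[int, list[int]]) -> int:
    # One accelerator "load" instruction: (page number, offset within page).
    page, offset = instr
    if page not in pages:
        raise AcceleratorFault(page)
    return pages[page][offset]


def x86_fault_handler(fault: AcceleratorFault, pages: dict[int, list[int]]) -> None:
    # CPU-side handler servicing the accelerator's fault by proxy:
    # here it just makes the page resident with made-up contents.
    pages[fault.page] = [fault.page * 10 + i for i in range(4)]


def run_with_ceh(
    program: list[tuple[int, int]], pages: dict[int, list[int]]
) -> tuple[int, int]:
    """Execute the loads, summing the results; return (sum, number of faults)."""
    total, faults, pc = 0, 0, 0
    while pc < len(program):
        try:
            total += accel_load(program[pc], pages)
        except AcceleratorFault as fault:
            x86_fault_handler(fault, pages)  # handled on the x86 side
            faults += 1
            continue                         # replay the same instruction
        pc += 1
    return total, faults


print(run_with_ceh([(1, 0), (1, 3), (2, 1)], {}))  # (44, 2)
```

Each fault leaves the program counter unchanged, so the same load is retried once the x86-side handler has fixed the cause.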
An overview of this communication between the x86 CPU and the accelerators is shown here:
[Figure: overview of communication between the x86 CPU and the accelerators]
In closing, the information Intel has now released publicly about this technology answers a lot of questions. It also raises a few more.
Intel has demonstrated a working prototype using Core 2 processors coupled to a Graphics Media Accelerator X3000. Its tests showed speedups on video and image processing ranging from a minimum of 1.41x to a maximum of 10.97x. These tests were conducted on non-integrated hardware that emulated the capabilities EXOCHI would have if implemented directly in hardware. As a result, an integrated implementation should show even greater speedup potential for all kinds of graphics-based algorithms, FP-heavy computation, and any workload that can be broken down into parallel tasks.
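How much of a kernel-level speedup survives at the application level depends on how much of the workload can actually be offloaded. Amdahl's law, a standard result rather than anything from Intel's paper, gives a quick estimate: if a fraction p of the run time is offloaded and that part runs s times faster, the overall speedup is 1 / ((1 - p) + p / s). This sketch assumes the CPU-side and offloaded portions run one after the other with no overlap, so it is a pessimistic estimate for MIMD setups where both sides work concurrently; the function name is ours.

```python
def amdahl_speedup(offload_fraction: float, accel_speedup: float) -> float:
    """Overall speedup when `offload_fraction` of the run time is accelerated
    by `accel_speedup` and the rest stays on the CPU (no overlap assumed)."""
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be in [0, 1]")
    if accel_speedup <= 0:
        raise ValueError("accel_speedup must be positive")
    remaining = (1.0 - offload_fraction) + offload_fraction / accel_speedup
    return 1.0 / remaining


for p in (0.5, 0.9, 1.0):
    print(f"p={p:.1f}: {amdahl_speedup(p, 10.97):.2f}x")
```

Taking the top of the reported range as a 10.97x kernel speedup, offloading half the run time yields only about 1.83x overall, and offloading 90% yields about 5.49x; the full 10.97x appears only when the entire workload runs on the accelerator.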