Memory Access And Prefetcher

Optimized Unaligned Memory Access

With the Core architecture, memory access was subject to several performance restrictions. The processor was optimized for accesses that did not cross a 64-byte boundary, the size of one cache line. Not only was access to unaligned data slow, but an unaligned load or store instruction was also more costly to execute than its aligned counterpart, regardless of whether the data were actually aligned in memory. That's because these instructions were decoded into several µops, which reduced their throughput. As a result, compilers avoided generating them and substituted less costly instruction sequences.
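Whether an access is merely misaligned or actually straddles two cache lines comes down to simple address arithmetic. The Python sketch below assumes a 64-byte line and 16-byte accesses (the width of an SSE register); the function names are illustrative, not taken from any Intel documentation.

```python
CACHE_LINE = 64  # bytes; assumed line size, matching the 64-byte figure above


def is_aligned(addr: int, size: int) -> bool:
    """True if addr is a multiple of the access size (natural alignment)."""
    return addr % size == 0


def splits_cache_line(addr: int, size: int, line: int = CACHE_LINE) -> bool:
    """True if a size-byte access starting at addr touches two cache lines."""
    return (addr % line) + size > line


# A 16-byte load at offset 8 is unaligned but stays inside one line;
# the same load at offset 56 is unaligned and also spans two lines.
for addr in (0, 8, 48, 56):
    print(addr, is_aligned(addr, 16), splits_cache_line(addr, 16))
```

Only the last case, a line split, forces the processor to touch two cache lines for a single instruction.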

Memory reads that straddled two cache lines also took a performance hit of approximately 12 cycles, compared with about 10 for writes. Intel's engineers have worked to make these accesses faster on Nehalem. First, there is no longer any penalty for using the unaligned versions of the load/store instructions when the data are in fact aligned in memory. Second, accesses that really are misaligned now incur a smaller penalty than they did on the Core architecture.
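To get a feel for what those penalties add up to, here is a rough estimate for copying a buffer with 16-byte loads and stores. It treats the approximate 12- and 10-cycle Core figures as simple additive costs, which is an assumption: out-of-order execution can hide part of that latency on real hardware, so read the result as an illustration, not a measurement. The helper names are hypothetical.

```python
CACHE_LINE = 64           # bytes
LOAD_SPLIT_PENALTY = 12   # approximate cycles per line-split load (Core figure quoted above)
STORE_SPLIT_PENALTY = 10  # approximate cycles per line-split store (Core figure quoted above)


def count_splits(start: int, nbytes: int, width: int = 16, line: int = CACHE_LINE) -> int:
    """Count width-byte accesses, walking from start over nbytes, that span two lines."""
    return sum(
        1
        for addr in range(start, start + nbytes, width)
        if (addr % line) + width > line
    )


def split_penalty(src: int, dst: int, nbytes: int) -> int:
    """Rough extra cycles for a copy using 16-byte loads from src and stores to dst."""
    return (count_splits(src, nbytes) * LOAD_SPLIT_PENALTY
            + count_splits(dst, nbytes) * STORE_SPLIT_PENALTY)


print(split_penalty(src=0, dst=0, nbytes=4096))  # source and destination aligned
print(split_penalty(src=8, dst=0, nbytes=4096))  # source misaligned by 8 bytes
```

With the source offset by 8 bytes, one load in four crosses a line boundary, so a 4 KB copy performs 64 split loads.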

More Prefetchers Running More Efficiently

With the Core architecture, Intel was especially proud of its hardware prefetchers. As you know, a prefetcher observes memory access patterns and tries to anticipate which data will be needed several cycles in advance. The point is to bring that data into the cache, where the processor can reach it more quickly, and to make the most of memory bandwidth by using it while the processor doesn't need it.
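The simplest form of the idea is a stride prefetcher: it watches the sequence of cache lines a program touches and, once two consecutive deltas match, fetches the next few lines along that stride. This is a generic textbook scheme sketched in Python for illustration, not a description of Intel's actual prefetchers; the class name and the `degree` parameter are assumptions.

```python
class StridePrefetcher:
    """Toy stride prefetcher: once two consecutive deltas match, predict the next lines."""

    def __init__(self, degree: int = 2):
        self.degree = degree      # how many lines ahead to fetch
        self.last_line = None
        self.last_stride = None

    def access(self, line: int) -> list[int]:
        """Record a demand access to cache line `line`; return the lines to prefetch."""
        prefetches = []
        if self.last_line is not None:
            stride = line - self.last_line
            if stride != 0 and stride == self.last_stride:
                # Stride confirmed twice in a row: run ahead of the program.
                prefetches = [line + stride * k for k in range(1, self.degree + 1)]
            self.last_stride = stride
        self.last_line = line
        return prefetches


pf = StridePrefetcher(degree=2)
for line in (100, 101, 102, 103):
    print(line, pf.access(line))
```

The first two accesses only train the predictor; from the third onward, it requests lines before the program asks for them.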

This technique produced remarkable results with most desktop applications, but in the server world it often cost performance. There are several reasons for that. First, memory accesses in server applications are often much harder to predict. Database accesses, for example, aren't linear: when one item of data is read from memory, the adjacent data won't necessarily be needed next, which limits the prefetcher's effectiveness.

But the main problem was memory bandwidth in multi-socket configurations. As we said earlier, there was already a bottleneck between processors, and the prefetchers added further pressure at that point. Whenever a processor wasn't accessing memory, its prefetchers kicked in to use bandwidth they assumed was available, with no way of knowing that the other processor might need it at that precise moment. As a result, the prefetchers could deprive a processor of bandwidth that was already at a premium in this kind of configuration. Intel's only remedy was to disable the prefetchers in these situations, which was hardly a satisfactory answer.
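The access-pattern problem can be illustrated by replaying two address streams through a simple stride predictor and counting how often an access lands on a line that was prefetched. In this Python sketch, a sequential scan stands in for a typical desktop workload and uniformly random line numbers stand in for scattered database lookups; both traces are synthetic assumptions, and the predictor keeps every prefetched line forever, which a real cache does not.

```python
import random


def coverage(lines: list[int]) -> float:
    """Fraction of accesses whose line was prefetched by a simple stride predictor."""
    prefetched = set()        # no capacity limit: a deliberate simplification
    last_line = last_stride = None
    hits = 0
    for line in lines:
        if line in prefetched:
            hits += 1
        if last_line is not None:
            stride = line - last_line
            if stride != 0 and stride == last_stride:
                prefetched.add(line + stride)  # guess the next line along the stride
            last_stride = stride
        last_line = line
    return hits / len(lines)


sequential = list(range(10_000))                               # linear array scan
rng = random.Random(42)
scattered = [rng.randrange(1_000_000) for _ in range(10_000)]  # stand-in for index probes

print(f"sequential: {coverage(sequential):.3f}")
print(f"scattered:  {coverage(scattered):.3f}")
```

The sequential scan is covered almost completely after the first few accesses, while the scattered stream gives the predictor essentially nothing to work with.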

Intel says the problem is solved now, but provides no details on the operation of the new prefetch algorithms; all it says is that it will no longer be necessary to disable them in server configurations. And even if Intel hadn't changed anything, the wider bandwidth that results from the new memory organization should limit any negative impact of the prefetchers.
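Why extra bandwidth helps can be shown with a toy model of two sockets sharing one link. The model assumes socket A's demand and prefetch traffic are served first and socket B gets whatever is left, up to its own demand; the capacities and demands are arbitrary illustrative numbers, not measurements of any real platform, and the function name is hypothetical.

```python
def remote_bandwidth(capacity: float, demand_a: float, demand_b: float,
                     prefetch_a: float) -> float:
    """Bandwidth socket B actually gets when socket A's prefetcher ignores B.

    Toy model (an assumption): A's demand and prefetch traffic are served first,
    and B receives what is left, capped at its own demand. Units are arbitrary.
    """
    leftover = max(0.0, capacity - demand_a - prefetch_a)
    return min(demand_b, leftover)


# Illustrative numbers only, not measurements of any real platform.
for capacity in (10.0, 20.0):
    without_pf = remote_bandwidth(capacity, demand_a=4.0, demand_b=5.0, prefetch_a=0.0)
    with_pf = remote_bandwidth(capacity, demand_a=4.0, demand_b=5.0, prefetch_a=3.0)
    print(capacity, without_pf, with_pf)
```

With the narrower link, A's 3 units of prefetch traffic cut B from 5 units to 3; doubling capacity leaves enough headroom that B gets its full demand even with the prefetcher running.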


Comments
americanbrian 14/10/2008 10:36

While this will undoubtedly create a whole new level of performance, I imagine it will be prohibitively expensive, coming just as the global economy hits a trough.

For this reason I think AMD has a brighter future when it releases its new 45nm cores. They will provide a good performance increase, and I am willing to bet they will still trump Intel on the price/performance scale.

mi1ez 14/10/2008 11:05

Fantastic article, very insightful.

M_Taylor40 14/10/2008 11:37

First off, I have not read the entire article but I just want to comment on the name.
I've been saying this since they announced the design of Nehalem: it's Intel's take on AMD's design, which means you're getting the best of both companies, as AMD's designs have been so much better than Intel's but AMD could not challenge what Intel already had.
It's been a long time coming for Intel to adopt AMD's designs, and I really do look forward to the release (well, six months after, when I might be able to afford a Core i7 system!), but I feel AMD really needs to pull something out of the hat to compete.
Anyway, from what I have read, it's a good article lol.

goozaymunanos 14/10/2008 15:45

good...progress!

btw, where's the 8-core systems we were promised for 2008?

...and where are all the recompiled apps to take advantage of all this processing parallelism?!



p.s. stuff and nonsense: http://www.eupeople.net/forum

ErikO 14/10/2008 17:28

My credit card is restless...

Anonymous 14/10/2008 19:04

just hope the bank is still around to honour your credit card... :D

bobwya 14/10/2008 19:14

Now that's more like it!! A well-informed article that is well written and imparts some useful information... More of the same please, THG!!

I'm just off to sell those AMD shares...

Bob

jammydodger 15/10/2008 14:46

While the article is sound, it did upset me that the first two pages talk about the 'Conroe' architecture. 'Core 2' is the name of the architecture used in the Conroe line of processors. 'Conroe' is the name given to the first desktop iteration of the Core 2 architecture, just as Allendale is the value version and Kentsfield the quad-core version (along with all the new iterations that use different cache sizes or manufacturing processes).

It is difficult to inspire confidence in your readers when such obvious mistakes are apparent.

KingGreatYat 16/10/2008 10:59

Jammydodger: I think the usage may be a little off, but saying "the Conroe architecture" just means the uarch used by the Conroe chips, which is common to all chips of that generation. Also, the architecture was referred to by the code name Merom. Core 2 is a retail brand name. Either way, this is a minor mistake and not something that would make me doubt the validity of the article.

szilu2002 16/10/2008 16:26

At last, a quality-oriented article!!!
Complete and detailed. I want to see more in the future!

jammydodger 16/10/2008 23:22

KingGreatYat: I do realise that I could be seen to be splitting hairs, but when an article goes into such detail about an upcoming processor architecture, yet begins by failing to recognise the distinction between an architecture and a core, it does raise the question of whether the writer has fully understood what he is trying to impart to us. If I were to begin an article by talking about Intel's 'Northwood' architecture, I would be talking nonsense; Northwood was a chip based on Intel's 'Netburst' architecture. Merom is, as far as I am aware, the first mobile variant of the Core 2 architecture; it was preceded by Yonah, based on Intel's 'Core' architecture, which was itself based on the 'P6' architecture.

krisna159 20/10/2008 08:14

Competition is good for the market; end users like us have many choices to pick from, AMD or Intel. I agree with americanbrian. Let's wait for the counterattack from AMD with the latest technologies and, of course, the lowest price.

geoffy 15/11/2008 19:27

[quote=Article]Intel says the problem is solved now, but provides no details on the operation of the new prefetch algorithms[/quote]

Something tells me this is going to be pivotal if Deneb proves to be any good...

Great article, by the way, minor niggles aside!
