Download the Tom's Hardware App from the App Store
The reference for current tech news
Yes No

Concurrent Programming: A Solution For Multi-core Era?

by - source: Tom's Hardware

Until now, programmers have been looking up to Moore’s law to speed-up their sequential program with each subsequent processor. But due to the physical limitations in chip design, the programmers’ "free lunch" is over : Multi-core architectures - multiple CPUs on a single chip - bring improved power efficiency, while each of the cores could be slower than the latest unicore processor when running traditional applications. This not only means that the acceleration of sequentially written applications may stagnate - they might even slow-down1. Rajesh Karmani discusses challenges, dead-ends and possible directions for developers.

One important question, of course, is what kinds of applications we can and need to parallelize. How much can we parallelize our word processor or the virus scanner ? Scientific applications, simulations, numerical computations have been the focus of parallel computing community for some time. But as parallel platforms (like multi-core CPUs) become mainstream, mainstream applications and their programmers have to rise to the challenge. It is an interesting discussion in itself and demands a separate article which I’ll leave aside for later.

Unfortunately, the problem gets worse when chips with hundreds of cores (sometimes called manycore) are rolled out, as is being widely predicted. Two potential consequences for programmers are :

1) Programs should continue to scale (run faster) with the increasing number of cores.
2) Programs based on shared memory model will suffer serious bottlenecks on memory bus.

Shared-memory concurrent programming (in Java)

Java is a widely-used programming language. It supports concurrent programming through threads and it seems that Java’s concurrency features are an additional feature instead of setting out with the goal of being an elegant concurrent programming language. This shows up in quite a few of its constructs. For example, consider the following statement in Java :

....
x++ ;
....

Depending on whether x is a local variable or class/instance variable, this statement gets compiled into 1 or 3 bytecode instructions. Crucially, in case x is an instance variable, the sequence of 3 instructions read the value from the heap, increment it, and write it back to the heap. If the code gets called from two different threads, there is a small window such that one increment is observed instead of two increments, which is incorrect. Of course, they told us in their specification what can go wrong, but I am arguing for constructs and abstractions which are simple, intuitive and elegant.

Similarly, the locking mechanism in Java is bug-prone as illustrated by a commonly referred example. This piece of code was part of the StringBuffer class :

public synchronized StringBuffer append (StringBuffer sb)

int len = sb.length() ;
...
...
sb.getChars(0,len,value,count) ;
...

Although the method is declared as synchronized, another thread can modify ’sb’ object after the first statement such that its length decreases to less than ’len’. Therefore, when getChars is called on ’sb’ with the old value of length, the program raises an exception.

Once again, the semantics of synchronized methods in Java are not intuitive enough. On entering a synchronized method, the runtime obtains a lock for the object on which the method is being called. These semantics guarantee isolation of this block of code (method body) with respect to other blocks capturing lock on the same object. This raises multiple problems ; what if the programmer forgets to obtain the lock at some place and what about the consistency of other objects (’sb’ here) being accessed (and modified) inside the synchronized method.

In order to fix this code, programmer should have obtained another lock on ’sb’ by using the synchronized block construct :

public synchronized StringBuffer append (StringBuffer sb)

synchronized (sb)

int len = sb.length() ;
...
...
sb.getChars(0,len,value,count) ;
...

But things are not as simple as they seem. Such an implementation could result in deadlock. Consider the following two threads :

There is a window where Thread 1 obtains the lock for sbObj and Thread 2 for sb. Each of the threads will keep waiting for the other thread to release the lock for respective object before it can proceed forward. In fact, the deadlock could occur trivially without the modification since the methods length() and getChars() are declared as synchronized in StringBuffer. For a more detailed discussion on this problem, see references 6 and 7 below.

To conclude, shared memory concurrent programs can have data races, which are conflicting accesses to shared variables. Low-level mechanism such as locks, semaphores provided in languages like C/C++/Java to prevent data-races put too much burden on programmers and yet can cause errors like deadlocks. Reference 8 details the problems with shared memory programs.

Read on the next page : Software Transactions, Conclusion

Software Transactions

Software transactions is an elegant mechanism proposed in order to reduce the complexity of concurrent programs in shared memory models. The idea is to annotate the blocks of code that are required to execute in isolation with the keyword atomic or transaction. The onus is on the compiler or run-time to obey the semantics using one of the few different mechanisms.

A pessimistic approach is to replace the atomic construct with locks acquisition statements that guarantee that there is no deadlock. Although it reduces the burden on programmer by taking care of the locks, it has certain disadvantages :

1) The approach is based on static code analysis, and tends to over-approximate the locks it needs to obtain. Hence, it restricts the potential parallelism in application.
2) It cannot guarantee isolation of atomic block with respect to non-atomic blocks. So, if the programmer forgets to annotate atomic, it can cause inconsistencies.
3) Composition of compiled code can still cause deadlocks.

Another approach taken by researchers is based on the conjecture that conflicts are timing-dependent and are usually rare in practice. Therefore, they execute the atomic blocks optimistically and before committing the results to shared memory, validate if there was a conflict with some other code. This approach is sometimes called TM. It can be implemented either in software (STM), hardware (HTM) or both (Hybrid). In case, a conflict is detected, the transaction is rolled back and executed again. Although, TM can provide more speed-ups in some cases, it can perform poorly in others. Some of its disadvantages are :

1) It’s difficult to roll back system and I/O calls. Hence these can’t be part of the atomic blocks.
2) It can perform poorly in case of frequent conflicts due to repeated roll backs.
3) There is a very high overhead in order to ensure isolation with respect to non-atomic code. In fact, with misunderstood constructs (such as increment operator discussed above) in the language, the behavior of the program is not very intuitive.

Software Transactions is an active research area due to the simplicity of its semantics but lot of questions remain unanswered at the implementation level. A good introduction can be found at 3,5. On the other hand, a heated debate on STM versus other models took place at Ltu4.

Conclusion

Mainstreaming of concurrent programming has woken up language designers. Efforts such as software transactions and library support in Java5 and Intel’s TBB are elegant workarounds, but essentially I would argue they are just patch-works. As multi-cores follow Moore’s law, the number of cores on a chip will grow exponentially and there will be serious bottlenecks at the shared memory bus. I believe it is not feasible to program based on a shared memory model. Moreover, in order to continue their "free lunch" in multicore era (scalability with number of cores instead of GHz), developers may well have to jilt their current love. But there is plenty of optimism, and we are yet to discuss message-passing models :-)

References :
[1] http://www.codinghorror.com/blog/archives/000942.html
[2] The Java Memory Model. Jeremy Manson, William Pugh and Sarita Adve. In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2005. Long Beach, California. January, 2005.
[3] http://www.acmqueue.org/modules.php ?>[4] http://lambda-the-ultimate.org/node/2048
[5] "Beautiful concurrency" Simon Peyton Jones. This is a chapter for the book "Beautiful code", edited by Greg Wilson, O’Reilly 2007.
[6] http://bugs.sun.com/bugdatabase/view_bug.do ?bug_>[7] A. Williams, W. Thies, and M. D. Ernst. Static deadlock detection for Java libraries. In Proc. 2005 European Conference on Object-Oriented Programming (ECOOP) Lecture Notes in Computer Science. Springer-Verlag, July 2005.
[8] The Problem with Threads. Edward A. Lee. EECS Department University of California, Berkeley Technical Report No. UCB/EECS-2006-1

Disclaimer : The views expressed in the article are author’s personal views and do not represent those of University of Illinois, or the UPCRC at the University of Illinois.

About the author : The author is a graduate student in the Department of Computer Science at University of Illinois at Urbana-Champaign. He is a recipient of the Sohaib and Sarah Abbasi Fellowship. His current area of interest is programming languages and software engineering. Previously, he has worked in wireless sensor networks and multi-agent systems.

Share:
2
Comments
Read more
X
Submit

Comments
Add your comment
JDocs 22/04/2008 15:05
Hide
-0+

Please note, Java can also synchronize on objects.

class ThreadA {
public static StringBuffer STR = new StringBuffer();

public void run() {
synchronized (ThreadA.STR) {
// Do something.
}
}
}

class ThreadB {
public void run() {
synchronized (ThreadA.STR) {
// Do something.
}
}
}

Now ThreadA.STR can only be accessed by ThreadA or ThreadB at a single time. Java does Threading excessively well when the programmer knows what to do.

windego 23/04/2008 14:06
Hide
-0+

Concurrency, or the more commonly known terminology, multithreading, has been around since the inception of software. It isn't some new design paradigm or rosetta stone of software design.

I have a number of problems with this article:

(1) "A solution for Multi-core Era" ? The problem you speak of resides in the hardware, id est, the CPU to be exact. CPU speeds are no longer increasing at a noticeable rate; instead, we are being told that parallelism is the future and that software needs to evolve; Which is of course a manifestly fallacious ideology; one which is driven by the inability of CPU manufacturers to match former increases in performance.

Most applications have no need for parallelism; indeed, the incorporation of multi-threading in an application greatly complicates design, implementation, and debugging. Especially so in languages such as C/C++ which the majority of applications are written with. A single main() thread is typically sufficient for most designs.

Superfluous concurrency causes bottlenecks, degradation of performance, and frequently, race conditions.

(2) Multiple cores are not a solution to stagnant growth in performance. Eventually, new technologies must be exploited in order to increase single process/thread performance of CPUs.

Don't get me wrong, multiple cores are a good idea, especially in a server environment. A desktop can also benefit from multi-core CPUs, but not in the way you would expect. When I say this, I am alluding to a concurrent model often overlooked, id est, the operating system scheduler. Modern schedulers can assign a program's execution to a given core; this enables discrete processes to run simultaneously; and in this sense, it is a welcome advancement. For the desktop, the question remains; how many cores are sufficient?; this is largely predicated on the types of applications on the system, but in general, two cores are more than sufficient for common desktops. With the possibility of four for heavy video encoding or 3D rendering (3ds Max). Anything more than this is superfluous.

Best offers

Newsletters


OK