Fog Creek Software
Discussion Board




Deadlock

Is there a standard way of preventing deadlock when developing multi-threaded applications?

Better still, are there any essential textbooks on the subject of multi-threading or real-time systems?

FG
(looking forward to the weekend :)

Furious George
Friday, May 21, 2004

Acquire locks in a predictable sequence: for example, if a thread can acquire lockA and then lockB, it's OK if another thread only acquires lockB, or only acquires lockA, but not OK if another thread can acquire lockB and then lockA.

To implement this, it's easier if the software is layered without cyclic dependencies: for example, if lockB is at the lowest level and lockA is at a higher level, then "no cyclic dependencies" means "lower-level code never calls higher-level code". Callbacks tend to break this model (because, with callbacks, lower-level code invokes higher-level code). So, when you make a callback from lower-level code, it's best to release any locks before you invoke the callback.
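A minimal Java sketch of the ordering rule described above (the lock names and methods are illustrative, not from any real codebase):

```java
// Sketch of a fixed acquisition order: every thread that needs both
// locks must take lockA before lockB. Taking lockB alone is also fine.
class OrderedLocks {
    private static final Object lockA = new Object(); // higher-level lock
    private static final Object lockB = new Object(); // lower-level lock
    private static int counter = 0;

    // OK: takes lockA, then lockB -- follows the global order.
    static void highLevelWork() {
        synchronized (lockA) {
            synchronized (lockB) {
                counter++; // work that needs both locks
            }
        }
    }

    // Also OK: takes only lockB.
    static void lowLevelWork() {
        synchronized (lockB) {
            counter++; // work that needs only the low-level lock
        }
    }

    // NOT OK (risks deadlock against highLevelWork):
    //   synchronized (lockB) { synchronized (lockA) { ... } }

    static int getCounter() { return counter; }
}
```

As long as every thread follows the same order, no cycle of "holds one, wants the other" can form.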

Christopher Wells
Friday, May 21, 2004

Why wouldn't a semaphore or mutex work?

There is another thread going on right now talking about Linux/Minix and Andrew T. (can't remember the spelling of his last name off the top of my head); I think it is the FUD post. In any case, "Modern Operating Systems" is a book written by Andrew T. (who developed Minix as a 'sample' OS) that you may want to check out.

Randal Ulyess Serious
Friday, May 21, 2004

Draw a UML sequence type diagram showing your threads and your locks.  Identify which threads take which locks and draw them out. You need pencil and paper, not CASE tools.  Sit in a quiet room with only your printed source code, paper and pencil.  If you're careful, you'll figure it out.

hoser
Friday, May 21, 2004

Yes, it needs to be done at design-time: there's no guarantee that a finite amount of testing will detect every potential deadlock. That said, test on a multi-CPU machine (on which deadlocks are more easily triggered, because more threads are really executing simultaneously).

I also instrumented my locks: as each lock was acquired, it was added to a run-time, per-thread list of the locks that thread held. This helped to measure thread performance ("how long does each thread spend blocked, acquiring various locks?"). Also, before acquiring a lock, I checked to see whether the lock was already acquired by a thread which was blocked on a lock owned by this thread: and if so, I threw an exception (deadlock detection and recovery).
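One possible sketch of that wait-for check, in Java (all class and method names here are hypothetical bookkeeping, not a real API): before a thread blocks on a lock, walk the chain "owner of that lock → lock that owner is blocked on"; if the chain leads back to the asking thread, a deadlock is imminent.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical instrumentation: callers report acquire/release/wait
// events, and wouldDeadlock() walks the wait-for chain.
class DeadlockDetector {
    private final Map<Object, Thread> owner = new ConcurrentHashMap<>();
    private final Map<Thread, Object> blockedOn = new ConcurrentHashMap<>();

    void acquired(Object lock, Thread t) {
        owner.put(lock, t);
        blockedOn.remove(t);
    }

    void released(Object lock) {
        owner.remove(lock);
    }

    void waiting(Thread t, Object lock) {
        blockedOn.put(t, lock);
    }

    // Returns true if thread t blocking on this lock would close a cycle.
    boolean wouldDeadlock(Object lock, Thread t) {
        Object next = lock;
        while (next != null) {
            Thread holder = owner.get(next);
            if (holder == null) return false; // lock is free: chain ends
            if (holder == t) return true;     // chain leads back to t
            next = blockedOn.get(holder);     // what is the holder stuck on?
        }
        return false;
    }
}
```

In the scheme described in the post, a `true` result would be turned into a thrown exception instead of a blocking acquire.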

Christopher Wells
Friday, May 21, 2004

As complexity increases you won't be able to mentally handle preventing deadlock. It really must be handled at an architectural level. Java, for example, picked the worst concurrency model: arbitrary threads and locks.

For more discussion take a look at
http://www.possibility.com/epowiki/Wiki.jsp?page=ArchitectureDiscussion.

anonymous
Friday, May 21, 2004

Timeouts.

Dino
Friday, May 21, 2004

Check out the book "Advanced Windows" by Jeffrey Richter. Excellent book on the subject of threads, synchronization, semaphores, mutexes, etc...

http://www.amazon.com/exec/obidos/tg/detail/-/1572315482/qid=1085157242/sr=1-1/ref=sr_1_1/104-7203252-2283948?v=glance&s=books

Mike

MikeG
Friday, May 21, 2004

Christopher Wells wrote: "Also, before acquiring a lock, I checked to see whether the lock was already acquired by a thread which was blocked on a lock owned by this thread".

This is good. A stricter way of checking is to come up with a list of which synchronization objects cannot be acquired before which other ones. This forms a partial order (which is, by definition, acyclic).

An example: if you have synch objects A, B, C and D, your rules might be "B, C, and D cannot be acquired before A; C and D cannot be acquired before B". Then, in the one piece of code which takes care of grabbing A, put in a check (via an ASSERT or similar) that the current thread doesn't currently have B, C, or D grabbed. In the B-grabbing code, check that the current thread doesn't have C or D grabbed.

Then, whenever you run a code path, if you wind up violating your order, you'll get an assertion which indicates that your design is broken.
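A small Java sketch of that check (names and ranks are illustrative; the post's ASSERT is shown here as a thrown exception): give each lock a rank, and insist that every thread acquires locks in strictly increasing rank order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Each lock carries a rank; a per-thread stack records which ranked
// locks the current thread holds. Acquiring out of rank order fails
// immediately, on any code path that runs -- before a real deadlock.
class RankedLock {
    private final int rank;
    private static final ThreadLocal<Deque<RankedLock>> held =
            ThreadLocal.withInitial(ArrayDeque::new);

    RankedLock(int rank) {
        this.rank = rank;
    }

    // Call at the point where the lock is grabbed (the check/ASSERT).
    void checkAndRecordAcquire() {
        Deque<RankedLock> stack = held.get();
        if (!stack.isEmpty() && stack.peek().rank >= rank) {
            throw new IllegalStateException("lock order violation");
        }
        stack.push(this);
    }

    void recordRelease() {
        held.get().pop();
    }
}
```

With A ranked below B, C, and D, "B cannot be acquired before A" falls out of the rank comparison automatically.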

A project I worked on followed this method and found several surprising potential deadlocks which we then fixed. None have turned up since then.

Exception guy
Friday, May 21, 2004

Assume A and B processes. In order to deadlock A and B we must have:
- in process A: wait on B
- in process B: wait on A

It is tempting to try to fix the deadlock by testing the state of the process we'll be waiting on:
- in process A: if state of B is not "wait on A" then wait on B
- in process B: if state of A is not "wait on B" then wait on A

However, testing and entering the wait state are not an atomic operation; therefore there is still room for a race condition. This approach only makes the deadlock less probable (and harder to find).

Also, in this approach the processes are tightly coupled (communicating via stateful protocols). Needless to say, this is undesirable because it destabilizes your design. Just try to modify the above code for the case where B has to wait on A and C, while C has to wait on A.

A robust alternative to the above problems is to use timeouts:
- in process A: wait x msec on B
- in process B: wait y msec on A;

There are no race conditions, since you're performing a single atomic kernel operation (i.e. wait x msec).

The communication between A and B is stateless (A does not know B's state and B does not know A's state). If B has to wait on C, which waits on A, then the code would look like:

- in process A: wait x msec on B;
- in process B: wait y msec on A and C;
- in process C: wait z msec on A;
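In Java, the timeout approach can be sketched with `java.util.concurrent` locks (the helper and its names are hypothetical): bound every wait, and on timeout release whatever is already held and report failure instead of blocking forever.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Take two locks with a bounded wait each; run the work only if both
// succeed. The finally blocks guarantee nothing stays held on failure.
class TimedLocking {
    static boolean withBothLocks(ReentrantLock first, ReentrantLock second,
                                 long timeoutMs, Runnable work)
            throws InterruptedException {
        if (!first.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // timed out on the first lock
        }
        try {
            if (!second.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // timed out on the second; first released below
            }
            try {
                work.run();
                return true;
            } finally {
                second.unlock();
            }
        } finally {
            first.unlock();
        }
    }

    // Hypothetical demo: both locks are free, so the work runs and
    // both locks end up released.
    static boolean demo() throws InterruptedException {
        ReentrantLock a = new ReentrantLock();
        ReentrantLock b = new ReentrantLock();
        final boolean[] ran = {false};
        boolean ok = withBothLocks(a, b, 50, () -> ran[0] = true);
        return ok && ran[0] && !a.isLocked() && !b.isLocked();
    }
}
```

Note that the caller must handle the `false` case (retry, back off, or fail): the timeout turns a potential deadlock into an explicit error path rather than eliminating the contention itself.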

Cheers

Dino
Friday, May 21, 2004

(Slightly off-topic, but it was such a pain to track down that I feel compelled to share.)

Interestingly, in Java, you can claim all locks in the same order and *still* get deadlocks.  Consider the following code:

synchronized(A) {
  synchronized(B) {
    A.wait();
  }
}

This piece of code can deadlock because A.wait() gives up the lock on A without giving up the lock on B.  So while it looks like the executing thread is claiming lock A and then lock B, it's actually doing this:
- claim A
- claim B
- give up A
- reclaim A
- give up B
- give up A

This sequence is clearly susceptible to deadlock, but it's not obvious that the code above causes this sequence of operations.  Except if you've seen it before...

(Disclaimer:  the code in which I found this particular deadlock condition was not nearly this simple.  Yes, it's bad code, so "just don't write code that way" is a perfectly reasonable response, but it can be very non-obvious when you're writing similar code, if the synchronized blocks span classes and methods, and lock on objects that are passed in.)
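One way to restructure it, as a sketch (class and field names are illustrative): since `A.wait()` releases only A, don't hold any other lock across the wait. Use B after the wait, not around it.

```java
// The consumer waits on A while holding no other lock, then takes B
// once the condition holds. The producer flips the flag under A.
class WaitOutsideB {
    private final Object A = new Object();
    private final Object B = new Object();
    private boolean ready = false;
    private int consumed = 0;

    void consumer() throws InterruptedException {
        synchronized (A) {
            while (!ready) {
                A.wait(100); // releases A only; no other lock is held here
            }
        }
        synchronized (B) {
            consumed++; // touch B-guarded state after the wait
        }
    }

    void producer() {
        synchronized (A) {
            ready = true;
            A.notifyAll();
        }
    }

    int getConsumed() {
        synchronized (B) {
            return consumed;
        }
    }
}
```

The `while` loop (rather than `if`) re-checks the condition after every wakeup, which is the standard guard against spurious wakeups and timed-out waits.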

schmoe
Friday, May 21, 2004

(Change "A.wait()" above to "A.wait(100)" to avoid the discussion about whether this is deadlock or starvation.  If the "A.wait(...)" line ever returns, this is actually deadlock and not starvation.)

schmoe
Friday, May 21, 2004

In C and C++, just never hold a lock while calling out to another method or function.

Ron
Friday, May 21, 2004

There's Magee/Kramer's Concurrency book, which I've heard is good but haven't looked at closely enough.

Maybe Andrews' book too, though I definitely haven't read through that either. Apparently it's more formal than the other one.

Tayssir John Gabbour
Saturday, May 22, 2004

+1 for Magee/Kramer. It's an awesome book that should be mandatory reading for every serious developer, regardless of what environment they're in.

Brad Wilson (dotnetguy.techieswithcats.com)
Saturday, May 22, 2004
