deadlock

the one-way bridge

moving two files

struct Dir {
  mutex_t lock; HashMap entries;
};
void MoveFile(Dir *from_dir, Dir *to_dir, string filename) {
  mutex_lock(&from_dir->lock);
  mutex_lock(&to_dir->lock);
    
  Map_put(to_dir->entries, filename,
        Map_get(from_dir->entries, filename));
  Map_erase(from_dir->entries, filename);

  mutex_unlock(&to_dir->lock);
  mutex_unlock(&from_dir->lock);
}

Thread 1: MoveFile(A, B, “foo”)
Thread 2: MoveFile(B, A, “bar”)

moving two files: lucky timeline (1)

Thread 1 Thread 2
MoveFile(A, B, "foo") MoveFile(B, A, "bar")
lock(&A->lock);
lock(&B->lock);
(do move)
unlock(&B->lock);
unlock(&A->lock);
lock(&B->lock);
lock(&A->lock);
(do move)
unlock(&A->lock);
unlock(&B->lock);

moving two files: lucky timeline (2)

Thread 1 Thread 2
MoveFile(A, B, "foo") MoveFile(B, A, "bar")
lock(&A->lock);
lock(&B->lock);
lock(&B->lock
(do move) (waiting for B lock)
unlock(&B->lock);
lock(&B->lock);
lock(&A->lock
unlock(&A->lock);
lock(&A->lock);
(do move)
unlock(&A->lock);
unlock(&B->lock);

moving two files: unlucky timeline

Thread 1 Thread 2
MoveFile(A, B, "foo") MoveFile(B, A, "bar")
lock(&A->lock);
lock(&B->lock)
lock(&B->lock;stalled
(waiting for lock on B) lock(&A->lock;stalled
(waiting for lock on B) (waiting for lock on A)
   
(do move)unreachable (do move)unreachable
unlock(&B->lock)unreachable unlock(&A->lock);unreachable
unlock(&A->lock)unreachable unlock(&B->lock);unreachable

moving two files: dependencies

moving three files: dependencies

moving three files: unlucky timeline

deadlock with free space

deadlock with free space (unlucky case)

free space: dependency graph

deadlock with free space (lucky case)

lab next week

  • applying solutions to deadlock to classic dining philosphers problem

dining philosophers

deadlock

  • deadlock — circular waiting for resources


  • resource = something needed by a thread to do work

    • locks
    • CPU time
    • disk space
    • memory
  • often non-deterministic in practice

  • most common example: when acquiring multiple locks

deadlock requirements

  • mutual exclusion

    • one thread at a time can use a resource
  • hold and wait

    • thread holding a resources waits to acquire another resource
  • no preemption of resources

    • resources are only released voluntarily
    • thread trying to acquire resources can’t ‘steal’
  • circular wait

    • there exists a set \(\{T_1,\ldots,T_n\}\) of waiting threads such that

      • \(T_1\) is waiting for a resource held by \(T_2\)
      • \(T_2\) is waiting for a resource held by \(T_3\)
      • \(T_n\) is waiting for a resource held by \(T_1\)

how is deadlock possible?

Given list: A, B, C, D, E

RemoveNode(LinkedListNode *node) {
    pthread_mutex_lock(&node->lock);
    pthread_mutex_lock(&node->prev->lock);
    pthread_mutex_lock(&node->next->lock);
    node->next->prev = node->prev;
    node->prev->next = node->next;
    pthread_mutex_unlock(&node->next->lock);
    pthread_mutex_unlock(&node->prev->lock);
    pthread_mutex_unlock(&node->lock);
}
Which of these (all run in parallel) can deadlock?
A. RemoveNode(B) and RemoveNode(C)
B. RemoveNode(B) and RemoveNode(D)
C. RemoveNode(B) and RemoveNode(C) and RemoveNode(D)
D. A and C
E. B and C
F. all of the above
G. none of the above

how is deadlock — solution

RemoveNode(B) RemoveNode(C)
lock B lock C
lock A (prev) wait to lock B (prev)
wait to lock C (next)

With B and D — only overlap in in node C — no circular wait possible
(thread can’t be waiting while holding something other thread wants)

deadlock prevention techniques

abort and retry limits?

  • abort-and-retry

  • pthread’s mutexes:

    • pthread_mutex_trylock
    • pthread_mutex_timedlock
  • how many times will you retry?

moving two files: abort-and-retry

struct Dir { mutex_t lock; HashMap entries; };
void MoveFile(Dir *from_dir, Dir *to_dir, string filename) {
  while (true) {
    mutex_lock(&from_dir->lock);
    if (mutex_trylock(&to_dir->lock) == LOCKED) break;
    mutex_unlock(&from_dir->lock);
  }
    
  Map_put(to_dir->entries, filename, Map_get(from_dir->entries, filename));
  from_dir->entries.erase(filename);

  mutex_unlock(&to_dir->lock);
  mutex_unlock(&from_dir->lock);
}

Thread 1: MoveFile(A, B, “foo”); Thread 2: MoveFile(B, A, “bar”)

moving two files: lots of bad luck?

livelock

  • livelock: keep aborting and retrying without end


  • like deadlock — no one’s making progress

    • potentially forever
  • unlike deadlock — threads are not waiting

preventing livelock

  • make schedule random — e.g. random waiting after abort
  • make threads run one-at-a-time if lots of aborting
  • other ideas?

deadlock prevention techniques

stealing locks???

  • how do we make stealing locks possible


  • unclean: just kill the thread

    • problem: inconsistent state?
  • clean: have code to undo partial operation

    • some databases do this
  • won’t go into detail in this class

revocable locks?

try {
    AcquireLock();
    use shared data
} catch (LockRevokedException le) {
    undo operation hopefully?
} finally {
    ReleaseLock();
}

deadlock prevention techniques

deadlock prevention techniques

acquiring locks in consistent order (1)

MoveFile(Dir* from_dir, Dir* to_dir, string filename) {
  if (from_dir->path < to_dir->path) {
    lock(&from_dir->lock);
    lock(&to_dir->lock);
  } else {
    lock(&to_dir->lock);
    lock(&from_dir->lock);
  }
  ...
}

acquiring locks in consistent order (2)

  • often by convention, e.g. Linux kernel comments:
/*
 * ...
 * Lock order:
 *  contex.ldt_usr_sem
 *    mmap_sem
 *      context.lock
 */

/*
 * ...
 * Lock order:
 *   1. slab_mutex (Global Mutex)
 *   2. node->list_lock
 *   3. slab_lock(page) (Only on some arches and for debugging)
 * ...
 */
 

Backup slides

backup slides

deadlock versus starvation

  • starvation: one+ unlucky (no progress), one+ lucky (yes progress)

    • example: low priority threads versus high-priority threads
  • deadlock: no one involved in deadlock makes progress


  • starvation: once starvation happens, taking turns will resolve

    • low priority thread just needed a chance…
  • deadlock: once it happens, taking turns won’t fix

abort and retry limits?

  • abort-and-retry

  • pthread’s mutexes:

    • pthread_mutex_trylock
    • pthread_mutex_timedlock
  • how many times will you retry?

moving two files: abort-and-retry

struct Dir { mutex_t lock; HashMap entries; };
void MoveFile(Dir *from_dir, Dir *to_dir, string filename) {
  while (true) {
    mutex_lock(&from_dir->lock);
    if (mutex_trylock(&to_dir->lock) == LOCKED) break;
    mutex_unlock(&from_dir->lock);
  }
    
  Map_put(to_dir->entries, filename, Map_get(from_dir->entries, filename));
  from_dir->entries.erase(filename);

  mutex_unlock(&to_dir->lock);
  mutex_unlock(&from_dir->lock);
}

Thread 1: MoveFile(A, B, “foo”); Thread 2: MoveFile(B, A, “bar”)

moving two files: lots of bad luck?

livelock

  • livelock: keep aborting and retrying without end


  • like deadlock — no one’s making progress

    • potentially forever
  • unlike deadlock — threads are not waiting

preventing livelock

  • make schedule random — e.g. random waiting after abort
  • make threads run one-at-a-time if lots of aborting
  • other ideas?

stealing locks???

  • how do we make stealing locks possible


  • unclean: just kill the thread

    • problem: inconsistent state?
  • clean: have code to undo partial operation

    • some databases do this
  • won’t go into detail in this class

revocable locks?

try {
    AcquireLock();
    use shared data
} catch (LockRevokedException le) {
    undo operation hopefully?
} finally {
    ReleaseLock();
}

deadlock detection

  • why? debugging or fix deadlock by aborting operations
  • idea: search for cyclic dependencies

detecting deadlocks on locks

  • let’s say I want to detect deadlocks that only involve mutexes

    • goal: help programmers debug deadlocks
  • … by modifying my threading library:

struct Thread {
    ... /* stuff for implementing thread */
    /* what extra fields go here? */


};

struct Mutex {
    ... /* stuff for implementing mutex */
    /* what extra fields go here? */


};

deadlock detection

  • why? debugging or fix deadlock by aborting operations

  • idea: search for cyclic dependencies

  • need:

    • list of all contended resources
    • what thread is waiting for what?
    • what thread ‘owns’ what?

aside: divisible resources

  • deadlock is possible with divislbe resources like memory,…

  • example: suppose 6MB of RAM for threads total:

    • thread 1 has 2MB allocated, waiting for 2MB
    • thread 2 has 2MB allocated, waiting for 2MB
    • thread 3 has 1MB allocated, waiting for keypress
  • cycle: thread 1 waiting on memory owned by thread 2?

  • not a deadlock — thread 3 can still finish

    • and after it does, thread 1 or 2 can finish
  • … but would be deadlock

    • … if thread 3 waiting lock held by thread 1
    • … with 5MB of RAM

divisible resources: not deadlock

divisible resources: is deadlock

divisible resources: is deadlock

deadlock detection with divisible resources

  • for each resource: track which threads have those resources

  • for each thread: resources they are waiting for

  • repeatedly:

    • find a thread where all the resources it needs are available
    • remove that thread and mark the resources it has as free — it can complete now!
  • either: all threads eliminated or found deadlock

aside: deadlock detection in reality

  • requires:

    • instrumenting contended resources
    • ‘‘undo’’ to get out of deadlock

  • common example: for locks in a database

    • database typically has customized locking code
    • ‘‘undo’’ exists as side-effect of code for handling power/disk failures
  • related idea: avoid deadlock with detection on ‘‘what if’’ scenario

    • see Banker’s algorithm

pipe() deadlock

BROKEN example:

int child_to_parent_pipe[2], parent_to_child_pipe[2];
pipe(child_to_parent_pipe); pipe(parent_to_child_pipe);
if (fork() == 0) {
    /* child */
    write(child_to_parent_pipe[1], buffer, HUGE_SIZE);
    read(parent_to_child_pipe[0], buffer, HUGE_SIZE);
    exit(0);
} else {
    /* parent */
    write(parent_to_child_pipe[1], buffer, HUGE_SIZE);
    read(child_to_parent_pipe[0], buffer, HUGE_SIZE);
}

This will hang forever (if HUGE_SIZE is big enough).

deadlock waiting

  • child writing to pipe waiting for free buffer space
  • … which will not be available until parent reads
  • parent writing to pipe waiting for free buffer space
  • … which will not be available until child reads

circular dependency

allocating all at once?

  • for resources like disk space, memory


  • figure out maximum allocation when starting thread

    • ‘‘only’’ need conservative estimate
  • only start thread if those resources are available


  • okay solution for embedded systems?

deadlock with free space

deadlock with free space (unlucky case)

free space: dependency graph

deadlock with free space (lucky case)

AllocateOrFail

  • okay, now what?

    • give up?
    • both try again? — maybe this will keep happening? (called livelock)
    • try one-at-a-time? — guaranteed to work, but tricky to implement

AllocateOrSteal

  • problem: can one actually implement this?
  • problem: can one kill thread and keep system in consistent state?

fail/steal with locks

  • pthreads provides pthread_mutex_trylock — ‘‘lock or fail’’

  • some databases implement revocable locks

    • do equivalent of throwing exception in thread to ‘steal’ lock
    • need to carefully arrange for operation to be cleaned up

dining philosophers — ordering

dining philosophers — aborting

using deadlock detection for prevention

  • suppose you know the maximum resources a process could request

  • make decision when starting process (‘‘admission control’’)


  • ask ‘‘what if every process was waiting for maximum resources’’

    • including the one we’re starting
  • would it cause deadlock? then don’t let it start


  • called Banker’s algorithm