Changelog:

25 March 2023: add description of monitors API
7 Apr 2023: correct pthread_barrier_init call having wrong number of arguments; mention most pthread_…_destroy functions

This document is intended to give a practical overview of using pthreads. It leaves out a lot of details.

For brevity, code in this document does not check error codes. This is bad coding practice! You should check error codes in your code.

1 Jumping-off point

For those eager to start coding, here’s a parallel summation program you might find to be a useful starting point.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Allow compiler -DTHREADCOUNT=4 but have a default */
#ifndef THREADCOUNT
#define THREADCOUNT 16
#endif

/** Defines a particular task to handle */
typedef struct {
    size_t from;
    size_t to;
    double (*getnum)(size_t);
} task_description;

/**
 * Function invoked by each new thread.
 * The argument must be a task_description *;
 * the return value is a malloced double *.
 */
void *sum_array(void *args) {
    task_description *task = (task_description *)args;
    printf("  Summing from %zd to %zd...\n", 
        task->from, task->to);
    double work = 0;
    for(size_t i=task->from; i<task->to; i+=1) {
        work += task->getnum(i);
    }
    printf("  ... sub-sum from %zd to %zd = %g\n", 
        task->from, task->to, work);
    double *sum = malloc(sizeof(double));
    *sum = work;
    return (void *)sum;
}

/** A simple reciprocal function */
double fraction(size_t i) {
    return 1.0/(i+1);
}

/**
 * Sum all fractions 1/n from 1 to pow(2,-28)
 * in THREADCOUNT parallel threads
 */
int main(int argc, const char *argv[]) {
    // set up task sizes to take a few seconds on 2020-era laptops
    size_t max = 1<<28;
    size_t step = max / THREADCOUNT;
    
    // store per-thread information (don't re-use, memory is shared)
    pthread_t id[THREADCOUNT];
    task_description tasks[THREADCOUNT];
    
    // spawn the threads
    for(int i=0; i<THREADCOUNT; i+=1) {
        tasks[i].from = i*step;
        tasks[i].to = (i+1)*step;
        tasks[i].getnum = fraction;
        pthread_create(id+i, NULL, sum_array, tasks+i);
    }

    // wait for and combine the results
    double result = 0;
    for(int i=0; i<THREADCOUNT; i+=1) {
        void *ans;
        pthread_join(id[i], &ans);
        result += *(double *)ans;
        free(ans); // was malloced in just-joined thread
    }

    printf("The sum is %g\n", result);
    return 0;
}

To see the time impact of threading, try comparing

clang -lpthread -O3 -DTHREADCOUNT=8 thiscode.c && time ./a.out

and

clang -lpthread -O3 -DTHREADCOUNT=1 thiscode.c && time ./a.out

2 Managing thread existence

Every process has at least one thread, the one that invoked main. Each other thread is created by invoking a system call, wrapped by the various pthread_ library functions.

2.1 `pthread_create`

The library function pthread_create makes a new thread. It is given four arguments:

Type	Kind	Purpose
`pthread_t *`	output	Set to the ID of the created thread
`const pthread_attr_t *`	input	Rules about how the new thread will behave
`void ()(void *)`	input	Pointer to a function the new thread runs
`void *`	input	Value passed as argument to the new thread

2.1.1 `pthreads_attr_t`

The second argument of pthread_create is used to control how the thread behaves. Much of this is fairly specialized, and passing NULL will work in many cases. If you want more control, though, you need to use a few extra functions to gain such.

A pthread_attr_t must be initialized using pthread_attr_init before invoking pthread_create and cleaned up using pthread_attr_destroy afterwards. pthread_attr_init is permitted to malloc fields inside the pthread_t, and if it did, pthread_attr_destroy will free them.

The following is a skeleton of how to create a thread.

pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_t id;
void *argument = /* ... initialize this here */
/* ... use various pthread_attr_setXXXX functions to set behavior ... */
pthread_create(&id, &attr, run_this, argument);
pthread_attr_destroy(&attr);

Many of the attributes that can be placed into a pthread_attr_t have to do with scheduling priority (how often the thread gets CPU time) and stack organization (how large, etc. the thread’s stack is) and can be ignored by the casual thread user. However, one (the detached or joinable state of the thread) is important enough to deserve its own section.

2.2 `pthread_attr_setdetachstate` and `pthread_join`

Every created thread is either detached or joinable.

pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE): A joinable thread (the default if not otherwise set) will, when it exits, continue to exist until pthread_join is called to retrieve its exit status and reclaim its resources.
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED): A detached thread will, when it exists, immediately be reclaimed by the OS. Its exit status is lost, and other threads cannot use pthread_join to wait for it to terminate.

2.2.1 Return values

If a thread with pthread_t id is joinable, then invoking pthread_join(id, &retval) will cause the invoking thread to suspend operation until thread id terminates; when thread id terminates, retval stores the results of the thread’s computation.

The result is one of

The return value of the start function of the terminated thread.
The argument passed into pthread_exit to terminate the thread early.
The special value PTHREAD_CANCELED if the thread was stopped by another thread.

If you have a joinable thread, you need to join it before exiting. If you have detached threads, you cannot wait for them to terminate and the program will exit when the main thread does.

2.3 Crashing

What happens if one thread crashes? Since a crash means an unhandled signal, and since the behavior of an unhandled signal is to terminate the process, the whole program crashes.

However, signals are delivered to specific threads, so if you add a signal handler, it will be run by the thread that the OS believes is the recipient of the signal.

2.4 Debugging

Debugging threaded programs can be tricky. Debuggers like lldb work fine on multithreaded programs, but with multiple threads there is more information to display. Additionally, some bugs (e.g., race conditions and deadlock) can depend on exact instruction scheduling, which may be different in a debugger than when run normally.

We will not have time in this course to dive into multithreaded debugging in any great detail.

3 Synchronization with pthreads

3.1 Mutex

Recall that a mutex only lets one thread have it locked at a time, excluding others until it is unlocked.

pthread_mutex_t mutex;

void *thread_function(void *) {
    /* ... */
    pthread_mutex_lock(&mutex);
    /* only one thread can get here at a time */
    pthread_mutex_unlock(&mutex);
    /* ... */
}

int main(int argc, const char *argv[]) {
    /* ... */
    pthread_mutex_init(&mutex, NULL);
    for(int i=0; i<THREADCOUNT; i+=1) {
        /* ... */
        pthread_create(&id[i], NULL, thread_function, &arg[i]);
    }
    /* ... */
}

3.2 Barrier

Recall that a barrier acts like a meet-up: no one moves until everyone expected arrives. In pthreads, that everyone criterion is determined by a count: once that number of threads arrive, all will be allowed to proceed.

pthread_barrier_t barrier;

void *thread_function(void *) {
    /* ... */
    /* each thread reaches here at its own time */
    pthread_barrier_wait(&barrier);
    /* all threads proceed from here together */
    /* ... */
}

int main(int argc, const char *argv[]) {
    /* ... */
    pthread_barrier_init(&barrier, NULL, THREADCOUNT);
    for(int i=0; i<THREADCOUNT; i+=1) {
        /* ... */
        pthread_create(&id[i], NULL, thread_function, &arg[i]);
    }
    /* ... */
}

See also pthread_barrier_destroy for destroying barriers (freeing any resources allocated by pthread_barrier_init). Note that less recent versions of pthread_barrier_destroy (such as on portal/our testing machines as of 2023) require that all calls to pthread_barrier_wait have completed (returned).

3.3 Reader-writer lock

Recall that a reader-writer lock has two modes: either it can be used by exactly one writer (like a mutex) or by any number of readers (like unsynchronized data), but not both at once.

The relevant functions are documented in the following manual pages:

pthread_rwlock_init – this is complicated because they have multiple attributes to handle how they handle if a writer is waiting for the readers to finish and a new reader arrives.
pthread_rwlock_rdlock and pthread_rwlock_wrlock – acquire the lock in two different ways.
pthread_rwlock_timedrdlock and pthread_rwlock_timedwrlock – try to acquire the lock, but if a specified time passes without success, return an error code instead.
pthread_rwlock_unlock – release the lock (no matter how it was acquired).
pthread_rwlock_destroy – free any resources allocated by pthread_rwlock_init

However, these details, aside from the overall usage, looks similar to how mutexes are used with pthreads.

3.4 Monitors

Pthreads supplies separate condition variables that can be used with a mutex to construct a monitor.

The relevant functions are documented in the following manpages:

pthread_cond_init – create a condition variable
pthread_cond_wait – atomically unlock a mutex and add the current thread to the condition monitor’s wait queue. When the thread is woken up from the wait queue, lock the mutex again before returning.
pthread_cond_timedwait — like pthread_cond_wait, but stop waiting on the wait queue if a specified time passes and return an error
pthread_cond_destory – free any resources allocated by pthread_cond_init