Threads
1. Threads
Threads are execution streams that share an address space. When multithreading is used, each process can contain one or more threads:
| Per Process | Per Thread |
|---|---|
| Address Space | Program Counter |
| Global Variables | Registers |
| Open Files | Stack |
| Child Processes | |
| Signals | |
Many applications contain multiple activities that execute in parallel and access and process the same data, and some of these activities may block. Using a separate process for each activity is too heavyweight:
- Difficult to communicate between address spaces.
- A process that blocks may cause the entire application to be switched out.
- Expensive to context switch between processes.
- Expensive to create / destroy processes.
However, threads come with problems / concerns:
- Memory corruption: One thread can write to another thread's stack.
- Concurrency bugs: Concurrent access to shared data (e.g. global variables).
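To make the concurrency concern concrete, here is a minimal sketch in which several threads increment one global counter. It uses the POSIX threads API covered in the case study that follows, plus `pthread_mutex_t`, which these notes do not cover and which is brought in here only as one common way to protect shared data. The names `counter`, `add_many`, and `count_with_threads` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4
#define INCREMENTS  100000

static long counter = 0;   /* global: shared by every thread in the process */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *add_many(void *unused) {
    (void) unused;
    /* i lives on this thread's own stack, so it needs no protection. */
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;         /* read-modify-write on shared data */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts NUM_THREADS threads that all increment the same global and
 * returns the final count once every thread has finished. */
long count_with_threads(void) {
    pthread_t threads[NUM_THREADS];
    counter = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        int rc = pthread_create(&threads[t], NULL, add_many, NULL);
        assert(rc == 0);   /* sketch-level error handling only */
        (void) rc;
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        int rc = pthread_join(threads[t], NULL);
        assert(rc == 0);
        (void) rc;
    }
    return counter;
}
```

With the lock in place the result is always `NUM_THREADS * INCREMENTS`. If the lock and unlock calls are removed, the increments from different threads can interleave and the total may come out smaller, which is the kind of concurrency bug listed above.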
1.1 Case Study: PThreads
POSIX Threads (PThreads) are implemented by most Unix systems:
```c
#include <pthread.h>
#include <sys/types.h>

pthread_t thread;       // Handle identifying a thread.
pthread_attr_t attr;    // Thread attributes (e.g. stack size, detach state).

int pthread_create(
    pthread_t *thread,               // Where to store the ID of the newly created thread.
    const pthread_attr_t *attr,      // Attributes for the new thread, or NULL for the defaults.
    void *(*start_routine)(void *),  // The function to run in the new thread.
    void *arg                        // The argument to pass to the function.
);

void pthread_exit(
    void *value_ptr                  // The value made available to a thread that joins this one.
);
```
pthread_exit terminates the calling thread and makes value_ptr available to any successful join with the terminating thread. It is called implicitly when the thread's start routine returns (but not for the initial thread, which started main()).
The following example creates five threads, each of which prints its ID:
```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void *thread_work(void *tid) {
    long id = (long) tid;
    printf("Thread %ld is running\n", id);
    return NULL;  // A start routine must return a void *.
}

int main(int argc, char *argv[]) {
    pthread_t threads[5];
    long t;
    for (t = 0; t < 5; t++) {
        pthread_create(&threads[t], NULL, thread_work, (void *) t);
    }
    sleep(10);  // Crude wait so the threads can run before the process exits.
    return 0;
}
```
1.2 Yielding & Joining
int pthread_yield(void) releases the CPU to let another thread run. It returns 0 on success, or an error code; on Linux it always succeeds. Note that pthread_yield is non-standard; the standardized equivalent is sched_yield().
int pthread_join(pthread_t thread, void **value_ptr) is the equivalent of waitpid for threads. It blocks until the given thread terminates and stores that thread's return value in *value_ptr (if value_ptr is not NULL). It returns 0 on success, or an error code. For example:
```c
#include <pthread.h>
#include <stdio.h>

long a, b, c;
void *work1(void *x) { a = (long) x * (long) x; return NULL; }
void *work2(void *x) { b = (long) x * (long) x; return NULL; }

int main(int argc, char *argv[]) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, work1, (void *) 3);
    pthread_create(&t2, NULL, work2, (void *) 4);
    pthread_join(t1, NULL);  // Wait until a has been written.
    pthread_join(t2, NULL);  // Wait until b has been written.
    c = a + b;
    printf("3^2 + 4^2 = %ld\n", c);
    return 0;
}
```
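The example above passes results through global variables and ignores the value_ptr argument. As a complementary sketch, the same computation can hand each result back through pthread_exit and collect it with pthread_join. The function names `square` and `sum_of_squares` are hypothetical, and the code follows the notes' own pattern of casting a long to and from `void *`, which assumes a pointer is wide enough to hold the value.

```c
#include <assert.h>
#include <pthread.h>

/* Squares its argument and hands the result back through pthread_exit. */
static void *square(void *x) {
    long n = (long) x;
    pthread_exit((void *) (n * n));  /* same effect as: return (void *) (n * n); */
}

/* Computes a*a + b*b using one thread per square. */
long sum_of_squares(long a, long b) {
    pthread_t t1, t2;
    void *r1, *r2;
    int rc;

    rc = pthread_create(&t1, NULL, square, (void *) a);
    assert(rc == 0);                 /* sketch-level error handling only */
    rc = pthread_create(&t2, NULL, square, (void *) b);
    assert(rc == 0);

    /* Each join blocks until its thread terminates, then stores the
     * value that thread passed to pthread_exit. */
    rc = pthread_join(t1, &r1);
    assert(rc == 0);
    rc = pthread_join(t2, &r2);
    assert(rc == 0);
    (void) rc;

    return (long) r1 + (long) r2;
}
```

Calling `sum_of_squares(3, 4)` yields 25 without any shared global state, at the cost of squeezing the result into a pointer-sized value.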
2. Thread Implementation
Threads may be either:
- User-level threads: kernel is not aware of the threads. Each process manages its own threads.
- Kernel-level threads: managed by the OS kernel.
Hybrid approaches are also possible.
2.1 User-Level Threads
The OS kernel thinks it is managing processes only. Threads are implemented by a software library, and each process maintains its own thread table for thread scheduling.
Advantages:
- Thread creation and termination are fast.
- Thread switching is fast.
- Thread synchronization is fast.
- None of these operations requires kernel involvement.
- Each application may have its own scheduling algorithm.
Disadvantages:
- A blocking system call stops all threads in the process.
- Non-blocking I/O can be used instead, but it is harder to use and understand.
- During a page fault, the OS blocks the whole process, even though other threads may be runnable.
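As a rough illustration of how a library can keep a thread table and schedule threads without the kernel, here is a minimal cooperative sketch. It assumes the `ucontext` functions (`getcontext`, `makecontext`, `swapcontext`) provided by glibc on Linux; they are no longer part of current POSIX, and real thread libraries often use their own context-switch code instead. All `uthread_*` names, `worker`, and `demo_trace` are hypothetical.

```c
#include <assert.h>
#include <ucontext.h>

#define MAX_THREADS 4
#define STACK_SIZE  (64 * 1024)

/* One entry of the per-process thread table. */
typedef struct {
    ucontext_t ctx;          /* saved registers, program counter, stack pointer */
    char stack[STACK_SIZE];  /* this thread's private stack */
    void (*fn)(int);
    int arg;
    int finished;
} uthread;

static uthread thread_table[MAX_THREADS];
static int num_threads = 0;
static int current = -1;     /* index of the running thread, -1 in the scheduler */
static ucontext_t scheduler_ctx;

static char trace[32];       /* records the order in which the threads ran */
static int trace_len = 0;

/* Entry point of every user-level thread. */
static void trampoline(void) {
    uthread *self = &thread_table[current];
    self->fn(self->arg);
    self->finished = 1;      /* returning resumes uc_link, i.e. the scheduler */
}

int uthread_create(void (*fn)(int), int arg) {
    if (num_threads == MAX_THREADS) return -1;
    uthread *t = &thread_table[num_threads];
    if (getcontext(&t->ctx) != 0) return -1;
    t->ctx.uc_stack.ss_sp = t->stack;
    t->ctx.uc_stack.ss_size = STACK_SIZE;
    t->ctx.uc_link = &scheduler_ctx;
    t->fn = fn;
    t->arg = arg;
    t->finished = 0;
    makecontext(&t->ctx, trampoline, 0);
    return num_threads++;
}

/* Gives up the CPU voluntarily: a plain function call, no system call. */
void uthread_yield(void) {
    assert(current >= 0);    /* only valid inside a user-level thread */
    swapcontext(&thread_table[current].ctx, &scheduler_ctx);
}

/* Round-robin scheduler: runs until every thread has finished. */
void uthread_run(void) {
    int remaining = num_threads;
    while (remaining > 0) {
        for (int i = 0; i < num_threads; i++) {
            if (thread_table[i].finished) continue;
            current = i;
            swapcontext(&scheduler_ctx, &thread_table[i].ctx);
            current = -1;
            if (thread_table[i].finished) remaining--;
        }
    }
    num_threads = 0;         /* empty the table for the next run */
}

static void worker(int id) {
    for (int step = 0; step < 2; step++) {
        trace[trace_len++] = (char) ('A' + id);
        uthread_yield();
    }
}

/* Runs two workers that each record a letter twice, yielding in between. */
const char *demo_trace(void) {
    trace_len = 0;
    uthread_create(worker, 0);
    uthread_create(worker, 1);
    uthread_run();
    trace[trace_len] = '\0';
    return trace;
}
```

Because the switch happens entirely in user space, the two workers alternate and `demo_trace()` records `ABAB`. The sketch also shows the main weakness listed above: if `worker` made a blocking system call, the kernel would suspend the whole process, including the scheduler and every other user-level thread.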
2.2 Kernel-Level Threads
Advantages:
- Blocking system calls can be easily accommodated.
Disadvantages:
- Thread creation and termination is expensive, requiring syscalls (still cheaper than processes). One mitigation strategy is to recycle threads with thread pools.
- Thread synchronization is expensive, requiring blocking syscalls.
- Thread switching is expensive, requiring a syscall (still cheaper than processes).
- No application-specific scheduling algorithm.
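The thread-pool mitigation mentioned above can be sketched as follows: a fixed set of kernel threads is created once and then reused for many small tasks, so the creation cost is not paid per task. This is a minimal sketch rather than a production design. It relies on `pthread_mutex_t` and `pthread_cond_t`, which these notes do not cover, and every name in it (`pool_start`, `pool_submit`, `pool_shutdown`, `pool_demo`, `add_square`) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define POOL_SIZE 3
#define QUEUE_CAP 16

typedef void (*task_fn)(long);
typedef struct { task_fn fn; long arg; } task;

static task queue[QUEUE_CAP];             /* ring buffer of pending tasks */
static int head = 0, tail = 0, count = 0;
static int shutting_down = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
static pthread_t workers[POOL_SIZE];

/* Each worker repeatedly takes a task from the queue and runs it. */
static void *worker_loop(void *unused) {
    (void) unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0 && !shutting_down)
            pthread_cond_wait(&not_empty, &lock);
        if (count == 0) {                 /* shutting down and queue drained */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        task t = queue[head];
        head = (head + 1) % QUEUE_CAP;
        count--;
        pthread_mutex_unlock(&lock);
        t.fn(t.arg);                      /* run the task outside the lock */
    }
}

/* Creates the worker threads once, up front. */
void pool_start(void) {
    shutting_down = 0;
    for (int i = 0; i < POOL_SIZE; i++) {
        int rc = pthread_create(&workers[i], NULL, worker_loop, NULL);
        assert(rc == 0);                  /* sketch-level error handling only */
        (void) rc;
    }
}

/* Queues a task. Returns 0 on success, -1 if the queue is full. */
int pool_submit(task_fn fn, long arg) {
    int rc = -1;
    pthread_mutex_lock(&lock);
    if (count < QUEUE_CAP) {
        queue[tail] = (task){ fn, arg };
        tail = (tail + 1) % QUEUE_CAP;
        count++;
        pthread_cond_signal(&not_empty);
        rc = 0;
    }
    pthread_mutex_unlock(&lock);
    return rc;
}

/* Lets the queued tasks finish, then joins every worker. */
void pool_shutdown(void) {
    pthread_mutex_lock(&lock);
    shutting_down = 1;
    pthread_cond_broadcast(&not_empty);
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_join(workers[i], NULL);
}

static long total = 0;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

static void add_square(long n) {
    pthread_mutex_lock(&total_lock);
    total += n * n;
    pthread_mutex_unlock(&total_lock);
}

/* Runs ten small tasks on three reused threads and returns their sum. */
long pool_demo(void) {
    total = 0;
    pool_start();
    for (long n = 1; n <= 10; n++)
        pool_submit(add_square, n);
    pool_shutdown();
    return total;                         /* 1^2 + 2^2 + ... + 10^2 = 385 */
}
```

Here ten tasks share three threads, so only three `pthread_create` calls are made. Note that the pool itself still pays the synchronization cost described above: every submit and every dequeue goes through a mutex, and idle workers block in the kernel while waiting on the condition variable.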
2.3 Hybrid Threads
Hybrid approaches use kernel threads and multiplex user-level threads onto some (or all) of them.