Title: Semaphores are Surprisingly Versatile (2015)

In multithreaded programming, it’s important to make threads wait. They must wait for exclusive access to a resource. They must wait when there’s no work available. One way to make threads wait – and put them to sleep inside the kernel, so that they no longer take any CPU time – is with a semaphore.

I used to think semaphores were strange and old-fashioned. They were invented by Edsger Dijkstra back in the early 1960s, before anyone had done much multithreaded programming, or much programming at all, for that matter. I knew that a semaphore could keep track of available units of a resource, or function as a clunky kind of mutex, but that seemed to be about it.

My opinion changed once I realized that, using only semaphores and atomic operations, it’s possible to implement all of the following primitives:

- A Lightweight Mutex
- A Lightweight Auto-Reset Event Object
- A Lightweight Read-Write Lock
- Another Solution to the Dining Philosophers Problem
- A Lightweight Semaphore With Partial Spinning

Not only that, but these implementations share some desirable properties. They’re lightweight, in the sense that some operations happen entirely in userspace, and they can (optionally) spin for a short period before sleeping in the kernel. You’ll find all of the C++11 source code on GitHub. Since the standard C++11 library does not include semaphores, I’ve also provided a portable Semaphore class that maps directly to native semaphores on Windows, MacOS, iOS, Linux and other POSIX environments.
You should be able to drop any of these primitives into almost any existing C++11 project.

A Semaphore Is Like a Bouncer

Imagine a set of waiting threads, lined up in a queue – much like a lineup in front of a busy nightclub or theatre. A semaphore is like a bouncer at the front of the lineup. He only allows threads to proceed when instructed to do so.

Each thread decides for itself when to join the queue. Dijkstra called this the P operation. P originally stood for some funny-sounding Dutch word, but in a modern semaphore implementation, you’re more likely to see this operation called wait. Basically, when a thread calls the semaphore’s wait operation, it enters the lineup.

The bouncer, himself, only needs to understand a single instruction. Originally, Dijkstra called this the V operation. Nowadays, the operation goes by various names, such as post, release or signal. I prefer signal. Any running thread can call signal at any time, and when it does, the bouncer releases exactly one waiting thread from the queue. (Not necessarily in the same order they arrived.)

Now, what happens if some thread calls signal before there are any threads waiting in line? No problem: as soon as the next thread arrives in the lineup, the bouncer will let it pass directly through. And if signal is called, say, 3 times on an empty lineup, the bouncer will let the next 3 threads to arrive pass directly through. Of course, the bouncer needs to keep track of this number, which is why all semaphores maintain an integer counter. signal increments the counter, and wait decrements it.

The beauty of this strategy is that if wait is called some number of times, and signal is called some number of times, the outcome is always the same: the bouncer will always release the same number of threads, and there will always be the same number of threads left waiting in line, regardless of the order in which those wait and signal calls occurred.
1. A Lightweight Mutex

I’ve already shown how to implement a lightweight mutex in an earlier post. I didn’t know it at the time, but that post was just one example of a reusable pattern. The trick is to build another mechanism in front of the semaphore, which I like to call the box office.

The box office is where the real decisions are made. Should the current thread wait in line? Should it bypass the queue entirely? Should another thread be released from the queue? The box office cannot directly check how many threads are waiting on the semaphore, nor can it check the semaphore’s current signal count. Instead, the box office must somehow keep track of its own previous decisions. In the case of a lightweight mutex, all it needs is an atomic counter. I’ll call this counter m_contention, since it keeps track of how many threads are simultaneously contending for the mutex.

```cpp
class LightweightMutex
{
private:
    std::atomic<int> m_contention;      // The "box office"
    Semaphore m_semaphore;              // The "bouncer"
```

When a thread decides to lock the mutex, it first visits the box office to increment m_contention.

```cpp
public:
    void lock()
    {
        if (m_contention.fetch_add(1, std::memory_order_acquire) > 0)
        {
            m_semaphore.wait();         // Enter the wait queue
        }
    }
```

If the previous value was 0, that means no other thread has contended for the mutex yet. As such, the current thread immediately considers itself the new owner, bypasses the semaphore, returns from lock and proceeds into whatever code the mutex is intended to protect.

Otherwise, if the previous value was greater than 0, that means another thread is already considered to own the mutex. In that case, the current thread must wait in line for its turn.

When the previous thread unlocks the mutex, it visits the box office to decrement the counter:

```cpp
    void unlock()
    {
        if (m_contention.fetch_sub(1, std::memory_order_release) > 1)
        {
            m_semaphore.signal();       // Release a waiting thread
        }
    }
```

If the previous counter value was 1, that means no other threads arrived in the meantime, so there’s nothing else to do.
m_contention is simply left at 0. Otherwise, if the previous counter value was greater than 1, another thread has attempted to lock the mutex, and is therefore waiting in the queue. As such, we alert the bouncer that it’s now safe to release the next thread. That thread will be considered the new owner.

Every visit to the box office is an indivisible, atomic operation. Therefore, even if multiple threads call lock and unlock concurrently, they will always visit the box office one at a time. Furthermore, the behavior of the mutex is completely determined by the decisions made at the box office. After they visit the box office, they may operate on the semaphore in an unpredictable order, but that’s OK. As I’ve already explained, the outcome will remain valid regardless of the order in which those semaphore operations occur. (In the worst case, some threads may trade places in line.)

This class is considered “lightweight” because it bypasses the semaphore when there’s no contention, thereby avoiding system calls. I’ve published it to GitHub as NonRecursiveBenaphore along with a recursive version. However, there’s no need to use these classes in practice. Most available mutex implementations are already lightweight. Nonetheless, they’re noteworthy for serving as inspiration for the rest of the primitives described here.

2. A Lightweight Auto-Reset Event Object

You don’t hear auto-reset event objects discussed very often, but as I mentioned in my CppCon 2014 talk, they’re widely used in game engines. Most often, they’re used to notify a single other thread (possibly sleeping) of available work. An auto-reset event object is basically a semaphore that ignores redundant signals. In other words, when signal is called multiple times, the event object’s signal count will never exceed 1. That means you can go ahead and publish work units somewhere, blindly calling signal after each one.
It’s a flexible technique that works even when you publish work units to some data structure other than a queue.

Windows has native support for event objects, but its SetEvent function – the equivalent of signal – can be expensive. On one machine, I timed it at 700 ns per call, even when the event was already signaled. If you’re publishing thousands of work units between threads, the overhead for each SetEvent can quickly add up.

Luckily, the box office/bouncer pattern reduces this overhead significantly. All of the auto-reset event logic can be implemented at the box office using atomic operations, and the box office will invoke the semaphore only when it’s absolutely necessary for threads to wait. I’ve published the implementation as AutoResetEvent.

This time, the box office has a different way to keep track of how many threads have been sent to wait in the queue. When m_status is negative, its magnitude indicates how many threads are waiting:

```cpp
class AutoResetEvent
{
private:
    std::atomic<int> m_status;      // Negative: number of waiting threads
    Semaphore m_sema;
```

In the event object’s signal operation, we increment m_status atomically, up to the limit of 1:

```cpp
public:
    void signal()
    {
        int oldStatus = m_status.load(std::memory_order_relaxed);
        for (;;)    // Increment m_status atomically via a CAS loop.
        {
            assert(oldStatus <= 1);
            int newStatus = oldStatus < 1 ? oldStatus + 1 : 1;
            if (m_status.compare_exchange_weak(oldStatus, newStatus,
                                               std::memory_order_release,
                                               std::memory_order_relaxed))
                break;
            // The CAS failed; oldStatus has been reloaded. Retry.
        }
        if (oldStatus < 0)
            m_sema.signal();        // Release one waiting thread.
    }
```

Note that because the initial load from m_status is relaxed, it’s important for the above code to call compare_exchange_weak even if m_status already equals 1. Thanks to commenter Tobias Brüll for pointing that out. See this README file for more information.

3. A Lightweight Read-Write Lock

Using the same box office/bouncer pattern, it’s possible to implement a pretty good read-write lock.
This read-write lock is completely lock-free in the absence of writers, it’s starvation-free for both readers and writers, and just like the other primitives, it can spin before putting threads to sleep. It requires two semaphores: one for waiting readers, and another for waiting writers. The code is available as NonRecursiveRWLock.

4. Another Solution to the Dining Philosophers Problem

The box office/bouncer pattern can also solve Dijkstra’s dining philosopher