Writing concurrent code that is both correct and efficient is notoriously difficult. Thus, programmers often prefer to use synchronization abstractions, which render code simpler and easier to reason about. Despite a wealth of work on this topic, there is still a gap between the rich semantics provided by synchronization abstractions in modern programming languages (specifically, \emph{fair} FIFO ordering of synchronization requests and support for \emph{abortable} operations) and frameworks for implementing them correctly and efficiently.
Supporting such semantics is critical given the rising popularity of constructs for asynchronous programming, such as coroutines, which abort frequently and are cheaper to suspend and resume than native threads.

This paper introduces a new framework called \texttt{CancellableQueueSynchronizer} (CQS),
which enables simple yet efficient implementations of a wide range of fair and abortable synchronization primitives: mutexes, semaphores, barriers, count-down latches, and blocking pools.
Our main contribution is algorithmic, as implementing both fairness and abortability efficiently at this level of generality is non-trivial.
Importantly, all our algorithms, including the CQS framework and the primitives built on top of it, come with \emph{formal proofs} of many of their properties, carried out in the Iris framework for Coq. These proofs are modular, so it is easy to show correctness for new primitives implemented on top of CQS.
From a practical perspective, our implementation of CQS for native threads on the JVM improves throughput by up to two orders of magnitude over Java's \texttt{AbstractQueuedSynchronizer}, the only practical abstraction offering similar semantics.
Further, we successfully integrated CQS as a core component of the popular Kotlin Coroutines library, validating the framework's practical impact and expressiveness in a real-world environment.
In sum, \texttt{CancellableQueueSynchronizer} is the first framework to combine expressiveness with formal guarantees and solid practical performance. Our approach should be extensible to other languages and families of synchronization primitives.