Cloud developers have to build applications that are resilient to failures and interruptions. We advocate for a fault-tolerant programming model for the cloud based on actors, retry orchestration, and tail calls. This model builds upon persistent data stores and message queues readily available on the cloud. Retry orchestration guarantees not only that (1) failed actor invocations will be retried, but also that (2) completed invocations are never repeated and (3) a strict happen-before relationship is preserved across failures within call stacks. Tail calls can break complex tasks into simple steps to minimize re-execution during recovery. We review key application patterns and failure scenarios. We formalize a process calculus to precisely capture the mechanisms of fault tolerance in this model. We briefly describe our implementation. Using an application inspired by a typical enterprise scenario, we validate the functional correctness of our implementation and assess the impact of fault preparedness and recovery on performance.
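To make guarantees (1) and (2) and the role of tail calls concrete, here is a minimal sketch. It is not the authors' implementation: the names `Orchestrator`, `TailCall`, `submit`, and `run` are hypothetical, and in-memory dictionaries stand in for the persistent data store and message queue the abstract mentions. It does not model guarantee (3), the happen-before ordering within nested call stacks.

```python
class TailCall:
    """Returned by a step to hand the rest of the task to another step."""
    def __init__(self, step, arg):
        self.step = step
        self.arg = arg


class Orchestrator:
    """Hypothetical retry orchestrator; dicts stand in for durable storage."""
    def __init__(self):
        self.steps = {}      # step name -> function
        self.pending = {}    # request id -> (step name, argument) still to run
        self.completed = {}  # request id -> recorded final result

    def register(self, name, fn):
        self.steps[name] = fn

    def submit(self, request_id, step, arg):
        # A request id that is already known is never enqueued a second time.
        if request_id in self.pending or request_id in self.completed:
            return
        self.pending[request_id] = (step, arg)

    def run(self, request_id, max_attempts=10):
        # A completed invocation is answered from the record, not re-executed.
        if request_id in self.completed:
            return self.completed[request_id]
        for _ in range(max_attempts):
            step, arg = self.pending[request_id]
            try:
                outcome = self.steps[step](arg)
            except Exception:
                continue  # failed invocation: retry the same step
            if isinstance(outcome, TailCall):
                # The finished step is replaced by its successor, so a later
                # failure restarts from the successor, not from the beginning.
                self.pending[request_id] = (outcome.step, outcome.arg)
            else:
                self.completed[request_id] = outcome
                del self.pending[request_id]
                return outcome
        raise RuntimeError("gave up after %d attempts" % max_attempts)


# Example task split into two steps; the second step fails once.
calls = {"reserve": 0, "charge": 0}

def reserve(order):
    calls["reserve"] += 1
    return TailCall("charge", order + ":reserved")

def charge(order):
    calls["charge"] += 1
    if calls["charge"] == 1:
        raise RuntimeError("simulated crash")
    return order + ":charged"

orch = Orchestrator()
orch.register("reserve", reserve)
orch.register("charge", charge)
orch.submit("req-1", "reserve", "order42")
result = orch.run("req-1")
```

In this sketch the simulated crash in `charge` causes only `charge` to run again; `reserve` runs once because the tail call already replaced it in the pending record. Calling `orch.run("req-1")` a second time returns the recorded result without executing either step.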
Mon 19 Jun | Displayed time zone: Eastern Time (US & Canada)
16:00 - 18:00 | PLDI: Concurrency & Parallelism | PLDI Research Papers at Cypress 2 | Chair(s): Calin Cascaval (Google Research)
16:00 | 20m | Talk | Type-Checking CRDT Convergence | PLDI Research Papers | George Zakhour (University of St. Gallen), Pascal Weisenburger (University of St. Gallen), Guido Salvaneschi (University of St. Gallen)
16:20 | 20m | Talk | Reliable Actors with Retry Orchestration | PLDI Research Papers | Olivier Tardieu (IBM Research), David Grove (IBM Research), Gheorghe-Teodor Bercea (IBM Research), Paul Castro (IBM Research), Jaroslaw Cwiklik (IBM Research), Edward Epstein (IBM Research)
16:40 | 20m | Talk | Dynamic Partial Order Reduction for Checking Correctness Against Transaction Isolation Levels | PLDI Research Papers | Ahmed Bouajjani (IRIF, Université Paris Diderot), Constantin Enea (LIX, CNRS, Ecole Polytechnique), Enrique Román-Calvo (Université Paris Cité - CNRS - IRIF)
17:00 | 20m | Talk | Responsive Parallelism with Synchronization | PLDI Research Papers | Stefan K. Muller (Illinois Institute of Technology), Kyle Singer (Washington University in St. Louis, USA), Devyn Terra Keeney (Illinois Institute of Technology), Andrew Neth (Illinois Institute of Technology), Kunal Agrawal (Washington University in St. Louis, USA), I-Ting Angelina Lee (Washington University in St. Louis, USA), Umut A. Acar (Carnegie Mellon University)
17:20 | 20m | Talk | Parallelism in a Region Inference Context | PLDI Research Papers
17:40 | 20m | Talk | Performal: Formal Verification of Latency Properties for Distributed Systems | PLDI Research Papers | Nuda Zhang (University of Michigan), Upamanyu Sharma (Massachusetts Institute of Technology), Manos Kapritsos (University of Michigan, USA)