Unreliable failure detectors for reliable distributed systems

DOI10.1145/226643.226647zbMath0885.68021OpenAlexW2133943294WikidataQ29302224 ScholiaQ29302224MaRDI QIDQ4371670

Publication date: 21 January 1998

Published in: Journal of the ACM (Search for Journal in Brave)

Full work available at URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.579.388

zbMATH Keywords

algorithms reliability unreliable failure detectors

Mathematics Subject Classification ID

Reliability, testing and fault tolerance of networks and computer systems (68M15)

Related Items

The expressive power of snap-stabilization ⋮ A necessary and sufficient condition for transforming limited accuracy failure detectors ⋮ From binary consensus to multivalued consensus in asynchronous message-passing systems ⋮ A visit to mutual exclusion in seven dates ⋮ Synchronous condition-based consensus ⋮ Dynamic group communication ⋮ The Theta-Model: achieving synchrony without clocks ⋮ The Heard-Of model: computing in distributed systems with benign faults ⋮ Consensus and collision detectors in radio networks ⋮ On the computability power and the robustness of set agreement-oriented failure detector classes ⋮ On implementing omega in systems with weak reliability and synchrony assumptions ⋮ Implementing unreliable failure detectors with unknown membership ⋮ A weakly-adaptive condition-based consensus algorithm in asynchronous distributed systems ⋮ A timing assumption and two $t$-resilient protocols for Implementing an eventual leader service in asynchronous shared memory systems ⋮ Light-weight leases for storage-centric coordination ⋮ A distributed leader election algorithm in crash-recovery and omissive systems ⋮ Fault-tolerant multiparty session types ⋮ Anonymous asynchronous systems: the case of failure detectors ⋮ The perfectly synchronized round-based model of distributed computing ⋮ In search of lost time ⋮ Algorithms for a distributed IDS in MANETs ⋮ Corona: a stabilizing deterministic message-passing skip list ⋮ The weakest failure detector to implement a register in asynchronous systems with hybrid communication ⋮ Power and limits of distributed computing shared memory models ⋮ Randomized $k$-set agreement in crash-prone and Byzantine asynchronous systems ⋮ Partial synchrony based on set timeliness ⋮ Failure detectors encapsulate fairness ⋮ Renaming and the weakest family of failure detectors ⋮ $\text{Para}^2$: parameterized path reduction, acceleration, and SMT for reachability in threshold-guarded distributed algorithms ⋮ Communication-efficient and crash-quiescent omega with unknown membership ⋮ Communication-efficient failure detection and consensus in omission environments ⋮ Uniform reliable broadcast in anonymous distributed systems with fair lossy channels ⋮ Consensus in the presence of mortal Byzantine faulty processes ⋮ \textsc{Ramos}: concurrent writing and reconfiguration for collaborative systems ⋮ Pronto: high availability for standard off-the-shelf databases ⋮ Message and time efficient consensus protocols for synchronous distributed systems ⋮ On termination detection in crash-prone distributed systems with failure detectors ⋮ Using asynchrony and zero degradation to speed up indulgent consensus protocols ⋮ A knowledge-theoretic analysis of uniform distributed coordination and failure detectors ⋮ Low complexity Byzantine-resilient consensus ⋮ Active disk Paxos with infinitely many processes ⋮ The inherent price of indulgence ⋮ Tight bounds for $k$-set agreement with limited-scope failure detectors ⋮ On the importance of having an identity or, is consensus really universal? ⋮ The weakest failure detector to solve nonuniform consensus ⋮ The overhead of consensus failure recovery ⋮ Transient fault detectors ⋮ Booting clock synchronization in partially synchronous systems with hybrid process and link failures ⋮ Failure detectors as type boosters ⋮ The weakest failure detectors to boost obstruction-freedom ⋮ Low-latency atomic broadcast in the presence of contention ⋮ When consensus meets self-stabilization ⋮ The renaming problem in shared memory systems: an introduction ⋮ Communication-optimal eventually perfect failure detection in partially synchronous systems ⋮ Verification of consensus algorithms using satisfiability solving ⋮ \textsc{Rambo}: a robust, reconfigurable atomic memory service for dynamic networks ⋮ Threshold protocols in survivor set systems ⋮ The disagreement power of an adversary ⋮ On set consensus numbers ⋮ Randomization can be a healer: consensus with dynamic omission failures ⋮ The minimum information about failures for solving non-local tasks in message-passing systems ⋮ A theory of system behaviour in the presence of node and link failure ⋮ Set-constrained delivery broadcast: a communication abstraction for Read/write implementable distributed objects ⋮ Gracefully degrading consensus and $k$-set agreement in directed dynamic networks ⋮ Consensus in anonymous asynchronous systems with crash-recovery and omission failures ⋮ On the complexity of basic abstractions to implement consensus ⋮ On mobile agent verifiable problems ⋮ Distributed fault detection and isolation of continuous-time non-linear systems ⋮ Consensus using omega in asynchronous systems with unknown membership and degenerative Byzantine failures ⋮ Consensus in rooted dynamic networks with short-lived stability ⋮ The weakest failure detector for eventual consistency ⋮ The impossibility of boosting distributed service resilience ⋮ An impossibility about failure detectors in the iterated immediate snapshot model ⋮ On the road to the weakest failure detector for $k$-set agreement in message-passing systems ⋮ An object based algebra for specifying a fault tolerant software architecture ⋮ The asynchronous bounded-cycle model ⋮ A simple and communication-efficient omega algorithm in the crash-recovery model ⋮ A simple proof of the necessity of the failure detector $\Sigma $ to implement an atomic register in asynchronous message-passing systems ⋮ Adaptive progress: a gracefully-degrading liveness property ⋮ Refined quorum systems ⋮ On the completeness of bounded model checking for threshold-based distributed algorithms: reachability ⋮ Fault-management in P2P-MPI ⋮ Implementing the Omega failure detector in the crash-recovery failure model ⋮ Safe termination detection in an asynchronous distributed system when processes may crash and recover ⋮ Reducing $\Omega$ to $\lozenge\mathcal W$ ⋮ Increasing the resilience of distributed and replicated database systems ⋮ A flexible formal framework for masking/demasking faults ⋮ Consensus in Byzantine asynchronous systems ⋮ Communication-efficient randomized consensus ⋮ On modelling mobility ⋮ Revisiting the PAXOS algorithm ⋮ Contention-related crash failures: definitions, agreement algorithms, and impossibility results ⋮ Asynchronous bounded lifetime failure detectors ⋮ Using the heartbeat failure detector for quiescent reliable communication and consensus in partitionable networks ⋮ On the round complexity of Byzantine agreement without initial set-up ⋮ Byzantine-resistant total ordering algorithms. ⋮ Making Byzantine consensus live ⋮ On the hardness of failure-sensitive agreement problems. ⋮ Restricted failure detectors: Definition and reduction protocols ⋮ Optimistic atomic broadcast: A pragmatic viewpoint ⋮ Extracting Symbolic Transitions from TLA$$^{+}$$+ Specifications ⋮ A case study on parametric verification of failure detectors ⋮ Survey on Parameterized Verification with Threshold Automata and the Byzantine Model Checker ⋮ Effective multicast programming in large scale distributed systems ⋮ Open consensus ⋮ Unnamed Item ⋮ On the weakest failure detector ever ⋮ What You Always Wanted to Know About Model Checking of Fault-Tolerant Distributed Algorithms ⋮ Communication Patterns and Input Patterns in Distributed Computing ⋮ A Separation of n-consensus and (n + 1)-consensus Based on Process Scheduling ⋮ A weakest failure detector-based asynchronous consensus protocol for $f<n$ ⋮ Correctness proof of a database replication protocol under the perspective of the I/O automaton model ⋮ Synthesis of distributed algorithms with parameterized threshold guards ⋮ An Eventually Perfect Failure Detector for Networks of Arbitrary Topology Connected with ADD Channels Using Time-To-Live Values ⋮ Implementing ♢P with Bounded Messages on a Network of ADD Channels ⋮ Secure consensus with distributed detection via two-hop communication ⋮ Agreeing within a few writes ⋮ Data-driven mixed-integer linear programming-based optimisation for efficient failure detection in large-scale distributed systems ⋮ Fault tolerant network constructors ⋮ Uniform atomic broadcast and consensus in fully anonymous synchronous systems with crash failures ⋮ The Iterated Restricted Immediate Snapshot Model ⋮ Self-stabilizing indulgent zero-degrading binary consensus ⋮ How to Solve Consensus in the Smallest Window of Synchrony ⋮ The Weakest Failure Detector for Message Passing Set-Agreement ⋮ Using Bounded Model Checking to Verify Consensus Algorithms ⋮ X-Ability: a theory of replication ⋮ Non-blocking atomic commit in asynchronous distributed systems with failure detectors ⋮ Handling message semantics with Generic Broadcast protocols ⋮ Hundreds of impossibility results for distributed computing ⋮ Randomized protocols for asynchronous consensus ⋮ Appraising two decades of distributed computing theory research ⋮ Condition-based consensus solvability: a hierarchy of conditions and efficient protocols ⋮ Distributed consensus, revisited ⋮ Eventually perfect predicate detection in crash-affected finite average response time systems ⋮ Formal Model–Driven Design of Distributed Algorithms ⋮ What Can be Computed in a Distributed System? ⋮ Tutorial on Parameterized Model Checking of Fault-Tolerant Distributed Algorithms ⋮ Genuine atomic multicast in asynchronous distributed systems ⋮ Experiences with object group systems ⋮ Deterministic Models of Communication Faults ⋮ Unnamed Item ⋮ Unnamed Item ⋮ An infrastructure to support cooperation of knowledge-level agents on the semantic grid ⋮ On Optimal Probabilistic Asynchronous Byzantine Agreement ⋮ Wait-Free Dining Under Eventual Weak Exclusion ⋮ The DHCP Failover Protocol: A Formal Perspective ⋮ Generalized Symmetry Breaking Tasks and Nondeterminism in Concurrent Objects ⋮ Packet efficient implementation of the Omega failure detector ⋮ Towards a Universal Approach for the Finite Departure Problem in Overlay Networks ⋮ Wait-freedom with advice ⋮ Perfect failure detection with very few bits ⋮ Snap-stabilizing tasks in anonymous networks ⋮ Structuring unreliable radio networks ⋮ Fault-Tolerant Consensus with an Abstract MAC Layer. ⋮ Characterizing Consensus in the Heard-Of Model ⋮ Fair Exchange Is Incomparable to Consensus ⋮ Unnamed Item ⋮ Much Ado About Nothing? ⋮ Simultaneous Consensus vs Set Agreement: A Message-Passing-Sensitive Hierarchy of Agreement Problems