Failure-aware resource management for high-availability computing clusters with distributed virtual machines
From MaRDI portal
Publication:666083
DOI: 10.1016/j.jpdc.2010.01.002, zbMath: 1233.68055, MaRDI QID: Q666083
Publication date: 7 March 2012
Published in: Journal of Parallel and Distributed Computing
Full work available at URL: https://doi.org/10.1016/j.jpdc.2010.01.002
resource management; cluster computing; system availability; component failures; distributed virtual machines; system reconfiguration
68M14: Distributed systems
Related Items
- Quantifying event correlations for proactive failure management in networked computing systems
- Software rejuvenation policies for cluster system
- DEFT: dynamic fault-tolerant elastic scheduling for tasks with uncertain runtime in cloud
- Scalable, Adaptable, and Fast Estimation of Transient Downtime in Virtual Infrastructures Using Convex Decomposition and Sample Path Randomization
Cites Work
- Implementing unreliable failure detectors with unknown membership
- The customizable fault/error model for dependable distributed systems
- A dynamic and reliability-driven scheduling algorithm for parallel real-time jobs executing on heterogeneous clusters
- Performance characteristics of the multi-zone NAS parallel benchmarks
- On the quality of service of failure detectors
- Critical path scheduling parallel programs on an unbounded number of processors