Analytic models for the primary site approach to fault-tolerance (Q1108774)
scientific article; zbMATH DE number 4068244
| Language | Label | Description | Also known as |
|---|---|---|---|
| default for all languages | No label defined | | |
| English | Analytic models for the primary site approach to fault-tolerance | scientific article; zbMATH DE number 4068244 | |
Statements
Analytic models for the primary site approach to fault-tolerance (English)
1989
A common approach to supporting fault tolerance against node failures is the primary site approach. In this approach, the service to be made fault-tolerant is replicated at many nodes, one of which is designated as the primary and the others as backups. All requests for the service are sent to the primary site, which periodically checkpoints its state on the backups. If the primary fails, one of the backups takes over as primary. To maintain consistency, it first re-executes all the requests performed by the previous primary since the last checkpoint. Two important issues that affect the performance of this approach are the frequency of checkpointing and the degree of replication of the service. If the checkpointing interval is decreased, the overhead of re-executing old requests decreases, but the overhead of checkpointing increases. If the degree of replication increases, then on the one hand the availability of the system for user services increases, because the reliability of the system increases. On the other hand, the checkpointing time increases, which reduces the availability of the system. In this paper, we present an analytic model to study the optimum checkpointing interval, and a queuing model to study the optimum degree of replication for a service in a primary site system. The reliability of a primary site system is also studied.
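The checkpointing trade-off described above can be illustrated with a minimal sketch. This is not the paper's analytic model, which the abstract does not give. It is a generic first-order model of the same form as Young's well-known approximation. The sketch assumes a fixed checkpoint cost `C` and a constant primary failure rate `lambda`. It also assumes that after a failure the new primary re-executes, on average, half an interval of requests, scaled by a hypothetical factor `r`. Under these assumptions the overhead is `C/T + lambda*r*T/2`, which is minimised at `T* = sqrt(2C/(lambda*r))`. All function and parameter names are illustrative.

```python
import math


def overhead_fraction(interval, checkpoint_cost, failure_rate, reexec_factor=1.0):
    """First-order fraction of time lost to checkpointing plus re-execution.

    Hypothetical model (not the paper's): a checkpoint costing
    `checkpoint_cost` time units is taken every `interval` time units, the
    primary fails at rate `failure_rate`, and after a failure the new primary
    re-executes on average half an interval of requests, scaled by
    `reexec_factor` (1.0 = re-execution takes as long as the original run).
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    checkpointing = checkpoint_cost / interval                   # falls as interval grows
    reexecution = failure_rate * reexec_factor * interval / 2.0  # rises as interval grows
    return checkpointing + reexecution


def optimal_interval(checkpoint_cost, failure_rate, reexec_factor=1.0):
    """Interval minimising overhead_fraction: d/dT (C/T + lambda*r*T/2) = 0."""
    return math.sqrt(2.0 * checkpoint_cost / (failure_rate * reexec_factor))


# Example: 2 s per checkpoint, one primary failure per hour on average.
best = optimal_interval(checkpoint_cost=2.0, failure_rate=1 / 3600)
print(round(best, 6))  # 120.0
for t in (60, 120, 240):
    print(t, round(overhead_fraction(t, 2.0, 1 / 3600), 4))
```

Halving or doubling the interval around the optimum raises the overhead by the same amount here, because the two terms are equal at `T*`. A more detailed model, such as one that counts checkpoint time inside the interval or uses non-exponential failures, would shift the optimum.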
checkpoints
availability
queuing
reliability
primary site fault-tolerant system
recovery
machine repairman model
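The keywords suggest that the queuing model for the degree of replication is related to the classical machine-repairman model. The sketch below shows that model in its textbook form. It makes several assumptions not stated in the record: `n` replicas, exponential failures at rate `fail_rate` per live replica, and a single repair facility with rate `repair_rate`. To mimic the trade-off in the abstract, it also discounts availability by a hypothetical checkpoint overhead that grows linearly with the number of backups (`per_backup_cost`). That linear cost and all names are illustrative assumptions, not the paper's formulation.

```python
from math import factorial


def repairman_probs(n, fail_rate, repair_rate):
    """Steady-state P(k replicas down), k = 0..n, for the classical
    machine-repairman model: n replicas, each failing at rate `fail_rate`
    while up, and a single repair facility working at rate `repair_rate`."""
    rho = fail_rate / repair_rate
    # Birth-death balance gives p_k proportional to n!/(n-k)! * rho**k.
    weights = [factorial(n) // factorial(n - k) * rho**k for k in range(n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def service_availability(n, fail_rate, repair_rate):
    """Probability that at least one replica is up, so a primary can serve."""
    return 1.0 - repairman_probs(n, fail_rate, repair_rate)[n]


def useful_availability(n, fail_rate, repair_rate, per_backup_cost):
    """Hypothetical trade-off: discount availability by a checkpoint overhead
    assumed to grow linearly with the number of backups (n - 1)."""
    overhead = min(1.0, per_backup_cost * (n - 1))
    return service_availability(n, fail_rate, repair_rate) * (1.0 - overhead)


# Example: failure rate 0.01, repair rate 1.0, 0.2 % overhead per backup.
best_n = max(range(1, 11),
             key=lambda n: useful_availability(n, 0.01, 1.0, 0.002))
print(best_n)  # 2
```

With these illustrative numbers, adding the first backup raises availability from about 0.990 to about 0.998. Each further backup costs more in checkpoint overhead than it gains in reliability, so an interior optimum appears, in the spirit of the abstract's argument.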