Research Article Open Access

Reliability Evaluation of Distributed Computer Systems Subject to Imperfect Coverage and Dependent Common-Cause Failures

Liudong Xing and Akhilesh Shrestha

Abstract

Imperfect coverage (IPC) occurs when a malicious component failure causes extensive damage due to inadequate fault detection, fault location or fault recovery. Common-cause failures (CCF) are multiple dependent component failures within a system that stem from a shared root cause. Both IPC and CCF can exist in distributed computer systems (DCS), where they can contribute significantly to the overall system unreliability and complicate the reliability analysis. In this study, we propose an efficient approach to the reliability analysis of DCS subject to both IPC and CCF. The proposed methodology decouples the effects of IPC and CCF from the combinatorics of the solution, so the resulting approach can be used with the computationally efficient method based on binary decision diagrams (BDD) for the reliability analysis of DCS. We provide a concrete analysis of an example DCS to illustrate the application and advantages of our approach. Because it accounts for IPC and CCF, our approach can evaluate a wider class of DCS than existing approaches. Owing to the nature of the BDD and the separation of IPC and CCF from the solution combinatorics, the approach is computationally efficient and easy to implement, so it can be readily applied to the accurate reliability analysis of large-scale DCS subject to IPC and CCF. DCS without IPC or CCF can be treated as special cases of our approach.
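The idea of separating IPC and CCF from the combinatorics can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. IPC is handled in the spirit of the separable approach named in the keywords: the probability that no component suffers an uncovered failure is multiplied by the system failure probability computed with conditional (covered-only) component failure probabilities. CCF is handled by a total-probability decomposition over common-cause events, which are assumed s-independent of each other and of component failures. Components failed by a common-cause event are assumed to fail in a covered way. Brute-force state enumeration stands in for the BDD step, so the sketch only suits small systems. The function names, the two-processor example system and its numbers are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Tuple

# Structure function: maps component states {name: failed?} to True if the system has failed.
StructureFn = Callable[[Dict[str, bool]], bool]


def ideal_unreliability(q: Dict[str, float], system_failed: StructureFn) -> float:
    """System failure probability with perfect coverage and s-independent components.
    Enumerates all 2^n states; a stand-in for the BDD evaluation."""
    names = list(q)
    total = 0.0
    for states in product((False, True), repeat=len(names)):
        state = dict(zip(names, states))
        if system_failed(state):
            prob = 1.0
            for name in names:
                prob *= q[name] if state[name] else 1.0 - q[name]
            total += prob
    return total


def sea_unreliability(q: Dict[str, float], c: Dict[str, float],
                      system_failed: StructureFn,
                      forced_failed: FrozenSet[str] = frozenset()) -> float:
    """Separable treatment of imperfect coverage.
    q[n]: failure probability of component n (excluding common causes).
    c[n]: coverage factor of component n.
    forced_failed: components failed by common-cause events; ASSUMED covered."""
    live = [n for n in q if n not in forced_failed]
    # P_u: probability that no live component suffers an uncovered failure.
    p_u = 1.0
    for n in live:
        p_u *= 1.0 - q[n] * (1.0 - c[n])
    if p_u == 0.0:
        return 1.0  # an uncovered failure is certain, so the system fails
    # Failure probabilities conditioned on "no uncovered failure".
    q_cond = {n: q[n] * c[n] / (1.0 - q[n] * (1.0 - c[n])) for n in live}
    q_cond.update({n: 1.0 for n in forced_failed})
    return 1.0 - p_u + p_u * ideal_unreliability(q_cond, system_failed)


def unreliability_ipc_ccf(q: Dict[str, float], c: Dict[str, float],
                          cces: List[Tuple[float, FrozenSet[str]]],
                          system_failed: StructureFn) -> float:
    """Total-probability decomposition over common-cause events (CCEs).
    cces: (occurrence probability, affected components), ASSUMED s-independent."""
    total = 0.0
    for occurs in product((False, True), repeat=len(cces)):
        p_event = 1.0
        failed = set()
        for (p, affected), happened in zip(cces, occurs):
            p_event *= p if happened else 1.0 - p
            if happened:
                failed |= set(affected)
        total += p_event * sea_unreliability(q, c, system_failed, frozenset(failed))
    return total


# Hypothetical DCS: two processors, the system fails only if both are down.
q = {"P1": 0.1, "P2": 0.1}
c = {"P1": 0.9, "P2": 0.9}


def both_down(s: Dict[str, bool]) -> bool:
    return s["P1"] and s["P2"]


print(round(unreliability_ipc_ccf(q, c, [], both_down), 4))  # 0.028
print(round(unreliability_ipc_ccf(q, c, [(0.05, frozenset({"P1", "P2"}))], both_down), 4))  # 0.0766
```

In the example, without common causes the result is the uncovered-failure term (1 - 0.99^2 = 0.0199) plus both processors failing covered (0.09^2 = 0.0081). Adding a common-cause event with probability 0.05 that takes down both processors gives 0.95 * 0.028 + 0.05 * 1 = 0.0766. With coverage set to 1 and no common-cause events, the sketch reduces to the ideal case, which mirrors the abstract's remark that DCS without IPC or CCF are special cases.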

Journal of Computer Science
Volume 2 No. 6, 2006, 473-479

DOI: https://doi.org/10.3844/jcssp.2006.473.479

Submitted On: 8 February 2006
Published On: 28 September 2006

How to Cite: Xing, L. & Shrestha, A. (2006). Reliability Evaluation of Distributed Computer Systems Subject to Imperfect Coverage and Dependent Common-Cause Failures. Journal of Computer Science, 2(6), 473-479. https://doi.org/10.3844/jcssp.2006.473.479

Keywords

  • Distributed program reliability (DPR)
  • Reduced ordered binary decision diagrams (ROBDD)
  • Separable approach