Task allocation and reallocation for fault tolerance in multicomputer systems

Chien In Henry Chen, Vladimir Cherkassky

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

The goal of task allocation in a set of interconnected processors (computers) is to maximize the efficient use of resources and thus reduce job turnaround time. We propose a simple yet effective method for allocating tasks in multicomputer systems that minimizes the interprocessor communication cost subject to resource limitations defined by the system and the designer. These limitations can be viewed as resulting from load balancing, since the execution time of each task, the number of available processors, the processor speed, and the memory capacity are known to the system or designer. As the number of processors increases, the probability that a failure exists somewhere in the system at any given time also increases. Very few established task allocation models have considered reliability. For multicomputer systems, we define system reliability as the probability that the system can run the tasks successfully. After the (nonredundant) task scheduling strategy is defined, tasks are reallocated to processors statically and redundantly. This is a form of time redundancy: if some processors fail during execution, all tasks can still be completed on the remaining processors, though in a longer time. Because tasks are preallocated statically, this method is simpler and thus more practical than the well-known dynamic reconfiguration and rollback recovery techniques for multicomputer systems. We demonstrate the effectiveness of task allocation and reallocation for hardware fault tolerance by applying the methods to several examples and to practical communications network multiprocessor systems.
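The two stages described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a toy model in which each task has a memory demand, each processor has a memory capacity, and a symmetric matrix gives the communication cost paid whenever two tasks sit on different processors. It finds the cheapest feasible allocation by exhaustive search, then precomputes, for every single-processor failure, where the failed processor's tasks would move. All names (`allocate`, `backup_plans`, `mem`, `cap`, `comm`) and all numbers are hypothetical.

```python
from itertools import product

def comm_cost(assign, comm):
    """Sum the communication cost over task pairs placed on different processors."""
    n = len(assign)
    return sum(comm[i][j] for i in range(n) for j in range(i + 1, n)
               if assign[i] != assign[j])

def feasible(assign, mem, cap):
    """Check that no processor's memory capacity is exceeded."""
    used = [0] * len(cap)
    for task, proc in enumerate(assign):
        used[proc] += mem[task]
    return all(u <= c for u, c in zip(used, cap))

def allocate(mem, cap, comm, procs, fixed=None):
    """Exhaustive search over `procs`; tasks listed in `fixed` keep their processor.

    Returns (assignment, cost), or (None, inf) if nothing is feasible.
    Only practical for very small instances.
    """
    fixed = fixed or {}
    n = len(mem)
    free = [t for t in range(n) if t not in fixed]
    best, best_cost = None, float("inf")
    for choice in product(procs, repeat=len(free)):
        assign = [None] * n
        for t, p in fixed.items():
            assign[t] = p
        for t, p in zip(free, choice):
            assign[t] = p
        if feasible(assign, mem, cap):
            cost = comm_cost(assign, comm)
            if cost < best_cost:
                best, best_cost = tuple(assign), cost
    return best, best_cost

def backup_plans(primary, mem, cap, comm):
    """Static reallocation: one precomputed plan per single-processor failure."""
    procs = list(range(len(cap)))
    plans = {}
    for failed in procs:
        survivors = [p for p in procs if p != failed]
        # Tasks on surviving processors stay put; only the failed ones move.
        fixed = {t: p for t, p in enumerate(primary) if p != failed}
        plans[failed] = allocate(mem, cap, comm, survivors, fixed)
    return plans

# Hypothetical instance: 4 tasks, 3 processors.
mem = [2, 2, 2, 2]                 # memory demand per task
cap = [4, 4, 4]                    # memory capacity per processor
comm = [[0, 5, 0, 0],              # symmetric inter-task communication cost
        [5, 0, 1, 0],
        [0, 1, 0, 4],
        [0, 0, 4, 0]]

primary, primary_cost = allocate(mem, cap, comm, range(len(cap)))
plans = backup_plans(primary, mem, cap, comm)
print("primary:", primary, "cost:", primary_cost)
for failed, (assign, cost) in plans.items():
    print("if processor", failed, "fails:", assign, "cost:", cost)
```

Because the backup plans are computed before execution starts, recovery in this sketch reduces to a table lookup, which reflects the abstract's argument that static preallocation is simpler than dynamic reconfiguration. The sketch ignores execution time and processor speed, which the abstract lists among the resource limitations.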

Original language: English (US)
Pages (from-to): 1094-1104
Number of pages: 11
Journal: IEEE Transactions on Aerospace and Electronic Systems
Volume: 30
Issue number: 4
State: Published - Oct 1994
