TY - JOUR
T1 - Task allocation and reallocation for fault tolerance in Multicomputer Systems
AU - Chen, Chien In Henry
AU - Cherkassky, Vladimir
PY - 1994/10
Y1 - 1994/10
N2 - The goal of task allocation in a set of interconnected processors (computers) is to maximize the efficient use of resources and thus reduce the job turnaround time. Proposed here a simple yet effective method to allocate the tasks in multicomputer systems for minimizing the interprocessor communication cost subject to resource limitations defined by the system and designer. The limitations can be viewed as results from the load balancing since the execution time of each task, the number of available processors, processor speed, and memory capacity are known to the system or designer. As the number of processors increases, the probability of a failure existing somewhere in the systems at any time also increases. Very few established task allocation models have considered the reliability property. In multicomputer systems, we define system reliability as the probability that the system can run the tasks successfully. After the (nonredundant) task scheduling strategy is defined, tasks are then reallocated to processors statically and redundantly. This is a form of time redundancy, in which if some processors fail during the execution, all tasks can be completed on the remaining processors (but at a longer time). Due to static preallocation of tasks this method is simpler and thus more practical than well-known dynamic reconfiguration and rollback recovery techniques in multicomputer systems. We demonstrate the effectiveness of the task allocation and reallocation for hardware fault tolerance by illustrations of applying the methods to different examples and practical communications network multiprocessor systems.
AB - The goal of task allocation in a set of interconnected processors (computers) is to maximize the efficient use of resources and thus reduce the job turnaround time. Proposed here a simple yet effective method to allocate the tasks in multicomputer systems for minimizing the interprocessor communication cost subject to resource limitations defined by the system and designer. The limitations can be viewed as results from the load balancing since the execution time of each task, the number of available processors, processor speed, and memory capacity are known to the system or designer. As the number of processors increases, the probability of a failure existing somewhere in the systems at any time also increases. Very few established task allocation models have considered the reliability property. In multicomputer systems, we define system reliability as the probability that the system can run the tasks successfully. After the (nonredundant) task scheduling strategy is defined, tasks are then reallocated to processors statically and redundantly. This is a form of time redundancy, in which if some processors fail during the execution, all tasks can be completed on the remaining processors (but at a longer time). Due to static preallocation of tasks this method is simpler and thus more practical than well-known dynamic reconfiguration and rollback recovery techniques in multicomputer systems. We demonstrate the effectiveness of the task allocation and reallocation for hardware fault tolerance by illustrations of applying the methods to different examples and practical communications network multiprocessor systems.
UR - https://www.scopus.com/pages/publications/0028529446
UR - https://www.scopus.com/inward/citedby.url?scp=0028529446&partnerID=8YFLogxK
U2 - 10.1109/7.328753
DO - 10.1109/7.328753
M3 - Article
AN - SCOPUS:0028529446
SN - 0018-9251
VL - 30
SP - 1094
EP - 1104
JO - IEEE Transactions on Aerospace and Electronic Systems
JF - IEEE Transactions on Aerospace and Electronic Systems
IS - 4
ER -