Progress in supercomputer technology leads to two major trends. First, many of the existing algorithms will need to be redesigned for efficient concurrent implementation using supercomputers. Second, a continuous increase will be apparent in the number of application-specific VLSI integrated circuits, which can provide the performance of supercomputers, using single chips or chipsets (at the expense of design time for algorithm and architecture development). Both of these approaches require considerable efforts in the development of algorithms for specific applications. This paper reviews four independent algorithm transformation methodologies: program unfolding, retiming, lookahead algorithms, and index mapping transformations. These transformation techniques exploit the available parallelism in iterative dataflow programs and create additional parallelism if necessary.
Bibliographical noteFunding Information:
Concurrent processors provide the computing power needed to solve many computation-intensive problems, such as those found in signal and image processing, processing of geophysical and seismic data (which require solution of partial differential equations, and inverse problems), solution of very large scale systems, and recognition of words from a large vocabulary as required in speech recognition. A major challenge in programming these supercomputers lies in partitioning the tasks of an algorithm in a way that leads to better processor utilization (or reduced idle time). Often, the algorithms or programs may possess more concurrency, which may not be obvious at first examination. This hidden concurrency can be exploited by unfolding the program. Retiming and index mapping transformations are other approaches to unraveling the hidden concurrency. If the available concurrency in the algorithm is inadequate for the real-time speed constraints of the target application, then look-ahead computation can be used to create additional concurrency (for a class of recursive Manuscript received August 1, 1988; revised June 5, 1989. This research was supported in part by grants from the Advanced Research Project Agency monitored by the Naval Electronics Systems Command under Contract N00039-86-R-0365, the National Science Foundation under Contracts DCl-85-17339 and MIPS 89-08586, an IBM Graduate Fellowship, a University of California Regents Fellowship, and by the Electrical Engineering Department of the University of Minnesota. Portions of the research presented here were carried out when the author was with the University of California at Berkeley.