TY - JOUR
T1 - Design and Programming of a Flexible, Cost-Effective Systolic Array Cell for Digital Signal Processing
AU - Smith, Ross A.W.
AU - Dillon, Mike
AU - Sobelman, Gerald E.
PY - 1990/7
Y1 - 1990/7
N2 - A programmable systolic array cell for signal processing applications is described. The cell uses two chips: the 16-b NCR45CM16 CMOS Multiplier/Accumulator (MAC) for arithmetic, and the Systolic Array Controller (SAC) for routing data and controlling the MAC. The SAC has a 64 × 18-b static RAM that is used twice each cycle: once to read a control word and once to read or write a data word. The SAC has two 16-b data streams and one 6-b address stream. A 16-b bidirectional port routes data between the 71-pin SAC and the 24-pin MAC. All major cell resources can operate concurrently. The many practical details of implementing systolic array algorithms on an array of SAC/MAC cells are fully presented. A library of macros for commonly used program segments is described. Key issues are discussed, such as programming the MAC, scaling operands, loading RAM, synchronizing cells, delaying data, unloading results, combining the macros into a program, and pipelining a program. Two systolic algorithms are developed: matrix multiplication on a linear array, and matrix multiplication on a two-dimensional array. With a two-dimensional array, a series of pipelined matrix-matrix multiplications uses the MAC every cycle.
UR - http://www.scopus.com/inward/record.url?scp=0025465151&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0025465151&partnerID=8YFLogxK
U2 - 10.1109/29.57547
DO - 10.1109/29.57547
M3 - Article
AN - SCOPUS:0025465151
VL - 38
SP - 1198
EP - 1210
JO - IEEE Transactions on Signal Processing
JF - IEEE Transactions on Signal Processing
SN - 1053-587X
IS - 7
ER -