Abstract
Recent work has demonstrated the promise of resistive random access memory (ReRAM) as an emerging technology for performing inherently parallel, analog-domain, in-situ matrix-vector multiplication, the intensive and key computation in deep neural networks (DNNs). One key problem is that weights are signed values, yet in a ReRAM crossbar weights are stored as cell conductances, and the in-situ computation assumes that all cells in a crossbar column have the same sign. Current architectures either use two ReRAM crossbars for positive and negative weights (PRIME) or add an offset to the weights so that all values become positive (ISAAC). Neither solution is ideal: the former doubles the crossbar cost, and the latter incurs extra offset circuitry. To better address this problem, we propose FORMS, a fine-grained ReRAM-based DNN accelerator with algorithm/hardware co-design. Instead of trying to represent both positive and negative weights, our key design principle is to enforce exactly what the in-situ computation assumes: all weights in the same crossbar column have the same sign. This naturally avoids the cost of an additional crossbar. Such polarized weights can be generated using alternating direction method of multipliers (ADMM) regularized optimization during DNN training, which can exactly enforce certain patterns in DNN weights. To achieve high accuracy, we divide the crossbar into logical sub-arrays and enforce this property only within the fine-grained sub-array columns. Crucially, the small sub-arrays provide a unique opportunity for input zero-skipping, which avoids unnecessary computations and significantly reduces computation time. They also make the hardware much easier to implement and less susceptible to non-idealities and noise than coarse-grained architectures. Putting it all together, with the same optimized DNN models, FORMS achieves 1.50× and 1.93× throughput improvement in terms of $\frac{\text{GOPs}}{s \times mm^2}$ and $\frac{\text{GOPs}}{W}$ compared to ISAAC, and a 1.12×-2.4× speedup in frames per second over optimized ISAAC with almost the same power/area cost. Interestingly, the FORMS optimization framework can even speed up the original ISAAC by 10.7× up to 377.9×, reflecting the importance of software/hardware co-design optimizations.
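As a rough illustration of the same-sign constraint described above (not the paper's actual implementation), the projection step inside an ADMM iteration can be sketched as the Euclidean projection of each fine-grained sub-array column onto the nearest all-nonnegative or all-nonpositive vector. The names `polarize_column`, `polarize_weights`, and the `sub_rows` parameter are hypothetical, chosen only for this sketch.

```python
import numpy as np

def polarize_column(w):
    """Project a sub-array column of weights onto the nearest
    same-sign vector (all >= 0 or all <= 0).

    The feasible set is a union of two convex sets, so the
    projection is whichever of the two clipped candidates is
    closer to the original column.
    """
    pos = np.maximum(w, 0.0)  # candidate: negatives clipped to zero
    neg = np.minimum(w, 0.0)  # candidate: positives clipped to zero
    if np.linalg.norm(w - pos) <= np.linalg.norm(w - neg):
        return pos
    return neg

def polarize_weights(W, sub_rows):
    """Apply the same-sign constraint independently to each
    logical sub-array column of a weight matrix W, where
    sub_rows is the (assumed) number of crossbar rows per
    fine-grained sub-array.
    """
    W = W.copy()
    rows, cols = W.shape
    for r in range(0, rows, sub_rows):
        for c in range(cols):
            W[r:r + sub_rows, c] = polarize_column(W[r:r + sub_rows, c])
    return W
```

In an ADMM-regularized training loop, a projection of this kind would be applied to the auxiliary variable at each iteration, gradually pulling the trained weights toward the polarized pattern that the crossbar's in-situ computation assumes.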
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture, ISCA 2021 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 265-278 |
Number of pages | 14 |
ISBN (Electronic) | 9781665433334 |
DOIs | |
State | Published - Jun 2021 |
Externally published | Yes |
Event | 48th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2021 - Virtual, Online, Spain |
Duration | Jun 14 2021 → Jun 19 2021 |
Publication series
Name | Proceedings - International Symposium on Computer Architecture |
---|---|
Volume | 2021-June |
ISSN (Print) | 1063-6897 |
Conference
Conference | 48th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2021 |
---|---|
Country/Territory | Spain |
City | Virtual, Online |
Period | 6/14/21 → 6/19/21 |
Bibliographical note
Publisher Copyright: © 2021 IEEE.