TY - GEN
T1 - Parallel modular multiplication with application to VLSI RSA implementation
AU - Freking, William L.
AU - Parhi, Keshab K.
PY - 1999
Y1 - 1999
N2 - In this paper, modular multiplication, the fundamental operation composing modular exponentiation, is internally parallelized for the first time at the digit level. Modular exponentiation is the core computation of numerous public-key cryptography (PKC) systems including RSA. As a performance criterion, overall latency is often more significant than throughput in the principal PKC applications of key exchange and authentication. Efforts to address total latency architecturally through traditional modular multiplication techniques utilizing pipelining are hindered by the inherent recursive nature of practical modular exponentiation methods. Thus, performance scalability relative to implementation area has been limited. Fine-grain parallelization methods revealed in this paper are compelling because they permit overall latency reduction in addition to increased throughput. First, a hybrid bi-directional method is introduced for two-parallel implementations. Second, a uni-directional p-parallel technique is introduced which attains general levels of parallelism, thereby enabling performance scalability. These new techniques create a foundation for ultra-high-performance implementations.
UR - http://www.scopus.com/inward/record.url?scp=0032716008&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0032716008&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:0032716008
SN - 0780354729
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
SP - I-490
EP - I-495
BT - Proceedings - IEEE International Symposium on Circuits and Systems
PB - IEEE
T2 - Proceedings of the 1999 IEEE International Symposium on Circuits and Systems, ISCAS '99
Y2 - 30 May 1999 through 2 June 1999
ER -