TY - JOUR
T1 - Architectures for Serial and Parallel Pipelined NTT-Based Polynomial Modular Multiplication
AU - Chiu, Sin-Wei
AU - Parhi, Keshab K.
N1 - Publisher Copyright:
© 1993-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Quantum computers pose a significant threat to modern cryptographic systems by efficiently solving problems such as integer factorization through Shor’s algorithm. Homomorphic encryption (HE) schemes based on ring learning with errors (Ring-LWE) offer a quantum-resistant framework for secure computations on encrypted data. Many of these schemes rely on polynomial multiplication, which can be efficiently accelerated using the number theoretic transform (NTT) in leveled HE, ensuring practical performance for privacy-preserving applications. This article presents a novel NTT-based serial pipelined multiplier that achieves full-hardware utilization through interleaved folding, and overcomes the 50% under-utilization limitation of the conventional serial R2MDC architecture. In addition, it explores tradeoffs in pipelined parallel designs, including serial, 2-parallel, and 4-parallel architectures. Our designs leverage increased parallelism, efficient folding techniques, and optimizations for a selected constant modulus to achieve superior throughput (TP) compared with state-of-the-art implementations. While the serial fold design minimizes area consumption, the 4-parallel design maximizes TP. Experimental results on the Virtex-7 platform demonstrate that our architectures achieve at least 2.22 times higher TP/area for a polynomial length of 1024 and 1.84 times for a polynomial length of 4096 in the serial fold design, while the 4-parallel design achieves at least 2.78 times and 2.79 times, respectively. The efficiency gain is even more pronounced in TP squared over area, where the serial fold and 4-parallel designs outperform prior works by at least 4.98 times and 26.43 times for a polynomial length of 1024 and 6.7 times and 43.77 times for a polynomial length of 4096, respectively. These results highlight the effectiveness of our architectures in balancing performance, area efficiency, and flexibility, making them well-suited for high-speed cryptographic applications.
AB - Quantum computers pose a significant threat to modern cryptographic systems by efficiently solving problems such as integer factorization through Shor’s algorithm. Homomorphic encryption (HE) schemes based on ring learning with errors (Ring-LWE) offer a quantum-resistant framework for secure computations on encrypted data. Many of these schemes rely on polynomial multiplication, which can be efficiently accelerated using the number theoretic transform (NTT) in leveled HE, ensuring practical performance for privacy-preserving applications. This article presents a novel NTT-based serial pipelined multiplier that achieves full-hardware utilization through interleaved folding, and overcomes the 50% under-utilization limitation of the conventional serial R2MDC architecture. In addition, it explores tradeoffs in pipelined parallel designs, including serial, 2-parallel, and 4-parallel architectures. Our designs leverage increased parallelism, efficient folding techniques, and optimizations for a selected constant modulus to achieve superior throughput (TP) compared with state-of-the-art implementations. While the serial fold design minimizes area consumption, the 4-parallel design maximizes TP. Experimental results on the Virtex-7 platform demonstrate that our architectures achieve at least 2.22 times higher TP/area for a polynomial length of 1024 and 1.84 times for a polynomial length of 4096 in the serial fold design, while the 4-parallel design achieves at least 2.78 times and 2.79 times, respectively. The efficiency gain is even more pronounced in TP squared over area, where the serial fold and 4-parallel designs outperform prior works by at least 4.98 times and 26.43 times for a polynomial length of 1024 and 6.7 times and 43.77 times for a polynomial length of 4096, respectively. These results highlight the effectiveness of our architectures in balancing performance, area efficiency, and flexibility, making them well-suited for high-speed cryptographic applications.
KW - Folding
KW - homomorphic encryption (HE)
KW - interleaving
KW - number theoretic transform (NTT)
KW - parallel processing
KW - pipelining
KW - polynomial modular multiplication
UR - http://www.scopus.com/inward/record.url?scp=105007917099&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105007917099&partnerID=8YFLogxK
U2 - 10.1109/tvlsi.2025.3576782
DO - 10.1109/tvlsi.2025.3576782
M3 - Article
AN - SCOPUS:105007917099
SN - 1063-8210
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ER -