Hardware efficient fast DCT based on novel cyclic convolution structures

Chao Cheng, Keshab K. Parhi

Research output: Contribution to journalArticle

27 Scopus citations


Cyclic convolution is a widely used operation in signal processing. In very large-scale integration (VLSI) design, it is usually implemented with systolic array and distributed arithmetic; however, these implementation designs may not be fast enough or use too much hardware cost when the convolution length is large. This paper presents a new fast cyclic convolution algorithm, which is hardware efficient and suitable for high-speed VLSI implementation, especially when the convolution length is large. For example, when the proposed fast cyclic convolution algorithm is applied to the implementation of prime length discrete cosine transform (DCT), the proposed high-throughput implementation of 1297-length DCT design saves 1216 (94%) multiplications, 282 (22%) additions, and 4792 (74%) delay elements compared with those of recently proposed systolic array based algorithms. Furthermore, the proposed algorithm can run at a speed that is 1.5 times that of previous designs and requires less I/O cost as long as the wordlength L is less than 20 bits.

Original languageEnglish (US)
Pages (from-to)4419-4434
Number of pages16
JournalIEEE Transactions on Signal Processing
Issue number11
StatePublished - Nov 1 2006



  • Cyclic convolution
  • Discrete cosine transforms
  • Linear convolution
  • Very large-scale integration

Cite this