Hardware efficient fast DCT based on novel cyclic convolution structures

Chao Cheng, Keshab K. Parhi

Research output: Contribution to journalArticlepeer-review

30 Scopus citations


Cyclic convolution is a widely used operation in signal processing. In very large-scale integration (VLSI) design, it is usually implemented with systolic array and distributed arithmetic; however, these implementation designs may not be fast enough or use too much hardware cost when the convolution length is large. This paper presents a new fast cyclic convolution algorithm, which is hardware efficient and suitable for high-speed VLSI implementation, especially when the convolution length is large. For example, when the proposed fast cyclic convolution algorithm is applied to the implementation of prime length discrete cosine transform (DCT), the proposed high-throughput implementation of 1297-length DCT design saves 1216 (94%) multiplications, 282 (22%) additions, and 4792 (74%) delay elements compared with those of recently proposed systolic array based algorithms. Furthermore, the proposed algorithm can run at a speed that is 1.5 times that of previous designs and requires less I/O cost as long as the wordlength L is less than 20 bits.

Original languageEnglish (US)
Pages (from-to)4419-4434
Number of pages16
JournalIEEE Transactions on Signal Processing
Issue number11
StatePublished - Nov 2006

Bibliographical note

Funding Information:
Manuscript received February 15, 2005; revised December 28, 2005. This work was supported in part by the National Science Foundation under Grant CCF-0429979. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Shuvra S. Bhattacharyya.


  • Cyclic convolution
  • Discrete cosine transforms
  • Linear convolution
  • Very large-scale integration


Dive into the research topics of 'Hardware efficient fast DCT based on novel cyclic convolution structures'. Together they form a unique fingerprint.

Cite this