This paper presents a systematic folding transformation technique to fold any arbitrary signal processing algorithm data-flow graph to a hardware data-flow architecture, for a specified folding set and specified technology constraints. The folding set specifies the processor in which and the time partition at which the task is executed. The folding set is typically obtained by performing scheduling and resource allocation for the algorithm data-flow graph and the specified iteration period. The technology constraints imposed on the hardware architecture (i.e., the level of pipelining and the implementation style of each processor) are also assumed to be known. The folding technique is used to derive the control circuitry of the hardware architecture (including registers, switches, and interconnections). We derive conditions for the validity of a specified folding set, and present approaches to generate the dedicated architecture using systematic folding of tasks to operators. We propose automatic retiming and pipelining of algorithms described by data-flow graphs for folding. The folding algorithm is applied after preprocessing the data-flow graph (DFG) for automated pipelining and retiming. Our folding algorithm can accommodate single or multiple implementation styles and single or multiple computation clocks, and applies to folding of regular and irregular data-flow graphs.