A survey of affect recognition methods: Audio, visual, and spontaneous expressions

Zhihong Zeng, Maja Pantic, Glenn I. Roisman, Thomas S. Huang

Research output: Contribution to journal › Article › peer-review

2221 Scopus citations


Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions, even though deliberate behavior differs in visual appearance, audio profile, and timing from spontaneously occurring behavior. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behavior have recently emerged. Moreover, a growing number of efforts report multimodal fusion for human affect analysis, including audiovisual fusion, linguistic and paralinguistic fusion, and multi-cue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next, we examine available approaches to machine understanding of human affective behavior and discuss important issues such as the collection and availability of training and test data. Finally, we outline some of the scientific and engineering challenges to advancing human affect sensing technology.

Original language: English (US)
Pages (from-to): 39-58
Number of pages: 20
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Issue number: 1
State: Published - 2009
Externally published: Yes

Bibliographical note

Funding Information:
The authors would like to thank Qiang Ji and the anonymous reviewers for their encouragement and valuable comments. This paper is a collaborative work. Thomas Huang led this team effort but preferred to be listed last among the authors. Zhihong Zeng wrote the first draft, Maja Pantic significantly improved it by rewriting it and offering important advice, and Glenn Roisman provided important comments and polished the whole paper. Zhihong Zeng and Thomas S. Huang’s work in this paper was supported in part by a Beckman Postdoctoral Fellowship, US National Science Foundation Grant CCF 04-26627, and the US Government VACE Program. Maja Pantic’s research that led to these results was funded in part by the EC FP7 Programme [FP7/2007-2013] under grant agreement no. 211486 (SEMAINE) and by the European Research Council under ERC Starting Grant agreement no. ERC-2007-StG-203143 (MAHNOB).


  • Evaluation/methodology
  • Human-centered computing
  • Introductory and Survey


