AVEC 2012

2nd International Audio/Visual Emotion Challenge and Workshop

    facing continuous emotion representation

In conjunction with ICMI 2012, October 22-26, Santa Monica, California, USA (downloadable pdf version of the call for participation).


02/11/2012 – Results of the challenge are now available, see below.

22/10/2012 – KEYNOTE CANCELLED! – The keynote by Javier Movellan has unfortunately been cancelled because the speaker has had an accident.

1/10/2012 – Javier Movellan of the Machine Perception Lab, UC San Diego, has been confirmed as our keynote speaker. Details of his talk can be viewed here.

1/10/2012 – Programme now available: the full programme can now be viewed here.

13/07/2012 – Deadline extended: the submission deadline has been extended by 10 days to the 31 July 2012. All other important dates have changed accordingly.

13/07/2012 – The workshop date has now been fixed as 22 October 2012, according to the ICMI web pages.

Challenge results

The AVEC2012 organisers would like to thank all the participants, and would like to congratulate the winners of the fully continuous and word-level sub-challenges. Here are the top-three rankings for the two sub-challenges:

Fully continuous sub-challenge

  1. Jeremie Nicolle, Vincent Rapp, Kévin Bailly, Lionel Prevost, and Mohamed Chetouani, ‘Robust continuous prediction of human emotions using multiscale dynamic cues’
  2. Catherine Soladie, Hanan Salam, Catherine Pelachaud, Nicolas Stoiber, Renaud Seguier, ‘A Multimodal Fuzzy Inference System using a Continuous Facial Expression Representation for Emotion Detection’
  3. Arman Savran, Houwei Cao, Miraj Shah, Ani Nenkova, Ragini Verma, ‘Combining Video, Audio and Lexical Indicators of Affect in Spontaneous Conversation via Particle Filtering’

Word-level sub-challenge

  1. Arman Savran, Houwei Cao, Miraj Shah, Ani Nenkova, Ragini Verma, ‘Combining Video, Audio and Lexical Indicators of Affect in Spontaneous Conversation via Particle Filtering’
  2. Derya Ozkan, Stefan Scherer, Louis-Philippe Morency, ‘Step-wise Emotion Recognition using concatenated-HMM’
  3. Laurens van der Maaten, ‘Audio-Visual Emotion Challenge 2012: A Simple Approach’

Here are the full results for the fully continuous sub-challenge:

Results for the fully continuous sub-challenge (FCSC) in terms of correlation and root mean squared error

Here are the full results for the word-level sub-challenge:

Results for the word-level sub-challenge in terms of correlation and root mean squared error

Call for papers:


Björn Schuller, TUM, Germany

Michel Valstar, University of Nottingham, UK

Roddy Cowie, Queen’s University Belfast, UK

Maja Pantic, Imperial College London, UK



The Audio/Visual Emotion Challenge and Workshop (AVEC 2012) will be the second competition event aimed at comparing multimedia processing and machine learning methods for automatic audio, visual and audiovisual emotion analysis, with all participants competing under strictly the same conditions. The goal of the challenge is to provide a common benchmark test set for individual multimodal information processing and to bring together the audio and video emotion recognition communities, in order to compare the relative merits of the two approaches to emotion recognition under well-defined and strictly comparable conditions, and to establish to what extent fusion of the approaches is possible and beneficial. A second motivation is the need to advance emotion recognition systems so that they can deal with naturalistic behavior in large volumes of unsegmented, non-prototypical and non-preselected data, as this is exactly the type of data that both multimedia retrieval and human-machine/human-robot communication interfaces have to face in the real world.

We are calling for teams to participate in emotion recognition from acoustic audio analysis, linguistic audio analysis, video analysis, or any combination of these. The benchmarking database will be the SEMAINE database of naturalistic video and audio of human-agent interactions, along with labels for four affect dimensions. Emotion will have to be recognized in terms of continuous-time, continuous-valued dimensional affect in four dimensions: arousal, expectation, power and valence. Two sub-challenges are organised: the first involves fully continuous affect recognition, where the level of affect has to be predicted for every moment of the recording. The second sub-challenge requires participants to predict the level of affect at word-level, that is, only when the user is speaking.
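The challenge results are reported in terms of correlation and root mean squared error. As a rough illustration only, the sketch below shows how these two measures could be computed for one affect dimension of one recording. The function names, the per-frame list layout and the boolean `speaking` mask are assumptions made for this example; the official scoring protocol is not described on this page, and treating the word-level sub-challenge as a per-frame mask is a simplification.

```python
import math


def pearson(pred, gold):
    """Pearson correlation between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        # Assumption: a constant sequence is scored as zero correlation.
        return 0.0
    return cov / math.sqrt(var_p * var_g)


def rmse(pred, gold):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred))


def score(pred, gold, speaking=None):
    """Score one affect dimension (e.g. arousal) of one recording.

    pred, gold: per-frame predicted and labelled affect values.
    speaking:   optional per-frame booleans (hypothetical); when given,
                only frames where the user is speaking are scored.
    """
    if speaking is not None:
        kept = [(p, g) for p, g, s in zip(pred, gold, speaking) if s]
        pred = [p for p, _ in kept]
        gold = [g for _, g in kept]
    return {"correlation": pearson(pred, gold), "rmse": rmse(pred, gold)}


if __name__ == "__main__":
    predicted = [0.0, 1.0, 2.0, 3.0]
    labelled = [0.0, 2.0, 4.0, 6.0]
    print(score(predicted, labelled))
    print(score(predicted, labelled, speaking=[True, True, False, False]))
```

The example highlights why both measures are useful together: predictions that follow the shape of the labels perfectly obtain a high correlation even when their scale is off, whereas the root mean squared error penalises that offset.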

Besides participation in the Challenge, we are calling for papers addressing the overall topics of this workshop, in particular works that address the differences between audio and video processing of emotive data, and the issues concerning combined audio-visual emotion recognition.

Program Committee

Elisabeth André, Universität Augsburg, Germany

Anton Batliner, Universität Erlangen-Nuremberg, Germany

Felix Burkhardt, Deutsche Telekom, Germany

Rama Chellappa, University of Maryland, USA

Fang Chen, NICTA, Australia

Mohamed Chetouani, Institut des Systèmes Intelligents et de Robotique (ISIR), France

Laurence Devillers, Laboratoire d’Informatique pour la Mécanique et les Sciences de l’Ingénieur (LIMSI), France

Julien Epps, University of New South Wales, Australia

Anna Esposito, International Institute for Advanced Scientific Studies, Italy

Raul Fernandez, IBM, USA

Roland Göcke, Australian National University, Australia

Hatice Gunes, Queen Mary University London, UK

Julia Hirschberg, Columbia University, USA

Aleix Martinez, Ohio State University, USA

Marc Méhu, University of Geneva, Switzerland

Marcello Mortillaro, University of Geneva, Switzerland

Matti Pietikainen, University of Oulu, Finland

Ioannis Pitas, University of Thessaloniki, Greece

Peter Robinson, University of Cambridge, UK

Stefan Steidl, Universität Erlangen-Nuremberg, Germany

Jianhua Tao, Chinese Academy of Sciences, China

Fernando de la Torre, Carnegie Mellon University, USA

Mohan Trivedi, University of California San Diego, USA

Matthew Turk, University of California Santa Barbara, USA

Alessandro Vinciarelli, University of Glasgow, UK

Stefanos Zafeiriou, Imperial College London, UK

Important Dates

Paper submission July 31, 2012

Notification of acceptance August 14, 2012

Camera ready paper August 18, 2012

Workshop October 22, 2012

Topics include, but are not limited to: 

Participation in the Challenge

Audio/Visual Emotion Recognition

  • Audio-based Emotion Recognition
  • Linguistics-based Emotion Recognition
  • Video-based Emotion Recognition
  • Social Signals in Emotion Recognition
  • Multi-task learning of Multiple Dimensions
  • Novel Fusion Techniques as by Prediction
  • Cross-corpus Feature Relevance
  • Agglomeration of Learning Data
  • Semi- and Unsupervised Learning
  • Synthesized Training Material
  • Context in Audio/Visual Emotion Recognition
  • Multiple Rater Ambiguity


  • Multimedia Coding and Retrieval
  • Usability of Audio/Visual Emotion Recognition
  • Real-time Issues

Submission Policy

In submitting a manuscript to this workshop, the authors acknowledge that no paper substantially similar in content has been submitted to another conference or workshop. Accepted workshop papers will be included in the proceedings of ICMI 2012. Manuscripts should follow the ICMI main conference paper format: 6 pages, ACM style. Authors should submit papers as a PDF file via the official ICMI system. Once you are in the conference management system, please choose AVEC for submission. AVEC 2012 reviewing is double blind. Reviewing will be by members of the program committee. Each paper will receive at least two reviews. Acceptance will be based on relevance to the workshop, novelty, and technical quality.