AVEC 2011: Audio/Visual Emotion Challenge and Workshop

AVEC 2011

1st International Audio/Visual Emotion Challenge and Workshop

bridging between modalities

in conjunction with ACII 2011, October 9, Memphis, Tennessee, USA


The AVEC 2011 organisers would like to thank all the participants and to congratulate the winners of the audio and video sub-challenges. The top three rankings for the two sub-challenges are as follows:

Audio sub-challenge

  1. Hongying Meng and Nadia Bianchi-Berthouze, ‘Naturalistic Affective Expression Classification by a Multi-stage Approach based on Hidden Markov Models’
  2. Michael Glodek, Stephan Tschechne, Georg Layher, Martin Schels, Tobias Brosch, Stefan Scherer, Markus Kächele, Miriam Schmidt, Heiko Neumann, Günther Palm and Friedhelm Schwenker, ‘Multiple Classifier Systems for the Classification of Audio-Visual Emotional States’
  3. Jonathan C. Kim, Hrishikesh Rao and Mark A. Clements, ‘Investigating the Use of Formant based Features for Detection of Affective Dimensions in Speech’

Video sub-challenge

  1. Geovany Ramirez, Tadas Baltrusaitis and Louis-Philippe Morency, ‘Modeling Latent Discriminative Dynamic of Multi-Dimensional Affective Signals’
  2. Albert Cruz, Bir Bhanu and Songfan Yang, ‘A Psychologically Inspired Match-Score Fusion Model for Video-Based Facial Expression Recognition’
  3. Michael Glodek, Stephan Tschechne, Georg Layher, Martin Schels, Tobias Brosch, Stefan Scherer, Markus Kächele, Miriam Schmidt, Heiko Neumann, Günther Palm and Friedhelm Schwenker, ‘Multiple Classifier Systems for the Classification of Audio-Visual Emotional States’

There were only two audio-visual participants, and we therefore decided not to issue an award for the audio-visual sub-challenge.
Here are the full results for the audio sub-challenge:

Results for the audio sub-challenge in terms of weighted accuracy

Results for the audio sub-challenge in terms of unweighted accuracy

And these are the results for the video sub-challenge:

Results of participants in the video sub-challenge, ranked by weighted accuracy.

Results for the video sub-challenge in terms of unweighted accuracy; participants are ranked by their weighted accuracy.

And finally, below are the results on the audio-visual portion of the test set. Please note that not all participants in the audio or video sub-challenges contributed results on this portion of the test set, so it is not possible to draw any conclusions based on this data.

Results on the audio-visual portion of the test set, in terms of weighted accuracy.

Results on the audio-visual portion of the test set, in terms of unweighted accuracy.
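The result tables above report both weighted and unweighted accuracy. As a hypothetical illustration (the exact definitions are given in the challenge baseline paper), the following sketch assumes the usual convention for these metrics: weighted accuracy is the overall fraction of correctly classified instances, while unweighted accuracy is the recall averaged over classes, so that each class counts equally regardless of its size.

```python
# Sketch of the two evaluation metrics, assuming the conventional
# definitions: weighted accuracy = overall fraction correct,
# unweighted accuracy = mean per-class recall.

def weighted_accuracy(y_true, y_pred):
    """Overall fraction of correct predictions (favours frequent classes)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def unweighted_accuracy(y_true, y_pred):
    """Per-class recall averaged over classes (each class counts equally)."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)

# Toy example with an imbalanced binary labelling (e.g. high/low arousal):
y_true = [1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 1, 1]
print(weighted_accuracy(y_true, y_pred))    # 0.8
print(unweighted_accuracy(y_true, y_pred))  # 0.5
```

The gap between the two numbers shows why unweighted accuracy is the more informative metric on imbalanced data: a classifier that always predicts the majority class scores well on weighted accuracy but poorly on unweighted accuracy.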


Get started by registering for Challenge data and feature access.

Read the Call for Papers and the paper describing the challenge data, protocol, and baseline results.


Björn Schuller

TUM, Germany



Michel Valstar        

Imperial College London, UK



Roddy Cowie        

Queen’s University Belfast, UK



Maja Pantic        

Imperial College London, UK



The Audio/Visual Emotion Challenge and Workshop (AVEC 2011) will be the first competition event aimed at comparing automatic audio, visual, and audiovisual emotion analysis. The goal of the challenge is to provide a common benchmark test set for multimodal information processing, to bring together the audio and video emotion recognition communities, to compare the relative merits of the two approaches under well-defined and strictly comparable conditions, and to establish to what extent fusing the two approaches is possible and beneficial. A second motivation is the need for emotion recognition systems to handle naturalistic behaviour in large volumes of unsegmented, non-prototypical, and non-preselected data, as this is exactly the type of data that both multimedia retrieval and human-machine/human-robot communication interfaces face in the real world.

We are calling for teams to participate in three emotion detection sub-challenges: emotion detection from audio, from video, or from audiovisual information. The SEMAINE database of naturalistic dialogues will be used as the benchmarking database. Emotion will have to be recognised as binary labels along four dimensions: positive/negative valence, and high/low arousal, expectancy, and power.

In addition to Challenge participation, we are calling for papers addressing the overall topics of this workshop, in particular work that addresses the differences between audio and video processing of emotive data and the issues concerning combined audio-visual emotion recognition.

Please regularly visit our website http://sspnet.eu/avec2011 for more information.

Program Committee

Anton Batliner, FAU, Germany

Felix Burkhardt, Deutsche Telekom, Germany

Rama Chellappa, University of Maryland, USA

Mohamed Chetouani, Univ. Paris 6, France

Fernando De la Torre, CMU, USA

Laurence Devillers, CNRS-LIMSI, France

Julien Epps, Univ. New South Wales, Australia

Raul Fernandez, IBM, USA

Hatice Gunes, Imperial College, UK

Julia Hirschberg, Columbia University, USA

Aleix Martinez, Ohio State University, USA

Marc Mehu, UNIGE, Switzerland

Marcello Mortillaro, UNIGE, Switzerland

Matti Pietikainen, University of Oulu, Finland

Ioannis Pitas, University of Thessaloniki, Greece

Peter Robinson, Univ. of Cambridge, UK

Stefan Steidl, ICSI, USA

Jianhua Tao, Chinese Acad. of Sciences, China

Mohan Trivedi, UCSD, USA

Matthew Turk, University of California, USA

Alessandro Vinciarelli, Univ. of Glasgow, UK

Stefanos Zafeiriou, Imperial College, UK

Important Dates

Paper submission: July 18, 2011
Notification of acceptance: July 26, 2011
Final challenge result submission: July 29, 2011
Camera-ready paper: July 31, 2011
Workshop: October 9, 2011

Topics include, but are not limited to:

Participation in the Challenge

  • Audio Sub-Challenge
  • Video Sub-Challenge
  • Audiovisual Sub-Challenge

Audio/Visual Emotion Recognition

  • Audio-based Emotion Recognition
  • Video-based Emotion Recognition
  • Novel Fusion Techniques
  • Prediction as Fusion
  • Cross-corpus Feature Relevance
  • Agglomeration of Data
  • Semi-supervised Learning
  • Synthesized Training Material


  • Multimedia Coding and Retrieval
  • Usability of Audio/Visual Emotion Recognition
  • Real-time Issues
Submission Policy

In submitting a manuscript to this workshop, the authors acknowledge that no paper substantially similar in content has been submitted to another conference or workshop. Manuscripts should follow the Springer LNCS paper format. Authors should submit papers as a PDF file via http://www.easychair.org/conferences/?conf=acii20110. Papers accepted for the workshop will be allocated 10 pages in the proceedings of ACII 2011.

AVEC 2011 reviewing is double blind. Reviewing will be carried out by members of the program committee, and each paper will receive at least two reviews. Acceptance will be based on relevance to the workshop, novelty, and technical quality.