The ICSI Meeting Corpus contains 75 hours of face-to-face meeting recordings made within a research group. The meetings were conducted in English. The recordings are audio-only, which limits their utility for social signal processing. However, they have been annotated for a number of language properties using the same framework as the AMI Meeting Corpus, and the meetings vary in size, which makes the corpus a potentially useful adjunct for testing how well results from the AMI data generalize. The AMI Consortium has an NXT-format version with all the transcription and annotation integrated into one database, and may negotiate a public release under Creative Commons licensing if there is interest.
- url: http://www.icsi.berkeley.edu/Speech/mr/; Jean Carletta has the NXT version
- main_author: various for the annotations; recordings and transcripts produced by ICSI and distributed by the Linguistic Data Consortium.
- license: various, some annotations currently in private ownership.
- subjects: 61
- recordings: 75
- duration: just under an hour per meeting; ranges from 17 to 103 minutes
- naturality: mixed
- media: synchronized close-talking and far-field audio
- language: English
- interaction: group
- annotation: transcripts and other dialogue aspects
Categories: language-analysis; voice-analysis

Copyright © 2012