ICSI Meeting Corpus

The ICSI Meeting Corpus consists of about 75 hours of face-to-face meeting recordings made within a research group. The meetings were conducted in English. The recordings are audio-only, which limits their utility for social signal processing. However, they have been annotated for a number of language properties using the same framework as the AMI Meeting Corpus, and the meetings vary in size, which makes the corpus a potentially useful adjunct for testing how well results from the AMI data generalize. The AMI Consortium has an NXT-format version with all the transcription and annotation integrated into one database, and may negotiate a public release under Creative Commons licensing if there is interest.

  • url: http://www.icsi.berkeley.edu/Speech/mr/; Jean Carletta has the NXT version
  • main_author: various for the annotations; recordings and transcripts were produced by ICSI and are distributed by the Linguistic Data Consortium.
  • license: various, some annotations currently in private ownership.
  • subjects: 61
  • recordings: 75
  • duration: typically just under an hour; ranges from 17 to 103 minutes
  • naturality: mixed
  • media: synchronized close-talking and far-field audio
  • language: English
  • interaction: group
  • annotation: transcripts and other dialogue aspects

Categories: language-analysis; voice-analysis
