CABank German CallHome Corpus


Participants: 100
Type of Study: phone call
Location: United States
Media type: audio
DOI: doi:10.21415/T56P4B

Citation information

Some citation here.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This is the German portion of CallHome.

Speakers were solicited by the LDC to participate in this telephone speech collection effort via the internet, publications (advertisements), and personal contacts. A total of 200 call originators were found, each of whom placed a telephone call via a toll-free robot operator maintained by the LDC. Access to the robot operator was possible via a unique Personal Identification Number (PIN) issued by the recruiting staff at the LDC when the caller enrolled in the project. The participants were made aware that their telephone call would be recorded, as were the call recipients. The call was allowed only if both parties agreed to be recorded. Each caller was allowed to talk for up to 30 minutes. Upon successful completion of the call, the caller was paid $20 (in addition to making a free long-distance telephone call). Each caller was allowed to place only one telephone call.

Although the goal of the call collection effort was to have unique speakers in all calls, a handful of repeat speakers are included in the corpus. In all, 200 calls were transcribed. Of these, 80 have been designated as training calls, 20 as development test calls, and 100 as evaluation test calls. For each of the training and development test calls, a contiguous 10-minute region was selected for transcription; for the evaluation test calls, a 5-minute region was transcribed. For the present publication, only 20 of the evaluation test calls are being released; the remaining 80 test calls are being held in reserve for future LVCSR benchmark tests.
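
The call counts above can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming every transcribed region has its full nominal length (10 or 5 minutes); the names SPLITS, RELEASED_EVALUATION_CALLS, and the helper functions are illustrative and are not part of the corpus or of any LDC tool.

```python
# Call counts and nominal transcribed minutes per call, as stated in the
# description. Assumption: every region has its full nominal length.
SPLITS = {
    "training": {"calls": 80, "minutes_per_call": 10},
    "development": {"calls": 20, "minutes_per_call": 10},
    "evaluation": {"calls": 100, "minutes_per_call": 5},
}
RELEASED_EVALUATION_CALLS = 20  # the other 80 are held in reserve


def total_calls(splits):
    """Sum the number of calls over all splits."""
    return sum(s["calls"] for s in splits.values())


def transcribed_minutes(splits):
    """Sum calls times nominal region length over all splits."""
    return sum(s["calls"] * s["minutes_per_call"] for s in splits.values())


def released(splits, released_eval):
    """Copy the splits, keeping only the released evaluation calls."""
    adjusted = {name: dict(s) for name, s in splits.items()}
    adjusted["evaluation"]["calls"] = released_eval
    return adjusted


released_splits = released(SPLITS, RELEASED_EVALUATION_CALLS)
print(total_calls(SPLITS), transcribed_minutes(SPLITS))
print(total_calls(released_splits), transcribed_minutes(released_splits))
```

Under these assumptions, the 200 transcribed calls correspond to 1500 nominal minutes of transcribed speech, and the 120 calls in the present release to 1100 minutes.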

After a successful call was completed, a human audit of each telephone call was conducted to verify that the proper language was spoken, to check the quality of the recording, and to select and describe the region to be transcribed. The description of the transcribed region provides information about channel quality, number of speakers, their gender, and other attributes.

File  Sex  Age  Age  Place
4002  F    -    -    -
4024  M    31   24   Pfullingen
4028  M    25   18   Berlin
4073  M    26   18   Kassel
4076  F    37   17   Krefeld
4111  M    40   23   Bensberg
4123  M    27   19   Hagen
4287  M    32   24   Buende
4308  F    37   12   Hadmersleben
4384  F    32   14   Mainz
4458  M    30   20   Bad V slac
4552  M    31   21   Freiburg
4553  M    27   20   Gengenbach
4630  F    49   16   Voerde
4684  M    24   18   Bads-Alzunge
4711  F    51   22   Nuremburg
4755  F    57   12   Bremen
4764  F    32   21   Hamburg
4765  F    46   16   Bielefeld
4777  F    56   12   Berlin
4828  M    25   16   Cologne
4857  M    34   20   Augsburg
4866  M    25   17   Beckum
4868  M    54   20   Stutgart
4896  F    23   13   Leverkusen
4921  M    26   15   Hildesheim
4940  F    27   16   Hamburg
4951  M    23   15   Stuttgart
4957  M    23   16   UNK
4965  F    29   13   Bad-Neuheim
5016  M    34   21   Guendburg
5088  F    26   14   Ommernheim
5097  F    16   10   Munich
5123  M    19   13   Berlin
5143  F    29   16   Frankfurt
5159  F    76   16   Bernburg
5161  F    27   16   Goetting
5168  M    24   16   Kahl
5206  F    74   13   Breslau
5207  F    52   12   Neunkirchen
5223  F    68   12   Stuttgart
5224  F    62   14   Berlin
5248  F    60   16   Berlin
5298  F    40   16   Frankfurt
5351  F    60   18   Berlin
5421  F    69   16   Munich
5452  F    67   12   Leipzig
5493  F    47   13   Heidelberg
5518  M    66   12   Germany
5519  M    24   15   Friederchshsen
5566  F    54   17   Frankfurt
5569  F    27   17   Salzburg
5577  F    26   18   Munich
5596  F    47   16   Zweibruecken
5626  F    65   12   Hanover
5661  M    25   19   Berlin
5681  M    59   12   Berlin
5699  F    68   16   Berlin
5776  F    69   20   Cologne
5778  F    21   12   South_A VA
5832  F    68   12   Liga
5900  F    23   16   Osnabrueck
5909  F    69   16   Stuttgart
5944  F    40   20   Switzerland
5945  F    28   20   Frankenberg
6069  F    29   21   Cleveland
6140  F    31   20   Insbrook
6144  M    60   16   Berlin
6162  M    24   18   Waldshut
6197  M    31   22   Mainz
6199  F    31   20   Frankfurt
6219  F    19   13   Wiesental
6247  M    54   14   Buchel
6248  F    41   18   Giessen
6250  M    60   14   Chemnitz
6251  F    60   12   Palastinate
6297  M    26   19   Bibirbach
6311  F    69   18   Saaz
6312  M    25   17   Berlin
6333  F    54   12   Hamburg
6349  M    54   16   Braunschweig
6350  M    37   10   Allersberg
6352  F    24   16   Coesfeld
6373  M    17   12   Munich
6386  M    25   17   Tuebingen
6388  M    24   18   Berlin
6446  F    38   16   Gottingen
6477  F    56   16   Berlin
6506  M    75   15   Karlsruhe
6517  M    28   18   Oldenburg
6518  M    26   12   Dessau
6545  F    41   18   Hamm
6623  M    57   16   Stuttgart
6639  M    26   17   Cologne
6659  M    49   17   Heidenheim
6691  M    31   23   Munich
6692  F    65   14   Stuttgart
6719  F    25   17   Berlin
6838  M    26   18   Hannover
6888  M    23   15   Stuttgart
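
For readers who want to load the speaker table programmatically, the following is a minimal sketch in Python. It assumes each row consists of a four-digit file number, a sex code (F or M), two two-digit numbers (or "-" when missing), and a free-text place name, with or without whitespace between the fields. The Speaker type, its field names, and parse_row are hypothetical names chosen for illustration; the two numeric fields are called age_1 and age_2 only because the table header labels both columns "Age", and the sketch does not interpret them further.

```python
import re
from typing import NamedTuple, Optional


class Speaker(NamedTuple):
    """One row of the speaker table (field names are illustrative)."""
    file: str
    sex: str
    age_1: Optional[int]   # first column labelled "Age"
    age_2: Optional[int]   # second column labelled "Age"
    place: Optional[str]


# Four digits, F or M, two fields that are either two digits or "-",
# then the remainder of the line as the place name.
ROW = re.compile(r"^(\d{4})\s*([FM])\s*(\d{2}|-)\s*(\d{2}|-)\s*(.+?)\s*$")


def _number(text: str) -> Optional[int]:
    """Convert a numeric field, treating "-" as missing."""
    return None if text == "-" else int(text)


def parse_row(line: str) -> Optional[Speaker]:
    """Parse one table row; return None for headers and other lines."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    file_id, sex, first, second, place = match.groups()
    return Speaker(
        file=file_id,
        sex=sex,
        age_1=_number(first),
        age_2=_number(second),
        place=None if place == "-" else place,
    )


print(parse_row("4024M3124Pfullingen"))
```

Place names are kept exactly as they appear in the table, including the value UNK, so any normalization of spellings is left to the caller.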

Acknowledgements

Andrew Yankes reformatted this corpus to conform to current versions of CHAT.