CABank Chinese CallHome Corpus

Participants:	140
Type of Study:	phone call
Location:	China
Media type:	audio
DOI:	doi:10.21415/T54022

Citation information

Some citation here.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This is the Chinese portion of CallHome.

Speakers were solicited by the LDC to participate in this telephone speech collection effort via the internet, publications (advertisements), and personal contacts. A total of 200 call originators were found, each of whom placed a telephone call via a toll-free robot operator maintained by the LDC. Access to the robot operator required a unique Personal Identification Number (PIN), issued by the LDC recruiting staff when the caller enrolled in the project. Both the callers and the call recipients were made aware that the telephone call would be recorded, and the call was allowed only if both parties agreed to be recorded. Each caller was allowed to place only one telephone call and to talk for up to 30 minutes. Upon successful completion of the call, the caller was paid $20 (in addition to making a free long-distance telephone call).

Although the goal of the call collection effort was to have unique speakers in all calls, a handful of repeat speakers are included in the corpus. In all, 200 calls were transcribed. Of these, 80 have been designated as training calls, 20 as development test calls, and 100 as evaluation test calls. For each of the training and development test calls, a contiguous 10-minute region was selected for transcription; for the evaluation test calls, a 5-minute region was transcribed. For the present publication, only 20 of the evaluation test calls are being released; the remaining 80 test calls are being held in reserve for future LVCSR benchmark tests.
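The partition figures in the preceding paragraph can be tallied with a short script. This is a minimal sketch in Python. The names `PARTITIONS` and `summarize` are illustrative and not part of the corpus distribution. The minute totals assume that every released call has a transcribed region of exactly the nominal length stated above.

```python
# Hypothetical tally of the call partitions described in the text.
# Each entry: (calls transcribed, calls released, nominal minutes transcribed per call).
PARTITIONS = {
    "training": (80, 80, 10),
    "development test": (20, 20, 10),
    "evaluation test": (100, 20, 5),
}


def summarize(partitions):
    """Return call counts and nominal transcribed minutes for the released data."""
    transcribed = sum(t for t, _, _ in partitions.values())
    released = sum(r for _, r, _ in partitions.values())
    released_minutes = sum(r * m for _, r, m in partitions.values())
    return {
        "transcribed_calls": transcribed,
        "released_calls": released,
        "held_back_calls": transcribed - released,
        "released_minutes": released_minutes,
    }


if __name__ == "__main__":
    for name, value in summarize(PARTITIONS).items():
        print(f"{name}: {value}")
```

Under these assumptions, the three partitions sum to the 200 transcribed calls, of which 120 are released and 80 are held in reserve.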

After each call was successfully completed, it was audited by a human to verify that the proper language was spoken, to check the quality of the recording, and to select and describe the region to be transcribed. The description of the transcribed region provides information about channel quality, number of speakers, their gender, and other attributes.

File Sex Age Age
ma_0003 F 40 13
ma_0010 27 15
ma_0022 F 1
ma_0027 M 14
ma_0028 0 19
ma_0029 M 20
ma_0030 M 29 16
ma_0035 0 15
ma_0104 F 26 16
ma_0106 M 24 17
ma_0110 F 29 16
ma_0111 F 32 20
ma_0117 30 16
ma_0131 0 18
ma_0626 M
ma_0637 0 15
ma_0651 31 10
ma_0653 21 15
ma_0667 M
ma_0669 M 27 20
ma_0671 F 40 20
ma_0674 M 31 10
ma_0679 M 32 18
ma_0682 M 23 18
ma_0691 M 31 11
ma_0695 M 28 10
ma_0698 M 25 10
ma_0703 F 30 14
ma_0704 M
ma_0711 F 27 10
ma_0716 M 31 12
ma_0717 F 15
ma_0718 M 27 20
ma_0719 M 18
ma_0721 F 24 15
ma_0727 F 25 15
ma_0735 F 26 20
ma_0738 F 47 17
ma_0742 M 25 13
ma_0748 M 31 15
ma_0750 F 23 16
ma_0751 F 20 14
ma_0752 M 24 16
ma_0754 M 27 18
ma_0755 M 42
ma_0756 M 30 17
ma_0758 M 92
ma_0760 M 27 17
ma_0761 F 29 20
ma_0763 M 25 18
ma_0764 M 20
ma_0766 M 20
ma_0768 M 26 19
ma_0769 F 36 15
ma_0771 F 30 18
ma_0773 M 28 20
ma_0774 M 26 14
ma_0779 F 25 15
ma_0782 F 28 22
ma_0783 M
ma_0785 F 27 20
ma_0786 M 29 16
ma_0790 M 25 16
ma_0796 30 15
ma_0799 M 32 16
ma_0806 M 40 18
ma_0807 M 49 22
ma_0814 F 26 18
ma_0815
ma_0817 M 30 20
ma_0821 M 25 16
ma_0823 M 31 8
ma_0827 F 30 16
ma_0828 F 30 20
ma_0829 41 19
ma_0840 M 29 15
ma_0844 F 24 10
ma_0845 M
ma_0846 F 24 15
ma_0848 M 26 10
ma_0851 M 35 20
ma_0859 M 26 18
ma_0860 M 20
ma_0861 M 27 17
ma_0871 35 16
ma_0876 M
ma_0880 M 13
ma_0881 F 30 25
ma_0882 M
ma_0888 M 16
ma_0894 M 36 25
ma_0900 F 26 18
ma_0906
ma_0913 M
ma_0915
ma_0916 M 32 18
ma_0920 M
ma_0925 M 16
ma_0932
ma_0952 F 23 12
ma_0958 M 23 17
ma_0963 F 33 15
ma_0975 F 34 16
ma_0976 M 23 16
ma_0977 M
ma_1006 F 31 20
ma_1008 F
ma_1014 M 38 20
ma_1022 25 12
ma_1067 M 24 6
ma_1077 M 30 20
ma_1279 F
ma_1280 M
ma_1281 M
ma_1283 M 16
ma_1293 F 26 14
ma_1303 F 12
ma_1307 F 15
ma_1346 M 16
ma_1352 M
ma_1357 F
ma_1359 F
ma_1376 F
ma_1393 F
ma_1396 M
ma_1430 M
ma_1525 F
ma_1539 M
ma_1582 M 26 19
ma_1597 M 15
ma_1603 F
ma_1671 F
ma_1700 M
ma_1711 F
ma_1728 F
ma_1737 M
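
Many rows in the table above have fewer than four fields, so it may help to read it programmatically. The following is a minimal sketch in Python. The function name `parse_row` and the returned keys are hypothetical. It assumes that a sex value, when present, is the single letter M or F immediately after the file name. Because a missing value cannot be located from whitespace alone, the numeric fields are kept in their original order and are not assigned to a particular column.

```python
import re

# File name, an optional M/F sex code, then zero or more integer fields.
ROW = re.compile(r"^(?P<file>ma_\d+)(?:\s+(?P<sex>[MF]))?(?P<rest>(?:\s+\d+)*)\s*$")


def parse_row(line):
    """Parse one table row into file name, sex (or None), and a list of integers."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    numbers = [int(token) for token in match.group("rest").split()]
    return {
        "file": match.group("file"),
        "sex": match.group("sex"),  # None when the row has no M/F code
        "numbers": numbers,         # order preserved; column identity not assumed
    }


if __name__ == "__main__":
    for row in ["ma_0003 F 40 13", "ma_0010 27 15", "ma_0022 F 1", "ma_0815"]:
        print(parse_row(row))
```

Rows with a single number, such as `ma_0022 F 1`, are returned with a one-element list. Deciding which column that value belongs to would require the original corpus documentation.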