CABank Spanish CallHome Corpus

Participants:	120
Type of Study:	phone call
Location:	United States
Media type:	audio
DOI:	doi:10.21415/T51K54

Citation information

Some citation here.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This is the Spanish portion of CallHome.

The Linguistic Data Consortium (LDC) recruited speakers for this telephone speech collection through the internet, advertisements in publications, and personal contacts. In total, 200 call originators were recruited. Each placed a telephone call through a toll-free automated operator maintained by the LDC. Callers reached the automated operator with a unique Personal Identification Number (PIN), which the LDC recruiting staff issued when the caller enrolled in the project. Both callers and call recipients were told that the call would be recorded, and a call went through only if both parties agreed to the recording. Each caller could talk for up to 30 minutes. After a successful call, the caller was paid $20, in addition to making a free long-distance call. Each caller was allowed to place only one call.

Although the goal of the collection was to have unique speakers in every call, a handful of repeat speakers appear in the corpus. In all, 200 calls were transcribed: 80 were designated as training calls, 20 as development test calls, and 100 as evaluation test calls. For each training and development test call, a contiguous 10-minute region was transcribed; for each evaluation test call, a 5-minute region was transcribed. For the present publication, only 20 of the evaluation test calls are released; the remaining 80 are held in reserve for future large-vocabulary continuous speech recognition (LVCSR) benchmark tests.
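The split described above can be tallied to see how much transcribed speech is released. The following is a minimal Python sketch, assuming every transcribed region is exactly the stated length (10 minutes for training and development calls, 5 minutes for evaluation calls). The names `SPLITS`, `EVAL_RELEASED`, and `transcribed_minutes` are hypothetical and illustrative only; they are not part of any TalkBank or LDC tool.

```python
# Hypothetical tally of the CallHome Spanish transcription split.
# Figures are taken from the corpus description; region lengths are
# assumed to be exactly 10 or 5 minutes.
SPLITS = {
    # split name: (calls transcribed, minutes transcribed per call)
    "train": (80, 10),
    "dev": (20, 10),
    "eval": (100, 5),
}
EVAL_RELEASED = 20  # evaluation calls included in this publication


def transcribed_minutes(calls: int, minutes_per_call: int) -> int:
    """Total transcribed minutes for a given number of calls."""
    return calls * minutes_per_call


# All transcribed calls, released or held in reserve.
total_calls = sum(calls for calls, _ in SPLITS.values())

# Calls actually released: all training and dev calls plus 20 eval calls.
released_calls = SPLITS["train"][0] + SPLITS["dev"][0] + EVAL_RELEASED

# Transcribed minutes in the released calls.
released_minutes = (
    transcribed_minutes(*SPLITS["train"])
    + transcribed_minutes(*SPLITS["dev"])
    + transcribed_minutes(EVAL_RELEASED, SPLITS["eval"][1])
)

# Transcribed minutes in the 80 evaluation calls held in reserve.
reserved_minutes = transcribed_minutes(
    SPLITS["eval"][0] - EVAL_RELEASED, SPLITS["eval"][1]
)

print(total_calls, released_calls, released_minutes, reserved_minutes)
# prints: 200 120 1100 400
```

Under these assumptions, the released calls contain about 1,100 transcribed minutes (roughly 18 hours), and 400 minutes of evaluation transcripts are held in reserve.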

After each successful call, a human auditor reviewed the recording to verify that the correct language was spoken, to check the recording quality, and to select and describe the region to be transcribed. The description of the transcribed region records channel quality, the number of speakers, their gender, and other attributes.

File Sex Age Age
sp_0053 30 16
sp_0054 56 22
sp_0082 39 14
sp_0084 32 12
sp_0088 37 15
sp_0616
sp_0681 15
sp_0687
sp_0699 29 17
sp_0707
sp_0737
sp_0776
sp_0857 20 17
sp_0912 56 22
sp_0934
sp_0937 25 19
sp_0943
sp_0970
sp_1015 30 10
sp_1031
sp_1046
sp_1059
sp_1074 22
sp_1084 32 20
sp_1100
sp_1142 29 19
sp_1143 34 20
sp_1148 34 16
sp_1156 32 10
sp_1157 21
sp_1163 35 19
sp_1186 19
sp_1212 22 16
sp_1219 37 16
sp_1295 27 15
sp_1343 31 20
sp_1345 29 14
sp_1362
sp_1427
sp_1435 25 16
sp_1438 21 16
sp_1553
sp_1577
sp_1578
sp_1587
sp_1592
sp_1594
sp_1596 30 21
sp_1643 40 16
sp_1644
sp_1648
sp_1651
sp_1654 39 16
sp_1673
sp_1720 28 20
sp_1747 18
sp_1748
sp_1784 20 2
sp_1785
sp_1789
sp_1807 26 17
sp_1813 19 13
sp_1814 19 13
sp_1827
sp_1829 20 15
sp_1847 37 12
sp_1850 74 12
sp_1858 20 12
sp_1904
sp_1923 23 18
sp_1926 14
sp_1931
sp_1933 58 14
sp_1934
sp_1940
sp_1953 37 16
sp_1954 20 15
sp_1955 26 17
sp_1963 19 13
sp_2003 28 14
sp_2010
sp_2023 28 14
sp_2024 25 20
sp_2036 21 14
sp_2046 20 14
sp_2049
sp_2061
sp_2067
sp_2069 20 14
sp_2077
sp_2078
sp_2079 24 18
sp_2082 20 14
sp_2083 19 13
sp_2086
sp_2114 28 18
sp_2155 48 14
sp_2158
sp_2164
sp_2168
sp_2173 40 10
sp_2174
sp_2175
sp_2179 20 14