John DuBois, Department of Linguistics, University of California, Santa Barbara, dubois@humanitas.ucsb.edu |
Robert Englebretson, Department of Linguistics, Rice University, reng@rice.edu |
Participants: | 30 |
Type of Study: | conversations |
Location: | California |
Media type: | audio |
DOI: | doi:10.21415/T5VG6X |
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
The Santa Barbara Corpus of Spoken American English is based on hundreds of recordings of natural speech from all over the United States, representing a wide variety of people of different regional origins, ages, occupations, and ethnic and social backgrounds. It reflects many ways that people use language in their lives: conversation, gossip, arguments, on-the-job talk, card games, city council meetings, sales pitches, classroom lectures, political speeches, bedtime stories, sermons, weddings, and more. The corpus was collected by the University of California, Santa Barbara Center for the Study of Discourse. Its Director is John W. Du Bois (UCSB), and its Associate Editors are Wallace L. Chafe (UCSB), Charles Meyer (UMass, Boston), and Sandra A. Thompson (UCSB). Additional information can be found at this site.
Each speech file is accompanied by a transcript in which phrases are time-stamped with respect to the audio recording. Personal names, place names, phone numbers, etc., in the transcripts have been altered to preserve the anonymity of the speakers and their acquaintances, and the audio files have been filtered to make these portions of the recordings unrecognizable. Pitch information is still recoverable from these filtered portions of the recordings, but the amplitude levels in these regions have been reduced relative to the original signal. A separate filter list file (*.flt) associated with each transcript/waveform file pair lists the beginning and ending times of the filtered regions. Four of the .flt files are empty because the corresponding audio files contained nothing that needed to be filtered out. The filtering was done using a digital FIR low-pass filter with the cut-off frequency set at 400 Hz. The effect of the filter was gradually faded in and out at the beginning and end of each region over 1,000 samples, roughly 45 milliseconds, to avoid abrupt transitions in the resulting waveform. The audio data consists of 16 wave format speech files, recorded in two-channel PCM at 22,050 Hz.
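The masking procedure described above can be illustrated with a minimal Python sketch. It is not the tool used to build the corpus: the Hamming-windowed sinc design, the 101-tap length, the linear fade shape, the placement of the fades inside the region, and the function names are all assumptions made for illustration. Only the 400 Hz cut-off, the 1,000-sample fade, and the 22,050 Hz sample rate come from the description.

```python
import math

SAMPLE_RATE = 22050   # Hz, from the corpus description
CUTOFF_HZ = 400.0     # low-pass cut-off, from the corpus description
FADE_SAMPLES = 1000   # fade length, from the corpus description (about 45 ms)


def lowpass_taps(num_taps=101):
    """Hamming-windowed sinc low-pass taps (design and length are assumptions)."""
    fc = CUTOFF_HZ / SAMPLE_RATE            # cut-off in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]        # normalise to unity gain at 0 Hz


def filtered_sample(samples, taps, i):
    """One output sample of the centred FIR filter; samples off the ends count as 0."""
    half = len(taps) // 2
    acc = 0.0
    for k, t in enumerate(taps):
        j = i + half - k
        if 0 <= j < len(samples):
            acc += t * samples[j]
    return acc


def mix_weight(i, start, end, fade=FADE_SAMPLES):
    """Share of the filtered signal at sample i: 0 at the region edges,
    rising linearly to 1 over `fade` samples (assumed fade shape)."""
    if i < start or i >= end:
        return 0.0
    return min(1.0, (i - start) / fade, (end - 1 - i) / fade)


def mask_region(samples, start_s, end_s):
    """Cross-fade into the low-passed signal between start_s and end_s (seconds)."""
    taps = lowpass_taps()
    start = max(0, round(start_s * SAMPLE_RATE))
    end = min(len(samples), round(end_s * SAMPLE_RATE))
    out = list(samples)
    for i in range(start, end):
        w = mix_weight(i, start, end)
        out[i] = (1 - w) * samples[i] + w * filtered_sample(samples, taps, i)
    return out
```

In this sketch the start and end times would come from one entry of a .flt file, whose exact layout is not specified here. The fade length works out to 1000 / 22050 s, or roughly 45 ms, matching the figure quoted above.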
The TalkBank version of the corpus was constructed by Nii Martey of the Linguistic Data Consortium with help from Jack DuBois for Part 1 and from Robert Englebretson, now at Rice University, for Parts 2, 3, and 4. In the case of a phone number, which was not adequately disguised by the filter, the signal was set to zero, except for the 45 millisecond boundary regions which fade into and out of zero.
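The zeroing of phone numbers with faded boundaries can be sketched in the same spirit. The function name `zero_region`, the linear fade shape, and the choice to place the fades inside the region are hypothetical. Only the 1,000-sample (roughly 45 ms) boundary length comes from the description.

```python
def zero_region(samples, start, end, fade=1000):
    """Silence samples[start:end], fading out of and back into the signal.

    `start` and `end` are sample indices. The gain falls linearly from 1 to 0
    over the first `fade` samples of the region and rises back to 1 over the
    last `fade` samples (assumed shape); everything in between is set to zero.
    """
    out = list(samples)
    for i in range(max(0, start), min(len(samples), end)):
        fade_out = 1.0 - (i - start) / fade      # 1 at region start, 0 after `fade`
        fade_in = 1.0 - (end - 1 - i) / fade     # 0 until `fade` before region end
        gain = max(0.0, fade_out, fade_in)
        out[i] = samples[i] * gain
    return out
```

Times given in seconds would first be converted to sample indices by multiplying by the 22,050 Hz sample rate.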
No. | Name | Sex | Age | City | State | Orig | Edu | Years of Edu | Occ | Race/Eth |
---|---|---|---|---|---|---|---|---|---|---|
0001 | LENORE | f | 30 | Los Angeles | CA | CA | BA | 16 | student | white |
0002 | DORIS | f | 50 | Montana | MT | MT | HS | 12 | horse ranc | white |
0003 | LYNNE | f | 19 | Montana | MT | | HS | 12 | student/ho | white |
0004 | HAROLD | | | | | | | | | |
0005 | JAMIE | f | 30 | Walnut Cre | CA | CA | college | 16 | dancer/da | white |
0006 | MILES | m | CA | black | ||||||
0007 | PETE | m | 36 | San Leandr | CA | CA | | 18 | grad student | white |
0008 | ROY | m | 34 | CA | designer | white | ||||
0009 | MARILYN | f | 33 | CA | writer | white | ||||
0010 | CAROLYN | f | 19 | Santa Fe | NM | CO | HS | 12 | student | white |
0011 | KATHY | f | 31 | Boston/Santa Fe | A/NM | CA | | | grad student | white |
0012 | SHARON | f | 24 | New Mexico | NM | TX | college | | teacher | white |
0013 | SHANE | m | 23 | Corp Christi | TX | TX | grad | | med student | chicano |
0014 | PAM | f | 43 | Massachusetts | MA | NM | | | housewife | white |
0015 | WARREN | m | 34 | Wenham | MA | IL | DVM | 23 | veterinarian | white |
0016 | DARRYL | m | 33 | San Francisco | CA | CA | BA | 16 | comm./comp | white |
0017 | PAMELA | f | 38 | Southern California | CA | CA | BA | 16 | actress/fi | white |
0018 | ALINA | f | 34 | Los Angeles | CA | CA | BA | 16 | housewife | white |
0019 | ALICE | f | 28 | Pryor | MT | MT | 4 years | 16 | student | Crow Indian |
0020 | MARY | f | 27 | Pryor | MT | MT | college | 3 | cook fire | Crow Indian |
0021 | RICKIE | | | San Francisco | CA | CA | HS | 12 | clerk | black |
0022 | JUNE | f | 21 | Laguna Beach | CA | CA | A MA | 17 | grad student | white |
0023 | REBECCA | f | 31 | Saratoga | A | CA | A J | 22 | attorney | white |
0024 | ARNOLD | m | | Saginaw | MI | CA | HS | 12 | S Army | white |
0025 | KATHY | f | 17 | Mobile | AL | AL | HS | 10 | student | white |
0026 | NATHAN | m | 19 | Mobile | AL | AL | HS | 12 | student | white |
0027 | BRAD | m | 45 | MA | 18 | director o | white | |||
0028 | PHIL | m | 30 | NM | BA | 16 | designer | hispanic | ||
0029 | DORIS | f | 83 | Indianapolis | IN | AZ | MA | 18 | teacher | white |
0030 | ANGELA | f | 90 | middle Wes | MO | AZ | MS | 18 | teacher J | white |
0031 | SAM | f | 72 | Arcadia | IN | AZ | Nursing | 15 | retired | white |
0032 | BEV | f | 20 | So California | CA | CA | HS | 15 | student | white |
0033 | MONTOYO | m | 51 | CA | PhD | political | latino/chicano | |||
0034 | MARIA | f | 26 | Nicaragua | CA | HS | 15 | dispatcher | hispanic | |
0035 | GILBERT | m | 22 | So California | CA | CA | HS | | student | hispanic |
0036 | CAROLYN | f | 18 | So California | CA | CA | HS | 12 | student | white |
0037 | LAURA | f | 23 | San Jose | CA | CA | HS | | student | japanese/ |
0038 | FRANK | m | 24 | So California | CA | CA | BA | 16 | business o | white |
0039 | RAMON | m | 19 | MoreValley | CA | CA | HS | 12 | student | hispanic |
0040 | RUBEN | m | 27 | So California | CA | CA | 5 yrs | 17 | teacher | hispanic |
0042 | KENDRA | f | 25 | midwest | IN | IN | BA | 16 | administrator | white |
0043 | KEN | m | 51 | midwest | IN | IN | Phd M | 23 | director o | white |
0044 | MARCI | f | 50 | midwest | IN | IN | MA | 19 | counselor | white |
0045 | WENDY | f | 26 | midwest | IN | IN | BS | 16 | missionary | white |
0046 | KEVIN | m | 26 | midwest | IN | IN | S Cr | 16 | missionary | white |
0047 | JIM | m | 41 | metro St.L. | IL | IL | certified | 16 | banking | white |
0048 | FRED | m | 47 | Chrisman | IL | IL | masters | 18 | loan officer | white |
0049 | JOE | m | 45 | Dupo | IL | IL | | 17 | banking | white |
0050 | KURT | m | 70 | Millstad | IL | IL | | 12 | retired-co | white |
0051 | VIVIAN | f | 55 | Shenandoah | A | IL | HS | 13 | banking | white |