CABank English CABNC Corpus
Saul Albert
Social Sciences
Loughborough University
s.b.albert@lboro.ac.uk
Laura de Ruiter
Department of Psychology
Tufts University
laura.deruiter@tufts.edu
J.P. de Ruiter
Department of Computer Science and Department of Psychology
Tufts University
jp.deruiter@tufts.edu
Participants: ~400
Type of Study: Subcorpus converted to CHAT for TalkBank
Location: UK
Media type: audio
DOI: doi:10.21415/T55Q5R
Browsable transcripts
Download transcripts
Media folder
Citation information
Saul Albert, Laura E. de Ruiter, and J.P. de Ruiter (2015) CABNC: the
Jeffersonian transcription of the Spoken British National Corpus.
https://saulalbert.github.io/CABNC/.
In accordance with TalkBank rules, any use of data from this corpus must
be accompanied by at least one of the above references.
Project Description
The CABNC corpus is an open-licensed, detailed conversation analytic
re-transcription of naturalistic conversations from a subcorpus of the British National Corpus, amounting to
around 4.2 million words in 1436 separate conversations.
The project aims to produce transcripts usable for both computational and
detailed qualitative analysis. If you are a CA transcriptionist and you use the
data, please make sure you re-submit your updated transcripts to help improve
the corpus over time.
The project website, with instructions for contributing, is at https://github.com/saulalbert/CABNC
Acknowledgements
- All files are publicly available under a Creative Commons
Attribution License (details here)
- BNC spoken audio recordings were created or collected from other
sources by Longman Dictionaries for the British National Corpus
Consortium. Their use is governed by the terms of the original recording permissions agreement
with the contributors, which requires that they be "used for
scientific study and publication by writers of dictionaries and
educational material and language researchers" only.
- For use of the AudioBNC and the CABNC please cite: John Coleman, Ladan
Baghai-Ravary, John Pybus, and Sergio Grau (2012) Audio BNC: the audio
edition of the Spoken British National Corpus. Phonetics Laboratory,
University of Oxford. http://www.phon.ox.ac.uk/AudioBNC.
- Many thanks to
Dr. Margaret E. L. Renwick for her forced alignment data, which we
used to enrich the BNC-XML with word and turn-timing data.