CABank Hakka Taiwan Corpus

Huei-ling Lai
Linguistics
Chengchi University
hllai@nccu.edu.tw

Kawai Chui
Linguistics
Chengchi University
kawai@nccu.edu.tw

Participants:	29
Type of Study:	naturalistic
Location:	Taiwan
Media type:	audio
DOI:	doi:10.21415/T55960

Citation information

Chui, Kawai, and Huei-ling Lai. 2008. The NCCU Corpus of Spoken Chinese: Mandarin, Hakka, and Southern Min. Taiwan Journal of Linguistics 6.2:119-144.

Chui, Kawai, Huei-ling Lai, and Hui-Chen Chan. 2017. The Taiwan Spoken Chinese Corpus. In Encyclopedia of Chinese Language and Linguistics, ed. by Rint Sybesma, pp. 257-259. Boston, USA: Brill.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

In Taiwan, most people speak Mandarin, Hakka, or Southern Min. Not only are all three Chinese dialects undergoing linguistic change, but the number of Hakka and Southern Min speakers is also declining. The NCCU Corpus of Spoken Chinese is therefore a language documentation project that provides open online access to Mandarin, Hakka, and Southern Min data for teaching and research.

As a language documentation project, the NCCU spoken corpus focuses on collecting and archiving spoken forms of various types. It started with three sub-corpora, namely the Corpus of Spoken Mandarin, the Corpus of Spoken Hakka, and the Corpus of Spoken Southern Min, but it currently contains only Mandarin and Hakka data. The corpora share a common scheme for collecting spoken data, mostly in the form of spontaneous face-to-face conversations. We hope that our work will encourage more people to build spoken corpora from different perspectives and for different purposes.