CABank Istriot Corpus

Eliana Moscarda Mirković
Humanities and Social Sciences
University of Pula

Nada Poropat Jeletić
Humanities and Social Sciences
University of Pula

Gordana Hržica
Speech and Language Pathology
University of Zagreb

Participants: 13
Type of Study: demographic
Location: Croatia
Media type: audio
DOI: doi:10.21415/NR0X-ZZ76

Browsable transcripts

Download transcripts

Media folder

Citation information

Moscarda Mirković, E., Moscarda, L., Sulle orme della tradizione culinaria gallesanese. Aspetti culturali e storico-linguistici. Galižana. Unione Italiana; Comunità degli Italiani di Gallesano, 2015

Moscarda Mirković, E., Poropat Jeletić, N., Dialetti in contatto nella Regione Istriana. Metodi d’indagine per un Archivio della memoria linguistica e culturale dell’Istria // Studia Romanica et Anglica Zagrabiensia, 65. Zagreb. 2020, 437-444.

Moscarda Mirković, E., La tradizione paremiologica a Gallesano (parte I) // Atti del Centro di Ricerche Storiche – Rovigno, XXXI, ed. Budicin, M. Opicina (Trieste). 2002, 371-468.

Moscarda Mirković, E., La tradizione paremiologica a Gallesano (parte II) // Atti del Centro di Ricerche Storiche – Rovigno, XXXII, ed. Budicin M. Trieste. 2003, 515-626.

Moscarda Mirković, E., La tradizione paremiologica a Gallesano (parte III) // Atti del Centro di Ricerche Storiche – Rovigno, XXXIII, ed. Budicin M. Trieste. 2004, 701-766.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Corpus Description

The corpus consists of transcripts of conversations among 13 adult bilingual participants, all speakers of Istriot, recorded on the Istrian peninsula. The corpus is accompanied by an MS Excel spreadsheet containing demographic and sociolinguistic data about each speaker. Each row presents the data for one speaker in one transcript: gender, year of birth, place of birth, education, profession, number of family members (nuclear family), average income, the onset of language exposure for each language variety, and the amount of language exposure for each language variety. This information facilitates statistical analyses of the database and the selection of subsets of transcripts or speakers for more detailed and specific research purposes, and it is very useful for determining the speakers’ bilingual/multilingual status.
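As a rough illustration of how such a speaker table could be used to select subsets, the sketch below filters rows by gender and year of birth and then lists the transcripts in which the selected speakers appear. It assumes the spreadsheet has been exported to CSV; the column names (`transcript`, `speaker`, `gender`, `year_of_birth`, `place_of_birth`) and the sample rows are hypothetical placeholders, not the actual headers or data of the corpus spreadsheet.

```python
import csv
import io

# Hypothetical placeholder rows: the real spreadsheet's column names
# and values may differ and should be checked before use.
SAMPLE_CSV = """transcript,speaker,gender,year_of_birth,place_of_birth
t01,SP1,F,1948,Place A
t01,SP2,M,1975,Place B
t02,SP3,F,1982,Place A
"""


def select_speakers(csv_text, gender=None, born_before=None):
    """Return the rows that match the optional gender and birth-year filters."""
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if gender is not None and row["gender"] != gender:
            continue
        if born_before is not None and int(row["year_of_birth"]) >= born_before:
            continue
        selected.append(row)
    return selected


def transcripts_for(rows):
    """Return the sorted, de-duplicated transcript identifiers for the rows."""
    return sorted({row["transcript"] for row in rows})


if __name__ == "__main__":
    older = select_speakers(SAMPLE_CSV, born_before=1960)
    print([row["speaker"] for row in older])
    print(transcripts_for(select_speakers(SAMPLE_CSV, gender="F")))
```

Because each row describes one speaker in one transcript, filtering rows first and collecting transcript identifiers afterwards yields the set of transcripts relevant to a given speaker profile.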

Sociolinguistic situation

The Istrian peninsula is the westernmost region of Croatia, surrounded by the northern Adriatic Sea. It extends from the bays of Trieste and Venice (in the north-west) to the bay of Rijeka and Kvarner (in the north-east) and reaches Cape Premantura in the south. The part of the peninsula belonging to Croatia partially coincides with Istria County, the only statutorily bilingual county in Croatia, where Croatian-Italian bilingualism is recognized de jure and de facto (hence its bilingual official name: Istarska županija/Regione istriana). The whole area is characterized by permanent contact between Croatophone and Italophone cultures and language varieties. This contact dates back several centuries, and the varieties are still used in everyday spoken communication. The result is a complex and fragmented sociolinguistic macro-system shaped by the interplay of asymmetric and diglossic/polyglossic relations between the two official languages (Croatian and Italian), complemented by macro-regional dialects (the Istrovenetian koine and the Chakavian koine), micro-regional dialects (Chakavian, Kajkavian, Shtokavian), and local dialects in Istria (such as the Istriot and Istro-Romanian dialects) (Blagoni 2007).

The Istriot language (ISO 639-3 code: ist), an archaic and autochthonous pre-Venetian Romance language that developed on the substrate of the “regional” Vulgar Latin in the southern parts of Istria, is preserved in linguistic islands in four Istrian centers: Rovinj-Rovigno, Bale-Valle, Galižana-Gallesano and Šišan-Sissano (and, until approximately a decade ago, also in Fažana-Fasana and Vodnjan-Dignano). Istriot is listed in the UNESCO Red Book of Endangered Languages as a language at serious risk of extinction.


Data were collected from 2019 to 2021. Language sampling was performed by investigators from local communities with access to groups of Istriot speakers, namely researchers recording within their own social networks. Sampling took place in a range of everyday informal interactive situations, mostly spontaneous speech among family members or acquaintances, such as informal gatherings. Genre and formality were thus controlled (conversations in informal situations). To ensure a bilingual mode, informal spontaneous conversations were recorded, and all speakers participating in the recorded conversations were proficient bilinguals or multilinguals.

The corpus was collected in the course of the project Multilevel approach to discourse in language development (Croatian Science Foundation, UIP-2017-05-6603). All participants signed an informed consent form in which the data collection was described. They were informed that their data would be published as part of a corpus but would be anonymized. They could, and still can, withdraw from the study and/or withdraw their transcripts from the corpus. To mitigate the observer’s paradox (Labov, 1972), two measures were taken. First, all participants were informed about the research aims and the speech sampling procedure, and they provided written informed consent in which they agreed to be recorded without their explicit knowledge at a random point within one month of signing the consent. Second, the investigators were trained to participate in the recorded sessions as little as possible. Almost all recording sessions lasted approximately 10 minutes.

Participants were administered a background questionnaire that elicited information on their sociodemographic, sociolinguistic and socioeconomic status, language exposure and language usage in their social networks.

A conversational sampling method was employed to build the corpus, which includes 13 native speakers of Istriot recruited across two generations and living in different areas of the Istrian peninsula. All transcripts were annotated with the participants’ basic information: gender, age and the location of the conversation.

Place of recording. Participants were recruited by the investigators in Rovinj/Rovigno, Bale/Valle, Galižana/Gallesano and Šišan/Sissano in order to ensure diatopic representativeness.

Acknowledgements

The corpus was collected in the course of the project Multilevel approach to discourse in language development (Croatian Science Foundation, UIP-2017-05-6603). This work would not have been possible without the help of all the interlocutors who participated in the study. Furthermore, we are grateful to Nives Giuricin, Dea Lordanić and Nicol Verbanac for their help during the collection of the audio recordings.