Corpora of Vietnamese texts: Lexical effects of intended audience and publication place

Giang Pham, Kathryn Kohnert, Edward Carney

Research output: Contribution to journal › Article › peer-review

12 Scopus citations


This article has two primary aims. The first is to introduce a new Vietnamese text-based corpus. The Corpora of Vietnamese Texts (CVT; Tang, 2006a) consists of approximately 1 million words drawn from newspapers and children's literature, and is available online. The second aim is to investigate potential differences in lexical frequency and distributional characteristics in the CVT on the basis of place of publication (Vietnam or Western countries) and intended audience (adult-directed newspapers or child-directed children's literature). We found clear differences between adult- and child-directed texts, particularly in the distributional frequencies of pronouns or kinship terms, which were more frequent in children's literature. Within child- and adult-directed texts, lexical characteristics did not differ on the basis of place of publication. Implications of these findings for future research are discussed.

Original language: English (US)
Pages (from-to): 154-163
Number of pages: 10
Journal: Behavior Research Methods
Issue number: 1
State: Published - Feb 2008

Bibliographical note

Funding Information:
Funding for this project was provided by the Graduate Research Partnership Program at the University of Minnesota and was awarded to the first author under the faculty mentorship of the second author.

