Word length in Tatar: Selecting relevant parameters for modeling

Galieva A.

URI: https://dspace.kpfu.ru/xmlui/handle/net/161139

Date: 2020

Abstract:

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

This paper studies word length in the Tatar language using data from fiction texts (the sample includes both prose and poetry). Word length is a stochastic phenomenon that depends on many factors, including language type, text organization, and the intended addressee; however, internal linguistic laws govern the parameters of word length and word frequencies, and the issue involves both universal and language-specific features. We found that the ratios of words of different lengths differ across individual texts, and that the most common words consist of 5 phonemes and 2 syllables. We evaluated word length in Tatar texts and attempted to fit a model based on the Poisson distribution (specifically, a one-displaced Poisson-uniform distribution), so that the description of the empirical data was complemented by fitted theoretical values for word frequencies. In addition, Shannon's entropy of word lengths was evaluated, and a weak correlation between average word length and entropy was found.
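The quantities named in the abstract can be illustrated with a minimal sketch. It is not the author's procedure: it assumes a plain one-displaced Poisson distribution, P(X = x) = e^(-λ) λ^(x-1) / (x-1)! for x = 1, 2, ..., which is a simpler relative of the Poisson-uniform model used in the paper. The function names and the toy sample of syllable counts are hypothetical and do not come from the paper's data.

```python
import math
from collections import Counter


def length_distribution(lengths):
    """Return {length: relative frequency} for a list of word lengths."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return {k: v / total for k, v in sorted(counts.items())}


def shannon_entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution {value: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def displaced_poisson_pmf(x, lam):
    """One-displaced Poisson: X - 1 follows Poisson(lam), so x starts at 1."""
    if x < 1:
        return 0.0
    return math.exp(-lam) * lam ** (x - 1) / math.factorial(x - 1)


def fit_displaced_poisson(lengths):
    """Maximum-likelihood estimate of lam: the sample mean minus the displacement of 1."""
    return sum(lengths) / len(lengths) - 1


# Hypothetical toy sample: word lengths in syllables (not data from the paper).
sample_lengths = [1, 2, 2, 3, 2, 1, 4, 2, 3, 2]

dist = length_distribution(sample_lengths)   # empirical relative frequencies
lam = fit_displaced_poisson(sample_lengths)  # fitted parameter
entropy = shannon_entropy(dist)              # entropy of the length distribution
fitted = {x: displaced_poisson_pmf(x, lam) for x in dist}

for x in dist:
    print(f"length {x}: observed {dist[x]:.3f}, fitted {fitted[x]:.3f}")
print(f"lambda = {lam:.3f}, entropy = {entropy:.3f} bits")
```

Comparing the observed and fitted columns text by text, together with the per-text entropy and mean length, is the kind of comparison the abstract describes. A goodness-of-fit statistic would be needed to judge the fit properly and is omitted here.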

Files in this item

Name: SCOPUS16130073-20 ...

Size: 47.36Kb

Format: PDF

This item appears in the following collections

Publications of KFU staff in Scopus [24551]
The collection contains publications by staff of Kazan Federal University (Kazan State University until 2010) indexed in the Scopus database, starting from 1970.