Abstract:
Several methods have been proposed for detecting changes in word semantics and the emergence of new word meanings. These methods use different techniques for estimating the semantic distance between words. They are based both on neural network vector models and on simpler vector representations that use the frequencies of n-grams containing the studied words. This paper proposes a method for calculating confidence intervals for semantic distance estimates obtained from the frequency data of n-grams extracted from a large diachronic corpus. The task is complicated because the distribution law governing frequency fluctuations of words and n-grams remains an open question, despite a number of studies. The confidence intervals are calculated by statistical modeling using random permutations of n-gram frequencies. The proposed method is tested on the example of estimating the semantic distance between two Russian synonyms.