Abstract:
© 2018 Institute of Physics Publishing. All rights reserved. The creation of the Google Books Ngram corpus opened up new opportunities for studying language evolution. The corpus consists of a large number of digitized books written in 8 languages and contains information on the frequencies of words, word combinations and syntactic relations over the last 500 years. In this paper, we present data on changes in the key characteristics of syntactic relations in English and Russian and propose a model that explains the observed changes. We used Google Books Ngram data (1800-2008) and modelled network growth. We then compared the characteristics of the resulting model networks with those of the networks of syntactic relations in English and Russian. We show that, by selecting two parameters of the model, a very good correspondence can be obtained between the model network and the real networks of syntactic relations in the changes of the clustering coefficient, the assortativity coefficient, and the distribution of words by number of relations.
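The three quantities compared in the abstract (clustering coefficient, assortativity coefficient, and the distribution of nodes by number of relations) can be illustrated with a minimal sketch. The growth rule below is a generic preferential-attachment process with triangle closure, used purely as a stand-in: it is not the model proposed in the paper, and the function names and the parameters `m` (links per new node) and `p` (triangle-closure probability) are assumptions made for illustration, not the paper's two parameters.

```python
import random
from collections import Counter


def grow_network(n, m, p, seed=0):
    """Grow an undirected graph to n nodes (hypothetical stand-in model).

    Each new node adds m links. A link goes to a neighbour of the previous
    target with probability p (closing a triangle); otherwise the target is
    chosen with probability proportional to its degree.
    """
    rng = random.Random(seed)
    # Start from a clique of m + 1 nodes so every node has degree >= m.
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    # Each node appears once per incident edge end: sampling from this list
    # is sampling proportionally to degree.
    endpoints = [v for v, nb in adj.items() for _ in nb]
    for new in range(m + 1, n):
        targets, last = set(), None
        while len(targets) < m:
            candidates = []
            if last is not None and rng.random() < p:
                candidates = sorted(u for u in adj[last] if u not in targets)
            if candidates:
                t = rng.choice(candidates)
            else:
                t = rng.choice(endpoints)
                if t in targets:
                    continue
            targets.add(t)
            last = t
        adj[new] = set()
        for t in sorted(targets):
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend((new, t))
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient (nodes of degree < 2 count as 0)."""
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue
        # Each link between two neighbours is counted twice in the sum.
        links = sum(len(adj[u] & nb) for u in nb) / 2
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def degree_assortativity(adj):
    """Pearson correlation of the degrees at the two ends of each edge."""
    xs, ys = [], []
    for v, nb in adj.items():
        for u in nb:  # every edge is visited in both directions
            xs.append(len(adj[v]))
            ys.append(len(adj[u]))
    mean = sum(xs) / len(xs)  # xs and ys have the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var else float("nan")


def degree_distribution(adj):
    """Map each degree to the number of nodes having that degree."""
    return Counter(len(nb) for nb in adj.values())


if __name__ == "__main__":
    g = grow_network(n=500, m=2, p=0.5, seed=1)
    print("clustering:", average_clustering(g))
    print("assortativity:", degree_assortativity(g))
    print("most common degrees:", degree_distribution(g).most_common(3))
```

In a setting like the one described, the same three functions would be applied both to the model network and to a network built from syntactic relations for a given year, and the model parameters adjusted until the two sets of values agree.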