Abstract:
This article proposes a method for automatically recognizing the cases of Russian nouns and adjectives in the Google Books Ngram corpus. Recognition relies on word co-occurrence statistics extracted from the corpus: each word is represented by an explicit word vector composed of the frequencies of the ordinary and syntactic bigrams that include it, and this vector is the input to the recognizer. Several types of vector representation and preliminary data normalization were compared. The model is a multi-layer perceptron with a softmax output layer. To train and test it, we selected the 50,000 adjectives and 50,000 nouns most frequently used in the Russian subcorpus of Google Books Ngram between 1920 and 2009. Parts of speech and cases were determined using the OpenCorpora electronic morphological dictionary. The trained neural network recognized cases with an accuracy of 96.45% for nouns and 99.63% for adjectives.
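The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the toy bigram counts, the log-scaling and L2 normalization, the ReLU hidden layer of 16 units, the six output classes, and the random, untrained weights. Only ordinary bigrams are modeled; syntactic bigrams would add further vector components in the same way.

```python
import math
import random

# Hypothetical toy counts: (left word, right word) -> bigram frequency.
BIGRAMS = {
    ("в", "доме"): 120,
    ("о", "доме"): 40,
    ("нет", "дома"): 75,
    ("из", "дома"): 90,
    ("в", "городе"): 200,
    ("из", "города"): 60,
}
N_CASES = 6  # assumption: one output class per main Russian case


def explicit_vectors(bigrams):
    """One vector per word: left-neighbour counts, then right-neighbour counts."""
    vocab = sorted({w for pair in bigrams for w in pair})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    vectors = [[0.0] * (2 * n) for _ in vocab]
    for (left, right), count in bigrams.items():
        vectors[index[right]][index[left]] += count      # `left` precedes `right`
        vectors[index[left]][n + index[right]] += count  # `right` follows `left`
    return vocab, vectors


def normalize(vector):
    """Log-scale the counts, then scale to unit L2 norm (one possible choice)."""
    logged = [math.log1p(v) for v in vector]
    norm = math.sqrt(sum(v * v for v in logged))
    return [v / norm for v in logged] if norm > 0 else logged


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(vector, weights, biases):
    """Fully connected layer; `weights` holds one weight list per output unit."""
    return [sum(v * w for v, w in zip(vector, unit)) + b
            for unit, b in zip(weights, biases)]


def mlp_forward(vector, w1, b1, w2, b2):
    """Forward pass: ReLU hidden layer followed by a softmax output layer."""
    hidden = [max(0.0, h) for h in dense(vector, w1, b1)]
    return softmax(dense(hidden, w2, b2))


def random_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


vocab, raw = explicit_vectors(BIGRAMS)
inputs = [normalize(v) for v in raw]
rng = random.Random(0)
w1, b1 = random_layer(len(inputs[0]), 16, rng)
w2, b2 = random_layer(16, N_CASES, rng)
probs = [mlp_forward(x, w1, b1, w2, b2) for x in inputs]
```

Each entry of `probs` is a probability distribution over the case classes for one vocabulary word. Because the weights are untrained, the values carry no meaning; in practice they would be fitted against dictionary-derived case labels.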