Recognition of Named Entities in the Russian Subcorpus Google Books Ngram

Khristoforov S.V.; Shevlyakova A.V.; Bochkarev V.V.

URI: https://dspace.kpfu.ru/xmlui/handle/net/161085

Date: 2020

Abstract:

© 2020, Springer Nature Switzerland AG. This paper describes how to build a recognizer that identifies named entities occurring in the Google Books Ngram corpus. In previous studies of named entity recognition, the recognizer usually took running text as input. In this paper, the decision is instead based on an analysis of word co-occurrence statistics. The recognizer is a neural network whose input is a vector of frequencies of the bigrams or syntactic bigrams that include the studied word. The task is to recognize named entities denoted by a single word; however, the proposed method can be extended to named entities of two or more words. On a test sample of 10,000 words free of homonymy, the recognition error probability was 2.71% (F1-score 0.963). Solving the word classification problem for Google Books Ngram will make it possible to create large dictionaries of named entities, which in turn will improve the quality of named entity recognition in texts by existing algorithms.
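The input representation described in the abstract (a vector of frequencies of the bigrams that include the studied word) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `bigram_feature_vector`, the toy counts, the two-word context vocabulary, and the normalization by total count are all assumptions made for illustration. The abstract does not specify how the vector is normalized or which context words are used.

```python
from collections import Counter


def bigram_feature_vector(word, bigram_counts, context_vocab):
    """Build a co-occurrence vector for `word` from bigram counts.

    The first half of the vector holds relative frequencies of context
    words seen to the LEFT of `word`, the second half those seen to the
    RIGHT. Normalizing by the total count is an assumption of this sketch.
    """
    left = Counter()
    right = Counter()
    for (w1, w2), count in bigram_counts.items():
        if w2 == word:
            left[w1] += count   # bigram "w1 word"
        if w1 == word:
            right[w2] += count  # bigram "word w2"

    total = sum(left.values()) + sum(right.values())
    vector = [left[c] for c in context_vocab] + [right[c] for c in context_vocab]
    if total == 0:
        # The word never occurs in any bigram: return an all-zero vector.
        return [0.0] * len(vector)
    return [v / total for v in vector]


# Toy, made-up counts (not taken from Google Books Ngram).
toy_counts = {
    ("в", "Москве"): 30,
    ("Москве", "и"): 10,
    ("в", "доме"): 20,
}
context_vocab = ["в", "и"]  # hypothetical context vocabulary

vec = bigram_feature_vector("Москве", toy_counts, context_vocab)
print(vec)
```

A vector of this kind would then be passed to a classifier such as the neural network mentioned in the abstract; the architecture and training procedure are not given in this record, so they are left out of the sketch.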

Files in this item

Name: SCOPUS03029743-20 ...

Size: 81.13Kb

Format: PDF

This item appears in the following collections

KFU staff publications, Scopus [24551]
The collection contains publications by staff of Kazan Federal University (Kazan State University until 2010) indexed in the Scopus database, from 1970 onward.