Kazan Federal University Digital Repository

Recognition of Named Entities in the Russian Subcorpus Google Books Ngram

dc.contributor.author Bochkarev V.V.
dc.contributor.author Khristoforov S.V.
dc.contributor.author Shevlyakova A.V.
dc.date.accessioned 2021-02-25T06:51:04Z
dc.date.available 2021-02-25T06:51:04Z
dc.date.issued 2020
dc.identifier.issn 0302-9743
dc.identifier.uri https://dspace.kpfu.ru/xmlui/handle/net/161085
dc.description.abstract © 2020, Springer Nature Switzerland AG. This paper describes how to build a recognizer that identifies named entities occurring in the Google Books Ngram corpus. In previous studies, named entity recognition was usually performed by feeding running text to the recognizer. In this paper, the decision is instead based on an analysis of word co-occurrence statistics. The recognizer is a neural network whose input is a vector of frequencies of the bigrams or syntactic bigrams that contain the word under study. The task addressed is the recognition of single-word named entities, although the proposed method can be extended to two-word and multi-word named entities. On a test sample of 10 thousand words free of homonymy, the recognition error probability was 2.71% (F1 score 0.963). Solving the word classification problem for Google Books Ngram will make it possible to build large dictionaries of named entities, which in turn can improve the quality of named entity recognition in texts by existing algorithms.
dc.relation.ispartofseries Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.subject Google Books Ngram
dc.subject N-grams frequencies
dc.subject Named entities recognition
dc.subject Neural networks
dc.subject Syntactic bigrams
dc.title Recognition of Named Entities in the Russian Subcorpus Google Books Ngram
dc.type Conference Paper
dc.relation.ispartofseries-volume 12469 LNAI
dc.collection Publications of KFU staff
dc.relation.startpage 17
dc.source.id SCOPUS03029743-2020-12469-SID85092929783
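The abstract describes classifying a word from the frequencies of the bigrams that contain it, rather than from running text. The sketch below illustrates that idea under loudly stated assumptions. The record does not specify the network architecture, the context vocabulary or the normalization, so this example substitutes a single sigmoid neuron (logistic regression, no bias term) trained by full-batch gradient descent. The feature vector holds relative frequencies of the bigrams "c word" and "word c" for each context word c. The toy counts, the `CONTEXT` list and all function names are hypothetical and are not taken from the paper or from Google Books Ngram.

```python
import math

# Hypothetical toy counts, NOT real Google Books Ngram data: (left, right) -> frequency.
BIGRAMS = {
    ("город", "Казань"): 50, ("в", "Казань"): 30,
    ("река", "Волга"): 40, ("на", "Волга"): 20,
    ("господин", "Иванов"): 25, ("сказал", "Иванов"): 15,
    ("очень", "большой"): 60, ("большой", "дом"): 30,
    ("в", "дом"): 45, ("новый", "дом"): 25,
    ("очень", "красивый"): 35, ("красивый", "город"): 10,
    ("город", "Москва"): 20, ("очень", "старый"): 10,
}

# Hypothetical fixed context vocabulary; each word yields a "left" and a "right" feature.
CONTEXT = ["город", "река", "господин", "в", "на", "очень", "новый", "большой"]

# Hypothetical training labels: 1 = named entity, 0 = common word.
LABELS = {"Казань": 1, "Волга": 1, "Иванов": 1, "большой": 0, "дом": 0, "красивый": 0}


def feature_vector(word):
    """Relative frequencies of bigrams (c, word) and (word, c) for every context word c."""
    total = sum(n for (a, b), n in BIGRAMS.items() if word in (a, b))
    if total == 0:
        return [0.0] * (2 * len(CONTEXT))
    left = [BIGRAMS.get((c, word), 0) / total for c in CONTEXT]
    right = [BIGRAMS.get((word, c), 0) / total for c in CONTEXT]
    return left + right


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))


def train(labelled, lr=0.5, epochs=1000):
    """Full-batch gradient descent on the logistic loss of one sigmoid neuron."""
    data = [(feature_vector(w), y) for w, y in labelled.items()]
    weights = [0.0] * (2 * len(CONTEXT))
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in data:
            err = y - sigmoid(score(weights, x))  # negative gradient factor of the loss
            for j, xj in enumerate(x):
                grad[j] += err * xj
        weights = [w + lr * g for w, g in zip(weights, grad)]
    return weights


def is_named_entity(weights, word):
    """Predict 'named entity' when the neuron's output exceeds 0.5."""
    return sigmoid(score(weights, feature_vector(word))) > 0.5


if __name__ == "__main__":
    w = train(LABELS)
    for word in ["Москва", "старый"]:  # words not used for training
        print(word, is_named_entity(w, word))
```

Running the script prints `Москва True` and `старый False`: "Москва" occurs only after "город", a context seen with a named entity in training, while "старый" occurs only after "очень", a context seen only with common words. A real recognizer would need a far larger context vocabulary, many more labelled words and, per the abstract, possibly syntactic bigrams as additional features.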


This item appears in the following Collection(s)

  • Publications of KFU staff (Scopus) [24551]
    The collection contains publications by staff of Kazan Federal University (Kazan State University until 2010) indexed in the Scopus database since 1970.
