Introducing baselines for Russian named entity recognition

Tkachenko M.; Solovyev V.; Ivanov V.; Simanovsky A.; Gareev R.

URI: https://dspace.kpfu.ru/xmlui/handle/net/136885

Date: 2013

Abstract:

Current research efforts in Named Entity Recognition (NER) deal mostly with the English language. Even though interest in multi-language Information Extraction is growing, only a few works report results for the Russian language. This paper introduces quality baselines for the Russian NER task. We propose a corpus that was manually annotated with organization and person names. The main purpose of this corpus is to provide a gold standard for evaluation. We implemented and evaluated two approaches to NER: knowledge-based and statistical. The first one comprises several components: dictionary matching, pattern matching and rule-based search of lexical representations of entity names within a document. We assembled a set of linguistic resources and evaluated their impact on performance. For the data-driven approach we utilized our implementation of a linear-chain CRF which uses a rich set of features. The performance of both systems is promising (62.17% and 75.05% F1 measure, respectively), although they do not employ morphological or syntactic analysis. © 2013 Springer-Verlag.
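To make the dictionary-matching component and the F1 measure mentioned in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a greedy longest-match lookup over token n-grams and exact-span, exact-label scoring; the abstract specifies neither. The names `dictionary_match`, `f1_score`, the toy gazetteer, and the example sentence are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A span is (start_token, end_token_exclusive, label). Hypothetical representation.
Span = Tuple[int, int, str]


def dictionary_match(tokens: List[str],
                     gazetteer: Dict[Tuple[str, ...], str]) -> Set[Span]:
    """Greedy longest-match lookup of token n-grams in a gazetteer (assumed scheme)."""
    spans: Set[Span] = set()
    max_len = max((len(key) for key in gazetteer), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in gazetteer:
                spans.add((i, i + n, gazetteer[key]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return spans


def f1_score(gold: Set[Span], predicted: Set[Span]) -> float:
    """Entity-level F1, counting a prediction as correct only on exact span and label."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
tokens = ["Иван", "Петров", "работает", "в", "Яндекс", "и", "Сбербанк"]
gazetteer = {("Иван", "Петров"): "PER", ("Яндекс",): "ORG"}
gold = {(0, 2, "PER"), (4, 5, "ORG"), (6, 7, "ORG")}

predicted = dictionary_match(tokens, gazetteer)
print(sorted(predicted))
print(round(f1_score(gold, predicted), 4))
```

In this toy case the gazetteer lacks one organization, so precision stays at 1.0 while recall drops to 2/3. That is the kind of coverage gap that pattern matching, rules, or a statistical model such as a CRF would aim to close.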
