dc.contributor.author |
Bolshina A.S. |
|
dc.contributor.author |
Loukachevitch N.V. |
|
dc.date.accessioned |
2021-02-25T06:56:04Z |
|
dc.date.available |
2021-02-25T06:56:04Z |
|
dc.date.issued |
2020 |
|
dc.identifier.issn |
2221-7932 |
|
dc.identifier.uri |
https://dspace.kpfu.ru/xmlui/handle/net/161572 |
|
dc.description.abstract |
© 2020 ABBYY PRODUCTION LLC. All rights reserved. The best approaches to Word Sense Disambiguation (WSD) are supervised and rely on large amounts of hand-labelled data, which are not always available and are costly to create. For Russian, there is no sense-tagged resource large enough to train supervised word sense disambiguation algorithms. In this work we describe an approach for creating an automatically labelled collection based on monosemous relatives (related unambiguous entries). Our main contribution is that we extracted monosemous relatives that can be located at relatively long distances from a target ambiguous word and ranked them by their similarity to the target sense. The selected candidates are then used to extract training samples from a news corpus. We evaluated word sense disambiguation models based on nearest neighbor classification over BERT and ELMo embeddings. Our work relies on the Russian wordnet RuWordNet. |
|
dc.relation.ispartofseries |
Komp'juternaja Lingvistika i Intellektual'nye Tehnologii |
|
dc.subject |
Automatic Dataset Collection |
|
dc.subject |
BERT |
|
dc.subject |
ELMo |
|
dc.subject |
Monosemous relatives |
|
dc.subject |
Russian dataset |
|
dc.subject |
Word sense disambiguation |
|
dc.title |
Generating training data for word sense disambiguation in Russian |
|
dc.type |
Conference Paper |
|
dc.relation.ispartofseries-issue |
19 |
|
dc.relation.ispartofseries-volume |
2020-June |
|
dc.collection |
Publications of KFU staff |
|
dc.relation.startpage |
119 |
|
dc.source.id |
SCOPUS22217932-2020-2020-19-SID85093820211 |
|