Electronic Archive

Corpus management system: Semantic aspects of representation and processing of search queries


dc.contributor.author Nevzorova O.
dc.contributor.author Mukhamedshin D.
dc.contributor.author Galieva A.
dc.contributor.author Gataullin R.
dc.date.accessioned 2018-04-05T07:10:26Z
dc.date.available 2018-04-05T07:10:26Z
dc.date.issued 2017
dc.identifier.uri http://dspace.kpfu.ru/xmlui/handle/net/130450
dc.description.abstract © 2016 IEEE. There are several well-known corpus management systems (Sketch Engine, Manatee, EXMARaLDA, etc.). The system presented in this article offers comparable search functionality, but it also takes into account certain specifics of Turkic languages. The Tatar corpus management system (http://corpus.antat.ru) is specifically designed to work with Turkic linguistic corpora. Its functionality includes search for lexical units, morphological and lexical search, search for syntactic units, grammar-based n-gram search, and others. The semantic model of Tatar language data representation is the core of the system. Search is performed using open source tools (the MariaDB database management system and the Redis data store). The Tatar language has a complicated agglutinative morphology, and we consider the system of grammatical categories represented in the grammatical annotation of the Tatar corpus as a key to the semantics of the language. By selecting and combining grammatical, lexical, and other parameters of a query, we can obtain certain sets of semantic samples from semantically unstructured corpus data. The main task of our research is to detect and describe a class of grammatically conditioned semantic phenomena and to develop a system of corpus queries for extracting these phenomena. Experiments with queries to the Tatar corpus show that semantically relevant combinations of query parameters may differ in their level of complexity. The results of the work may be used for document clustering and classification, as well as for Tatar grammar building and other purposes.
dc.subject Corpus manager
dc.subject grammar
dc.subject morphological formulas
dc.subject query
dc.subject semantic information
dc.subject the Tatar language
dc.title Corpus management system: Semantic aspects of representation and processing of search queries
dc.type Conference Paper
dc.collection Publications of KFU staff
dc.relation.startpage 285
dc.source.id SCOPUS-2017-SID85021442807


This item appears in the following collections

  • Publications of KFU staff in Scopus [24551]
    The collection contains publications by staff of Kazan Federal University (Kazan State University until 2010) indexed in the Scopus database, starting from 1970.
