Corpus management system: Semantic aspects of representation and processing of search queries

Nevzorova O.; Galieva A.; Gataullin R.; Mukhamedshin D.

dc.contributor.author	Nevzorova O.
dc.contributor.author	Mukhamedshin D.
dc.contributor.author	Galieva A.
dc.contributor.author	Gataullin R.
dc.date.accessioned	2018-04-05T07:10:26Z
dc.date.available	2018-04-05T07:10:26Z
dc.date.issued	2017
dc.identifier.uri	http://dspace.kpfu.ru/xmlui/handle/net/130450
dc.description.abstract	© 2016 IEEE. There are several well-known corpus management systems (Sketch Engine, Manatee, EXMARaLDA, etc.). The system presented in this article offers comparable search functionality, but it also takes into account certain specifics of Turkic languages. The Tatar corpus management system (http://corpus.antat.ru) is specifically designed to work with Turkic linguistic corpora. Its functionality includes search for lexical units, morphological and lexical search, search for syntactic units, grammar-based n-gram search, and others. The semantic model of Tatar language data representation is the core of the system. Search is performed using open source tools (the MariaDB database management system and the Redis data store). The Tatar language has a complex agglutinative morphology, and we consider the system of grammatical categories represented in the grammatical annotation of the Tatar corpus to be a key to the semantics of the language. By selecting and combining grammatical, lexical and other query parameters, we can obtain sets of semantic samples from semantically unstructured corpus data. The main task of our research is to detect and describe a class of grammatically conditioned semantic phenomena and to develop a system of corpus queries for extracting these phenomena. Experiments with queries to the Tatar corpus show that semantically relevant combinations of query parameters may differ in their level of complexity. The results of the work may be used for document clustering and classification, as well as for Tatar grammar building and other purposes.
dc.subject	Corpus manager
dc.subject	grammar
dc.subject	morphological formulas
dc.subject	query
dc.subject	semantic information
dc.subject	the Tatar language
dc.title	Corpus management system: Semantic aspects of representation and processing of search queries
dc.type	Conference Paper
dc.collection	Publications of KFU Staff
dc.relation.startpage	285
dc.source.id	SCOPUS-2017-SID85021442807

Files in this item

Name: SCOPUS-2017-SID85 ...

Size: 48.99Kb

Format: PDF

This item appears in the following Collection(s)

Publications of KFU Staff, Scopus [24551]
This collection contains publications by staff of Kazan Federal University (Kazan State University until 2010) indexed in the Scopus database, starting from 1970.
