Kazan Federal University Digital Repository

Dask-based efficient clustering of educational texts

dc.contributor.author Gafarov F.
dc.contributor.author Minullin D.
dc.contributor.author Gafarova V.
dc.date.accessioned 2022-02-10T20:40:00Z
dc.date.available 2022-02-10T20:40:00Z
dc.date.issued 2021
dc.identifier.issn 1613-0073
dc.identifier.uri https://dspace.kpfu.ru/xmlui/handle/net/170524
dc.description.abstract Document clustering is a long-running and computationally demanding process. The need for systems that enable fast document clustering is especially pressing when processing large volumes of text data (Big Data). In this work we present a distributed text clustering framework based on Dask, an open-source library for parallel and distributed computing. The Dask-based processing system developed in this work executes all operations required for clustering text documents in parallel. We implemented a parallel agglomerative clustering algorithm that operates on cosine similarity matrices computed from term frequency-inverse document frequency (TF-IDF) feature matrices of the input texts. The system was applied to the mining of educational data accumulated in the system "Electronic Education of the Tatarstan Republic" from 2015 to 2020. Specifically, using the developed system, we clustered text documents describing lesson plans and performed a comparative analysis of the average marks of students who were taught according to lesson plans belonging to different clusters.
dc.relation.ispartofseries CEUR Workshop Proceedings
dc.subject ANOVA
dc.subject Big data
dc.subject Dask
dc.subject Document clustering
dc.subject Educational data mining
dc.subject Python
dc.subject TF-IDF
dc.title Dask-based efficient clustering of educational texts
dc.type Conference Proceeding
dc.relation.ispartofseries-volume 3036
dc.collection Publications of KFU staff
dc.relation.startpage 362
dc.source.id SCOPUS16130073-2021-3036-SID85121268811
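
The pipeline named in the abstract (TF-IDF features, a cosine similarity matrix, then agglomerative clustering) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the Dask parallelisation is left out entirely, and the whitespace tokenizer, the smoothed IDF formula, the average-linkage rule, the sample texts, and all function names are assumptions made only for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed tokenizer
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF; an assumed variant, the abstract does not specify one.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerative(sim, n_clusters):
    """Merge the most similar pair of clusters (average linkage, assumed)
    until only n_clusters remain. Returns lists of document indices."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = len(clusters[a]) * len(clusters[b])
                avg = sum(sim[i][j] for i in clusters[a]
                          for j in clusters[b]) / pairs
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical lesson-plan snippets, invented for this example.
docs = [
    "solving linear equations algebra",
    "linear equations inequalities algebra",
    "green plants photosynthesis light",
    "plants cells photosynthesis",
]
vectors = tfidf(docs)
similarity = [[cosine(u, v) for v in vectors] for u in vectors]
clusters = agglomerative(similarity, n_clusters=2)
print(clusters)
```

On these four texts the two algebra snippets end up in one cluster and the two biology snippets in the other. The pairwise similarity step is the quadratic part of the computation, which is presumably where a distributed scheduler such as Dask pays off for large corpora.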


This item appears in the following Collection(s)

  • Publications of KFU staff, Scopus [24551]
    The collection contains publications by staff of Kazan Federal University (Kazan State University until 2010) indexed in the Scopus database, starting from 1970.
