Dask-based efficient clustering of educational texts

Gafarov F.; Minullin D.; Gafarova V.

URI: https://dspace.kpfu.ru/xmlui/handle/net/170524

Date: 2021

Abstract:

Document clustering is a long-running and computationally demanding process. Systems that cluster documents quickly are especially needed when processing large volumes of text data (Big Data). In this work we present a distributed text clustering framework based on Dask, an open-source library for parallel and distributed computing. The Dask-based processing system developed in this work executes all operations required for clustering text documents in parallel. We implemented a parallel agglomerative clustering algorithm that operates on cosine similarity matrices computed from term frequency-inverse document frequency (TF-IDF) feature matrices of the input texts. The system was applied to data mining of the educational data accumulated in the system “Electronic Education of the Tatarstan Republic” from 2015 to 2020. Specifically, using the developed system we clustered the text documents describing lesson plans and performed a comparative analysis of the average marks of students who were taught according to lesson plans belonging to different clusters.
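
The pipeline named in the abstract (TF-IDF features, a cosine similarity matrix, then agglomerative clustering) can be illustrated with a minimal sketch. This is not the authors' implementation, which this record does not show. The whitespace tokenizer, the smoothed IDF formula, the average-linkage merge rule, the function names, and the toy lesson-plan strings are all assumptions made for illustration. Dask is used only if it is installed, through `dask.delayed` and `dask.compute`, to build the rows of the similarity matrix as independent tasks; otherwise the rows are computed serially.

```python
import math
from collections import Counter

try:  # Dask is optional in this sketch; fall back to serial execution
    import dask
    from dask import delayed
except ImportError:
    dask = None


def tfidf(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed tokenizer
    n = len(tokenized)
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed IDF (an assumption, not taken from the paper)
            t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine_row(i, vectors):
    """Cosine similarity of document i against every document."""
    vi = vectors[i]
    ni = math.sqrt(sum(w * w for w in vi.values()))
    row = []
    for vj in vectors:
        nj = math.sqrt(sum(w * w for w in vj.values()))
        dot = sum(w * vj.get(t, 0.0) for t, w in vi.items())
        row.append(dot / (ni * nj) if ni and nj else 0.0)
    return row


def similarity_matrix(vectors):
    """Build the full matrix; each row is an independent task."""
    if dask is not None:
        tasks = [delayed(cosine_row)(i, vectors) for i in range(len(vectors))]
        return list(dask.compute(*tasks))
    return [cosine_row(i, vectors) for i in range(len(vectors))]


def agglomerative(sim, n_clusters):
    """Average-linkage agglomerative clustering on a similarity matrix."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the most similar pair
        del clusters[b]
    return clusters


# Hypothetical lesson-plan snippets, invented for this example
docs = [
    "fractions addition lesson",
    "fractions subtraction lesson",
    "poetry reading pushkin",
    "poetry writing pushkin",
]
clusters = agglomerative(similarity_matrix(tfidf(docs)), n_clusters=2)
print(clusters)
```

On these four toy strings the two arithmetic plans and the two literature plans share no terms, so they end up in separate clusters. The naive merge loop is cubic in the number of documents and is meant only to make the steps concrete, not to reflect how a production system would scale.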

Files in this item

Name: SCOPUS16130073-20 ...

Size: 55.13Kb

Format: PDF

This item appears in the following Collection(s)

KFU staff publications, Scopus [24551]
This collection contains publications by staff of Kazan Federal University (Kazan State University until 2010) indexed in the Scopus database, starting from 1970.