Electronic archive

Chemical data visualization and analysis with incremental generative topographic mapping: Big data challenge

dc.contributor.author Gaspar H.
dc.contributor.author Baskin I.
dc.contributor.author Marcou G.
dc.contributor.author Horvath D.
dc.contributor.author Varnek A.
dc.date.accessioned 2018-09-18T20:23:20Z
dc.date.available 2018-09-18T20:23:20Z
dc.date.issued 2015
dc.identifier.issn 1549-9596
dc.identifier.uri https://dspace.kpfu.ru/xmlui/handle/net/139372
dc.description.abstract © 2014 American Chemical Society. This paper is devoted to the analysis and visualization in two-dimensional space of large data sets containing millions of compounds, using the incremental version of generative topographic mapping (iGTM). The iGTM algorithm, implemented in the in-house ISIDA-GTM program, was applied to a database of more than 2 million compounds combining the data sets of 36 chemical suppliers and the NCI collection, encoded either by MOE descriptors or by MACCS keys. Taking advantage of the probabilistic nature of GTM, several approaches to data analysis were proposed. The chemical space coverage was evaluated using the normalized Shannon entropy. Different views of the data (property landscapes) were obtained by mapping various physical and chemical properties (molecular weight, aqueous solubility, LogP, etc.) onto the iGTM map. The superposition of these views helped to identify the regions of chemical space populated by compounds with desirable physicochemical profiles and the suppliers providing them. The similarity of data sets in the latent space was assessed by applying several metrics (Euclidean distance, Tanimoto and Bhattacharyya coefficients) to data probability distributions based on cumulated responsibility vectors. As a complementary approach, data sets were compared by considering them as individual objects on a meta-GTM map, built on cumulated responsibility vectors or property landscapes produced with iGTM. We believe that the iGTM methodology described in this article represents a fast and reliable way to analyze and visualize large chemical databases.
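The abstract names its analysis measures without giving formulas, so the following is only a minimal sketch of how they could be computed, not the ISIDA-GTM implementation. It assumes that each compound has a responsibility vector over the map nodes, that a data set's cumulated responsibility vector is the sum of those vectors normalized to a probability distribution, and that the Tanimoto coefficient takes its usual continuous form. All function names and the toy data are hypothetical.

```python
import math

def cumulated_responsibilities(resp_rows):
    """Sum per-compound responsibility vectors, then normalize to sum to 1.

    Assumption: this is what "cumulated responsibility vector" means here.
    """
    totals = [0.0] * len(resp_rows[0])
    for row in resp_rows:
        for k, r in enumerate(row):
            totals[k] += r
    s = sum(totals)
    return [t / s for t in totals]

def normalized_entropy(p):
    """Shannon entropy divided by log(K): 0 = one node only, 1 = uniform coverage."""
    if len(p) < 2:
        raise ValueError("need at least two map nodes")
    h = -sum(x * math.log(x) for x in p if x > 0)
    return h / math.log(len(p))

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def tanimoto(p, q):
    """Continuous Tanimoto coefficient: p.q / (|p|^2 + |q|^2 - p.q)."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (sum(a * a for a in p) + sum(b * b for b in q) - dot)

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two discrete distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

# Toy example on a hypothetical 4-node map (real maps have many more nodes).
set_a = [[0.7, 0.3, 0.0, 0.0], [0.6, 0.4, 0.0, 0.0]]      # concentrated data set
set_b = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]  # evenly spread data set

p_a = cumulated_responsibilities(set_a)
p_b = cumulated_responsibilities(set_b)

print("coverage A:", normalized_entropy(p_a))
print("coverage B:", normalized_entropy(p_b))
print("Euclidean:", euclidean(p_a, p_b))
print("Tanimoto:", tanimoto(p_a, p_b))
print("Bhattacharyya:", bhattacharyya(p_a, p_b))
```

Under these assumptions, a data set spread evenly over the map gets a coverage close to 1, while one concentrated in a few nodes scores lower; identical distributions give a Euclidean distance of 0 and Tanimoto and Bhattacharyya coefficients of 1.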
dc.relation.ispartofseries Journal of Chemical Information and Modeling
dc.title Chemical data visualization and analysis with incremental generative topographic mapping: Big data challenge
dc.type Article
dc.relation.ispartofseries-issue 1
dc.relation.ispartofseries-volume 55
dc.collection Publications of KFU staff
dc.relation.startpage 84
dc.source.id SCOPUS15499596-2015-55-1-SID84921689992


This item appears in the following collections

  • Publications of KFU staff, Scopus [24551]
    The collection contains publications by staff of Kazan Federal University (Kazan State University until 2010) indexed in the Scopus database, starting from 1970.
