Abstract:
© 2018, Springer Science+Business Media, LLC, part of Springer Nature. In this paper, support vector machines and the condensed graph of reaction (CGR) approach were used to predict the regioselectivity of aromatic hydroxylation for substrates of human cytochrome P450 1A2 (CYP1A2). Experimental data on aromatic hydroxylation by human CYP1A2 (observed, or “real”, transformations) were extracted from the Metabolite and XenoSite databases. In addition, all potential but unobserved (“unreal”) transformations were generated. The combined dataset of “real” and “unreal” transformations was converted into a set of CGRs, which are pseudomolecules containing both conventional bonds (single, double, aromatic, etc.) and dynamic bonds that characterize the chemical transformation. ISIDA fragment descriptors generated for the CGRs were used for modeling. The models were validated by fivefold cross-validation repeated three times on the training set and then on an external test set. The final model was a consensus over models built on different descriptor sets. Its predictive performance on the external test set was similar to that of the XenoSite and Way2Drug tools. Unlike previously used approaches based on atom labeling, the proposed CGR-based representation of metabolic transformations can be applied to different types of reactions catalyzed by the same enzyme and is therefore better suited to automated handling of metabolic data.
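The idea of enumerating “real” and “unreal” transformations and encoding each as a CGR can be illustrated with a minimal sketch. It is not the ISIDA implementation or the authors' code: the bond-dictionary representation, the function names, the use of 1.5 for an aromatic bond order, and the toy substrate with its "observed" site are all assumptions made for illustration, and hydrogens are left implicit.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Bond = FrozenSet[int]                                   # unordered pair of atom indices
BondOrders = Dict[Bond, float]                          # bond -> order in one molecule
DynamicBond = Tuple[Optional[float], Optional[float]]   # (order before, order after)
CGR = Dict[Bond, DynamicBond]

AROMATIC = 1.5  # assumption: aromatic bonds are encoded as order 1.5


def hydroxylate(reactant: BondOrders, site: int, oxygen: int) -> BondOrders:
    """Return the product bonds after attaching a new oxygen atom to `site`."""
    product = dict(reactant)
    product[frozenset((site, oxygen))] = 1.0  # new single C-O bond
    return product


def make_cgr(reactant: BondOrders, product: BondOrders) -> CGR:
    """Superpose reactant and product; None means the bond is absent on that side."""
    return {
        bond: (reactant.get(bond), product.get(bond))
        for bond in set(reactant) | set(product)
    }


def dynamic_bonds(cgr: CGR) -> CGR:
    """Keep only bonds whose order differs between reactant and product."""
    return {bond: orders for bond, orders in cgr.items() if orders[0] != orders[1]}


def enumerate_transformations(
    reactant: BondOrders,
    candidate_sites: List[int],
    observed_sites: Set[int],
    oxygen: int,
) -> List[Tuple[int, CGR, bool]]:
    """Build one CGR per candidate site; the flag is True for "real" transformations."""
    result = []
    for site in candidate_sites:
        cgr = make_cgr(reactant, hydroxylate(reactant, site, oxygen))
        result.append((site, cgr, site in observed_sites))
    return result


# Toy substrate: a six-membered aromatic ring (atoms 0-5) with a methyl carbon (atom 6).
toluene: BondOrders = {frozenset((i, (i + 1) % 6)): AROMATIC for i in range(6)}
toluene[frozenset((0, 6))] = 1.0

# Hypothetical label: site 3 is marked as observed purely for illustration.
transformations = enumerate_transformations(
    toluene, candidate_sites=[1, 2, 3, 4, 5], observed_sites={3}, oxygen=7
)

for site, cgr, is_real in transformations:
    print(site, "real" if is_real else "unreal", dynamic_bonds(cgr))
```

Each candidate site yields a pseudomolecule in which the ring and substituent bonds are conventional (same order on both sides) and the newly formed C–O bond is the only dynamic bond. In a workflow like the one described, fragment descriptors would then be computed on such pseudomolecules and passed to a classifier that separates “real” from “unreal” transformations.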