Electronic Archive

The Russian Drug Reaction Corpus and neural models for drug reactions and effectiveness detection in user reviews


dc.contributor.author Tutubalina E.
dc.contributor.author Alimova I.
dc.contributor.author Miftahutdinov Z.
dc.contributor.author Sakhovskiy A.
dc.contributor.author Malykh V.
dc.contributor.author Nikolenko S.
dc.date.accessioned 2022-02-09T20:36:59Z
dc.date.available 2022-02-09T20:36:59Z
dc.date.issued 2021
dc.identifier.issn 1367-4803
dc.identifier.uri https://dspace.kpfu.ru/xmlui/handle/net/169387
dc.description.abstract Motivation: Drugs and diseases play a central role in many areas of biomedical research and healthcare. Aggregating knowledge about these entities across a broader range of domains and languages is critical for information extraction (IE) applications. To facilitate text mining methods for the analysis of patients' health conditions and adverse drug reactions reported on the Internet, and for their comparison with traditional sources such as drug labels, we present a new corpus of Russian-language health reviews. Results: The Russian Drug Reaction Corpus (RuDReC) is a new, partially annotated corpus of consumer reviews in Russian about pharmaceutical products, intended for the detection of health-related named entities and of the effectiveness of pharmaceutical products. The corpus consists of two parts: a raw part and a labeled part. The raw part includes 1.4 million health-related user-generated texts collected from various Internet sources, including social media. The labeled part contains 500 consumer reviews about drug therapy with drug- and disease-related information. Sentence-level labels mark health-related issues or their absence. Sentences that contain such issues are additionally labeled at the expression level to identify fine-grained subtypes such as drug classes and drug forms, drug indications and drug reactions. Further, we present a baseline model for the named entity recognition (NER) and multilabel sentence classification tasks on this corpus. Our RuDR-BERT model achieved a macro F1 score of 74.85% on the NER task. On the sentence classification task, our model achieves a macro F1 score of 68.82%, a gain of 7.47% over the score of a BERT model trained on Russian data.
dc.relation.ispartofseries Bioinformatics
dc.title The Russian Drug Reaction Corpus and neural models for drug reactions and effectiveness detection in user reviews
dc.type Article
dc.relation.ispartofseries-issue 2
dc.relation.ispartofseries-volume 37
dc.collection Publications of KFU staff
dc.relation.startpage 243
dc.source.id SCOPUS13674803-2021-37-2-SID85099813248


This item appears in the following collections

  • Publications of KFU staff Scopus [24551]
    The collection contains publications by staff of Kazan Federal University (Kazan State University until 2010) that are indexed in the Scopus database, starting from 1970.
