The Russian Drug Reaction Corpus and neural models for drug reactions and effectiveness detection in user reviews

Miftahutdinov Z.; Alimova I.; Malykh V.; Nikolenko S.; Sakhovskiy A.; Tutubalina E.

dc.contributor.author	Tutubalina E.
dc.contributor.author	Alimova I.
dc.contributor.author	Miftahutdinov Z.
dc.contributor.author	Sakhovskiy A.
dc.contributor.author	Malykh V.
dc.contributor.author	Nikolenko S.
dc.date.accessioned	2022-02-09T20:36:59Z
dc.date.available	2022-02-09T20:36:59Z
dc.date.issued	2021
dc.identifier.issn	1367-4803
dc.identifier.uri	https://dspace.kpfu.ru/xmlui/handle/net/169387
dc.description.abstract	Motivation: Drugs and diseases play a central role in many areas of biomedical research and healthcare. Aggregating knowledge about these entities across a broader range of domains and languages is critical for information extraction (IE) applications. To facilitate text mining methods for analyzing patients' health conditions and adverse drug reactions reported on the Internet, and for comparing them with traditional sources such as drug labels, we present a new corpus of Russian-language health reviews. Results: The Russian Drug Reaction Corpus (RuDReC) is a new partially annotated corpus of consumer reviews in Russian about pharmaceutical products, intended for the detection of health-related named entities and of the effectiveness of pharmaceutical products. The corpus consists of two parts, a raw one and a labeled one. The raw part includes 1.4 million health-related user-generated texts collected from various Internet sources, including social media. The labeled part contains 500 consumer reviews about drug therapy with drug- and disease-related information. Sentence-level labels mark the presence of health-related issues or their absence. Sentences that contain a health-related issue are additionally labeled at the expression level to identify fine-grained subtypes such as drug classes and drug forms, drug indications and drug reactions. Further, we present a baseline model for the named entity recognition (NER) and multilabel sentence classification tasks on this corpus. Our RuDR-BERT model achieves a macro F1 score of 74.85% in the NER task. For the sentence classification task, our model achieves a macro F1 score of 68.82%, a gain of 7.47% over the score of a BERT model trained on Russian data.
dc.relation.ispartofseries	Bioinformatics
dc.title	The Russian Drug Reaction Corpus and neural models for drug reactions and effectiveness detection in user reviews
dc.type	Article
dc.relation.ispartofseries-issue	2
dc.relation.ispartofseries-volume	37
dc.collection	Publications of KFU staff
dc.relation.startpage	243
dc.source.id	SCOPUS13674803-2021-37-2-SID85099813248
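
The abstract reports macro F1 scores for the multilabel sentence classification task. As a minimal sketch of how such a score can be computed, the snippet below takes the unweighted mean of per-label F1 values. The label names `ADR` and `EFFECTIVE` and the toy data are hypothetical and are not taken from RuDReC, and the authors' exact evaluation protocol is not described in this record.

```python
from typing import List, Set


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Unweighted mean of per-label F1 scores over a list of sentences."""
    scores = []
    for label in labels:
        # Count true positives, false positives and false negatives for this label.
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # Convention assumed here: a label with no gold and no predicted
        # occurrences contributes an F1 of 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: one set of labels per sentence.
gold = [{"ADR"}, {"EFFECTIVE"}, set(), {"ADR", "EFFECTIVE"}]
pred = [{"ADR"}, set(), {"ADR"}, {"ADR", "EFFECTIVE"}]

score = macro_f1(gold, pred, ["ADR", "EFFECTIVE"])
print(f"macro F1 = {score:.4f}")
```

Because every label contributes equally to the mean, macro averaging gives rare labels the same weight as frequent ones, which matters when some categories appear in only a few sentences.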

Files in this item

Name: SCOPUS13674803-20 ...

Size: 48.61Kb

Format: PDF

This item appears in the following collections

Publications of KFU staff Scopus [24551]
The collection contains publications by staff of Kazan Federal University (Kazan State University until 2010) indexed in the Scopus database, starting from 1970.
