Ranking concrete and abstract words using Google Books Ngram data

Solovyev V.; Ivanov V.


URI: https://dspace.kpfu.ru/xmlui/handle/net/162083

Date: 2020

Abstract:

© 2020 - IOS Press and the authors. All rights reserved. Creating dictionaries of abstract and concrete words is a well-known task. Such dictionaries are important in several applications of text analysis and computational linguistics. Usually, assembling concreteness scores for words begins with a large amount of manual work. However, the process can be automated to a significant degree using information from large corpora. In this paper we combine two datasets, a dictionary with concreteness scores for 40,000 English words and the Google Books Ngram dataset, in order to test the following hypothesis: in text, concrete words tend to co-occur with concrete words more often than with abstract words (and conversely, abstract words tend to co-occur with abstract words more often than with concrete words). Based on this hypothesis, we propose a method for automatically estimating the concreteness scores of words from a small amount of initial markup.
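
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method, which this record does not describe in detail. It assumes that an unlabeled word's score can be approximated by the count-weighted mean of the known scores of the words it co-occurs with. The function name `estimate_concreteness`, the toy bigram counts, the seed scores, and the 1 to 5 scale are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def estimate_concreteness(bigram_counts, seed_scores):
    """Estimate scores for unlabeled words as the count-weighted mean
    of the seed scores of the words they co-occur with.

    This averaging rule is an illustrative assumption, not the
    procedure from the paper.
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for (left, right), count in bigram_counts.items():
        # Each bigram is evidence for both of its words.
        for word, neighbor in ((left, right), (right, left)):
            if word not in seed_scores and neighbor in seed_scores:
                weighted_sum[word] += count * seed_scores[neighbor]
                total_weight[word] += count
    return {w: weighted_sum[w] / total_weight[w] for w in total_weight}


# Toy data invented for illustration; not taken from Google Books Ngram.
bigram_counts = {
    ("wooden", "table"): 30,
    ("wooden", "chair"): 10,
    ("moral", "justice"): 20,
    ("moral", "table"): 5,
}
# Hypothetical seed markup on an assumed 1 (abstract) to 5 (concrete) scale.
seed_scores = {"table": 5.0, "chair": 4.6, "justice": 1.5}

estimates = estimate_concreteness(bigram_counts, seed_scores)
```

With this toy input, "wooden" is seen mostly next to high-scoring seeds and receives a higher estimate than "moral", which is seen mostly next to a low-scoring seed. Words that never co-occur with a seed word receive no estimate.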


Files in this item

Name: SCOPUS10641246-20 ...

Size: 48.84Kb

Format: PDF


This item appears in the following collections

Publications of KFU staff, Scopus [24551]
The collection contains publications by staff of Kazan Federal University (Kazan State University before 2010) that are indexed in the Scopus database, starting from 1970.