Google books ngram: Problems of representativeness and data reliability

dc.contributor.author Solovyev V.D.
dc.contributor.author Bochkarev V.V.
dc.contributor.author Akhtyamova S.S.
dc.date.accessioned 2021-02-25T06:54:14Z
dc.date.available 2021-02-25T06:54:14Z
dc.date.issued 2020
dc.identifier.issn 1865-0929
dc.identifier.uri https://dspace.kpfu.ru/xmlui/handle/net/161410
dc.description.abstract © Springer Nature Switzerland AG 2020. The article discusses the representativeness of Google Books Ngram (GBN) as a multi-purpose corpus. Criticism of the corpus is analysed and discussed. A comparative study of the GBN data and data obtained from the Russian National Corpus and the General Internet Corpus of Russian is performed to show that the Google Books Ngram corpus can be successfully used for corpus-based studies. A new concept, the “diachronically balanced corpus”, is introduced. The article also describes the word-spelling and metadata errors present in the GBN corpus and proposes possible ways of improving the quality of the GBN data.
dc.relation.ispartofseries Communications in Computer and Information Science
dc.subject Corpus representativeness
dc.subject Google Books Ngram
dc.subject Text corpora
dc.subject Word frequency
dc.title Google books ngram: Problems of representativeness and data reliability
dc.type Conference Paper
dc.relation.ispartofseries-volume 1223 CCIS
dc.collection Publications of KFU staff
dc.relation.startpage 147
dc.source.id SCOPUS18650929-2020-1223-SID85088753247

  Publications of KFU staff Scopus [24551]
    The collection contains publications by staff of Kazan Federal University (Kazan State University until 2010) indexed in the Scopus database, from 1970 onward.

