The General Regionally Annotated Corpus of Ukrainian (GRAC) is a large, representative collection of Ukrainian texts accompanied by software for building custom subcorpora, searching for words, grammatical forms and their combinations, and post-processing query results. Query results can be sorted, balanced samples can be extracted, and various statistics can be collected.
The corpus is meant for linguistic research on the grammar, lexicon and history of written Ukrainian, as well as for use in preparing dictionaries and grammars.
The corpus can also be used for advanced study of the language and for writing textbooks, learner’s dictionaries and exercises that draw on examples from real texts and take frequencies, collocations and similar data into account.
The corpus does not represent the standard normalized language; it includes words and word combinations that do not belong to the present-day standard norm.
The corpus covers the period from 1816 to 2021 and includes more than 90 thousand texts by about 26 thousand authors.
Please cite GRAC as follows:
Maria Shvedova, Ruprecht von Waldenfels, Sergiy Yarygin, Andriy Rysin, Vasyl Starko, Tymofij Nikolajenko et al. (2017-2022): GRAC: General Regionally Annotated Corpus of Ukrainian. Electronic resource: Kyiv, Lviv, Jena. Available at uacorpus.org.