Ukrainian company Grammarly, which develops tools for working with text online, wants to create the first annotated GEC (grammatical error correction) dataset for Ukrainian. Such annotated texts are needed to develop speech recognition systems, voice assistants, and grammar correction tools.

What it takes to create the UA-GEC dataset

To teach algorithms to “speak” Ukrainian, Grammarly is gathering texts from users: social media posts, blog entries, articles, essays, poems, and letters. Linguists will then review the texts and correct stylistic and spelling errors.

“Ukrainian is a language with rich morphology. Unlike in English, each word has many word forms (книга, книгою, книгами). NLP techniques developed for English will not always be optimal for Ukrainian. Finding the best methods for working with such languages is a distinct challenge, and our dataset will be useful here,” the company explains.
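The point about word forms can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the `FORMS_TO_LEMMA` table and both helper functions are hypothetical and written by hand for this example, and they do not represent Grammarly's methods or a real morphological analyzer. The sketch shows that exact string matching, which often works tolerably for English, misses inflected forms of the same Ukrainian word.

```python
# Hand-written lookup table (an assumption for illustration only):
# three inflected forms of the noun "книга" (book) mapped to one lemma.
FORMS_TO_LEMMA = {
    "книга": "книга",
    "книгою": "книга",
    "книгами": "книга",
}


def count_exact(tokens, word):
    """Count tokens that match the given word exactly."""
    return sum(1 for token in tokens if token == word)


def count_by_lemma(tokens, lemma):
    """Count tokens whose lemma (per the toy table) equals the given lemma.

    Tokens missing from the table are treated as their own lemma.
    """
    return sum(1 for token in tokens if FORMS_TO_LEMMA.get(token, token) == lemma)


tokens = ["книга", "книгою", "книгами", "стіл"]

exact_hits = count_exact(tokens, "книга")
lemma_hits = count_by_lemma(tokens, "книга")
print(exact_hits, lemma_hits)
```

Exact matching finds only one of the three forms, while the lemma-based count finds all of them. A real system would need a proper morphological resource in place of the hand-written table.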

How will this project contribute?

  • It will advance the development of voice assistants and online grammar correction systems for Ukrainian;
  • It will help promote high-quality Ukrainian on the Internet;
  • It will increase the number of open tools for Ukrainian NLP.

How to help

The Ukrainian GEC dataset will be published as open access. Participants will not be paid, but contributing is a way for any user to support the development of the Ukrainian language online.

Texts will be collected until September 13. You can submit an existing text or write a new one here.