Georgian Language Corpus
The Georgian Language Corpus (GLC) is a megaproject of the Institute of Linguistic Studies of Ilia State University, created in 2009-2015. At this stage, the corpus contains more than 100 000 000 word-forms and consists of monolingual and bilingual sub-corpora. The monolingual sub-corpus comprises an Old and Middle Georgian section and a New and Modern Georgian section.
The Old and Middle Georgian section is structured according to relations with other languages and cultures (Greek, Syrian, Christian Arabic or Armenian), as well as translation and literary schools (pre-Athonite, Athonite, Antiochian, etc.).
The bilingual sub-corpus now includes parallel corpora of Vepkhistkaosani (The Knight in the Panther’s Skin; Georgian-English) and Kartlis Tskhovreba (The Georgian Chronicles; Georgian-Armenian). At present, a Georgian-Armenian parallel corpus of The Martyrdom of the Holy Queen Shushanik and a Georgian-Syriac corpus of The Life of Peter the Iberian are being developed.
During 2012-2014, a morphological analyzer of Modern Georgian was created. At present, linguistic and technological standards for the modelling of Old and Middle Georgian are being developed.