Greek Dialect Classifier

Binder

Putting an end to “It’s all Greek to me.”

A classifier that identifies Greek text as either Cypriot Greek (CG) or Standard Modern Greek (SMG)

For more information, see my thesis, "A Classifier to Distinguish Between Cypriot Greek and Standard Modern Greek."

1. Notebooks

Index of Jupyter Notebooks
1. Obtaining CG and SMG tweets: collecting the corpus
2. Data Analysis: analyzing the corpus
3. Building the Classifier: building the CG-SMG classifier

2. The corpus

The corpus is in the Data directory. I collected it myself and labeled it by dialect, storing CG and SMG text in separate files.

Index of files in corpus
CG Facebook: CG text collected from Facebook posts and comments
CG Twitter: CG text collected from tweets
CG Other: CG text collected from forum posts, as well as comments on blogs and news articles
SMG Facebook: SMG text collected from Facebook posts and comments
SMG Twitter: SMG text collected from tweets
SMG Other: SMG text collected from forum posts, as well as comments on blogs and news articles
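
Because the dialect label is carried by which file a text lives in, loading the corpus amounts to reading each file and tagging its contents with CG or SMG. The sketch below shows one way this might look. The file names in `CORPUS_FILES` and the one-sample-per-line layout are assumptions made for illustration, since this README does not state them; adjust both to match the actual contents of the Data directory.

```python
from pathlib import Path

# Hypothetical file names: the README lists six corpus parts but not their
# actual file names, so edit these to match the Data directory.
CORPUS_FILES = {
    "CG": ["cg_facebook.txt", "cg_twitter.txt", "cg_other.txt"],
    "SMG": ["smg_facebook.txt", "smg_twitter.txt", "smg_other.txt"],
}


def load_corpus(data_dir):
    """Return (texts, labels) with one entry per non-empty line.

    Assumes each file holds one text sample per line (an assumption,
    not something the README confirms).
    """
    texts, labels = [], []
    for label, names in CORPUS_FILES.items():
        for name in names:
            path = Path(data_dir) / name
            if not path.exists():
                continue  # skip parts that are missing from this checkout
            for line in path.read_text(encoding="utf-8").splitlines():
                line = line.strip()
                if line:
                    texts.append(line)
                    labels.append(label)
    return texts, labels
```

Calling `load_corpus("Data")` from the repository root would then give two parallel lists that can be passed to whatever analysis or training code follows.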

3. Instructions

To run the code, either clone the repository, install the dependencies, and run the Jupyter notebooks locally, or click the Binder badge at the top of this README to run them on a remote server.

4. Trying the classifier

To run the classifier on your own input, go to the last section of the notebook 3. Building the Classifier.
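
To give a feel for what classifying your own input involves, here is a toy sketch in plain Python. It is not the notebook's code: this README does not describe the model or features the notebook uses, so the sketch swaps in a small multinomial naive Bayes over character trigrams with add-one smoothing. The class name `NgramNaiveBayes`, the helper `char_ngrams`, and the four training sentences in the usage note are all illustrative assumptions.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-grams of the lowercased text, padded with one space."""
    padded = " " + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Illustrative multinomial naive Bayes over character trigrams.

    A stand-in technique for demonstration only, not the model
    built in the notebook.
    """

    def fit(self, texts, labels):
        self.counts = {label: Counter() for label in set(labels)}
        self.docs = Counter(labels)  # number of training texts per label
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        scores = {}
        n_docs = sum(self.docs.values())
        for label, counts in self.counts.items():
            # Add-one smoothing: every trigram gets a pseudo-count of 1.
            total = sum(counts.values()) + len(self.vocab)
            score = math.log(self.docs[label] / n_docs)  # log prior
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / total)
            scores[label] = score
        return max(scores, key=scores.get)
```

A usage example with a four-sentence toy training set (real use would train on the full corpus):

```python
model = NgramNaiveBayes().fit(
    ["τζαι εν καλά", "εν ξέρω", "και είναι καλά", "δεν ξέρω"],
    ["CG", "CG", "SMG", "SMG"],
)
print(model.predict("τζαι"))
print(model.predict("και"))
```

Character n-grams are a common choice for telling closely related varieties apart because they pick up spelling differences such as CG "τζαι" versus SMG "και" without needing a tokenizer or dictionary.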

Languages

Jupyter Notebook 100.0%

MIT License
Created March 29, 2018
Updated March 10, 2026