Greek Dialect Classifier

Putting an end to “It’s all Greek to me.”

A classifier that identifies Greek text as either Cypriot Greek (CG) or Standard Modern Greek (SMG).

For more information, you can read my thesis: A Classifier to Distinguish Between Cypriot Greek and Standard Modern Greek.

1. Notebooks

Index of Jupyter Notebooks
1. Obtaining CG and SMG tweets: Collecting the corpus
2. Data Analysis: Analyzing the corpus
3. Building the Classifier: Building the CG-SMG classifier

2. The corpus

The corpus can be found in the Data directory. I collected it myself and labeled it by placing CG and SMG text in separate files.

Index of files in corpus
CG Facebook: CG text collected from Facebook posts and comments
CG Twitter: CG text collected from tweets
CG Other: CG text collected from forum posts, as well as comments on blogs and news articles
SMG Facebook: SMG text collected from Facebook posts and comments
SMG Twitter: SMG text collected from tweets
SMG Other: SMG text collected from forum posts, as well as comments on blogs and news articles
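
Because the labels are encoded by which file a text sits in, loading the corpus amounts to reading each file and attaching that file's label. The sketch below shows one way this could look. It assumes UTF-8 text files with one sample per line, and the function name and the file names in the usage comment are hypothetical; check the Data directory for the real names and layout.

```python
from pathlib import Path


def load_corpus(data_dir, label_by_file):
    """Return (text, label) pairs, one per non-empty line of each listed file.

    Assumes one sample per line and UTF-8 encoding (an assumption,
    not something the repository guarantees).
    """
    samples = []
    for filename, label in label_by_file.items():
        path = Path(data_dir) / filename
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    samples.append((line, label))
    return samples


# Usage (hypothetical file names):
# samples = load_corpus("Data", {"cg_twitter.txt": "CG", "smg_twitter.txt": "SMG"})
```

Keeping the file-to-label mapping as an explicit argument makes it easy to load only one source (for example, only the Twitter files) when comparing how the dialects differ across platforms.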

3. Instructions

To run the code, either clone the repository, install the dependencies, and run the Jupyter notebooks on your local machine, or click the Binder badge at the top of this README to run the notebooks on a remote server.

4. Trying the classifier

If you want to run the classifier with your own input, go to the last section of notebook 3, Building the Classifier.
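
The notebook holds the actual classifier. For a rough feel of what classifying your own input can look like before opening it, here is a minimal sketch. It is not the method from the notebook or the thesis: it swaps in a generic character-trigram Naive Bayes model with add-one smoothing, and the class name, function names, and toy training strings are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()               # label -> number of training texts
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.ngram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        # n-grams never seen in training carry no evidence, so ignore them
        grams = [g for g in char_ngrams(text, self.n) if g in self.vocab]
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in grams:
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training strings (NOT taken from the repository's corpus).
toy_texts = ["τζαι εν καλά", "και είναι καλά"]
toy_labels = ["CG", "SMG"]
model = NgramNaiveBayes().fit(toy_texts, toy_labels)
print(model.predict("τζαι"))
```

Character n-grams are a common choice for separating closely related language varieties because they pick up spelling cues such as CG "τζαι" versus SMG "και" without needing a tokenizer. Two training strings are far too few for meaningful results; the real corpus in the Data directory is what the notebook trains on.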

