alexsasu/NitroNLP-Hackathon-2023

Hackathon for an NLP task involving sexism classification

NitroNLP-Hackathon-2023

The hackathon (NitroNLP, held in 2023) posed a multi-class classification task on Romanian texts drawn from multiple sources (social media, web articles, books). Each text had to be assigned to one of five categories: sexist direct, sexist descriptive, sexist reporting, non-sexist offensive, and non-sexist non-offensive. Because the dataset was imbalanced, the metric of interest was weighted accuracy.
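The exact definition of weighted accuracy used by the organizers is not stated here. One common reading, assumed in the sketch below, weights each sample by the inverse frequency of its true class, so every class contributes equally to the score regardless of its size. The function name `weighted_accuracy` and the toy labels are illustrative only and are not taken from the competition data.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Accuracy with each sample weighted by 1 / (size of its true class).

    Assumption: this inverse-frequency weighting is one plausible reading
    of "weighted accuracy"; the competition's exact formula may differ.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty label list")

    class_counts = Counter(y_true)
    correct_weight = 0.0
    total_weight = 0.0
    for truth, pred in zip(y_true, y_pred):
        weight = 1.0 / class_counts[truth]
        total_weight += weight
        if truth == pred:
            correct_weight += weight
    return correct_weight / total_weight


# Hypothetical toy data: 8 majority-class samples, 2 minority-class samples.
y_true = ["non-sexist non-offensive"] * 8 + ["sexist direct"] * 2
# A degenerate model that always predicts the majority class.
y_pred = ["non-sexist non-offensive"] * 10

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
weighted = weighted_accuracy(y_true, y_pred)
print(f"plain accuracy:    {plain:.2f}")
print(f"weighted accuracy: {weighted:.2f}")
```

On this toy input the always-majority model scores high on plain accuracy but is penalized by the weighted variant, which is why a class-sensitive metric suits an imbalanced dataset.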

We first tried classical machine learning methods (Decision Tree, KNN, and MLP) on a bag-of-words (BoW) representation. Because these did not yield a satisfactory weighted accuracy score, we moved on to RoBERT, a version of BERT pre-trained on a Romanian corpus, which we fine-tuned on our dataset with balanced weights applied.
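How the balanced weights were computed is not described here. A minimal sketch, assuming they are per-class weights following the common "balanced" heuristic `n_samples / (n_classes * class_count)` (the same formula scikit-learn uses for `class_weight="balanced"`), could look like this. The helper name `balanced_class_weights` and the toy label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Assumption: this is the usual "balanced" heuristic; the weighting
    actually used during fine-tuning may have been different.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical, deliberately imbalanced label distribution.
toy_labels = (
    ["non-sexist non-offensive"] * 6
    + ["non-sexist offensive"] * 3
    + ["sexist direct"] * 1
)

weights = balanced_class_weights(toy_labels)
for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label:28s} {weight:.3f}")
```

Rare classes receive larger weights, so a loss function that multiplies each sample's error by its class weight pushes the model to pay more attention to under-represented categories.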

Weighted accuracy scores of our models:

[Image: weighted accuracy scores per model]

Our full documentation for this competition can be consulted in the "Paper.pdf" file.

Contributors:

Languages

Jupyter Notebook 100.0%

Created February 25, 2024
Updated February 25, 2024