NitroNLP-Hackathon-2023

The hackathon (NitroNLP, held in 2023) posed a multi-class classification task: classifying texts written in Romanian, drawn from multiple sources (social media, web articles, books), into five categories: sexist direct, sexist descriptive, sexist reporting, non-sexist offensive, and non-sexist non-offensive. Because the dataset was imbalanced, the metric of interest was weighted accuracy.
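The exact definition of weighted accuracy used by the organizers is not stated here. One common reading for imbalanced data, assumed in the sketch below, is balanced accuracy: the unweighted mean of per-class recall. The function name `balanced_accuracy` and the toy labels are illustrative only and are not taken from the competition.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true.

    Assumption: "weighted accuracy" is read here as balanced accuracy.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels: a model that always predicts the majority class.
y_true = ["non-sexist non-offensive"] * 8 + ["sexist direct"] * 2
y_pred = ["non-sexist non-offensive"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, balanced)
```

On this toy data, plain accuracy rewards always predicting the majority class, while the balanced variant penalizes the missed minority class. That is the motivation for using such a metric on an imbalanced dataset.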

We first tried classical machine learning methods (Decision Tree, KNN, and MLP) on a bag-of-words (BoW) representation. Because these did not yield a satisfactory weighted accuracy score, we moved on to RoBERT, a version of BERT pre-trained on a Romanian corpus, which we fine-tuned on our dataset with balanced class weights.
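The balanced weighting mentioned above is not specified in more detail here. A common heuristic, assumed in this sketch, assigns each class the weight `n_samples / (n_classes * class_count)`, so that rarer classes contribute more to the training loss. The helper name `balanced_class_weights` and the label counts are hypothetical and do not reflect the actual dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: this is the usual "balanced" heuristic, not necessarily
    the exact scheme used in the competition.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Made-up label distribution for illustration only.
labels = (
    ["non-sexist non-offensive"] * 6
    + ["sexist direct"] * 3
    + ["sexist reporting"] * 1
)
weights = balanced_class_weights(labels)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Weights of this kind are typically passed to a weighted cross-entropy loss during fine-tuning, so that errors on minority classes cost more than errors on the majority class.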

Weighted accuracy scores of our models:

Our full documentation for this competition can be consulted in the "Paper.pdf" file.

Contributors:

Alexandru Rosca (https://github.com/RoscaAlexandru775/)
Alexandru Sasu (https://github.com/alexsasu/)
Andrei Dina (https://github.com/AndreiConstantinDina/)
Razvan Gogu (https://github.com/gogurazvan/)
