32 results for “topic:dialect-identification”
A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset. TunBERT was applied to three NLP downstream tasks: Sentiment Analysis (SA), Tunisian Dialect Identification (TDI) and Reading Comprehension Question-Answering (RCQA)
This repository contains the Arabic sarcasm dataset (ArSarcasm)
Dialect identification using Siamese network
The first Dialectal Arabic Code Switching (DACS) corpus from broadcast speech. Annotated at the token level, considering both linguistic and acoustic cues. This dataset is a potential benchmark for DCS in spontaneous speech.
ArSarcasm-v2 is an extension to the original ArSarcasm dataset. It was used for the shared task on sarcasm detection and sentiment analysis, which is a part of WANLP 2021.
Language and Speech Technology for Central Kurdish Varieties (LREC-COLING 2024)
Classifier that identifies Greek text as Cypriot Greek or Standard Modern Greek
VarDial19 shared task: Discriminating between Mainland and Taiwan Variation of Mandarin Chinese (DMT)
A tool that predicts the English dialect of an SMS message using recurrent neural networks supplemented with data from Google Trends.
Dialect-aware grapheme-to-phoneme conversion for German using Transformer + XLM-R. Context-aware, multi-dialect support with CTC+CE training. Built with PyTorch Lightning & Hydra.
A program that statistically classifies Irish-language texts according to their dialect
Arabic_Dialect_Identification_NLP-AIM-Task
Uses AraBERT to classify Arabic dialects. Ranked fourth in the WANLP 2020 workshop.
Twitter Dialect Datasets and Classifiers (GULF Arabic Corpus)
An Arabic Tweet Dialect Classifier
Chinese dialect identification using audio embeddings from LLMs.
An atlas of Central Kurdish dialects + a simple game to detect dialects
Twitter Dialect Datasets and Classifiers (EG + GULF Arabic Corpus)
This shared task will be the first to target a large set of dialect labels at the city and country levels. The data for the shared task is created or collected under the Multi-Arabic Dialect Applications and Resources (MADAR) project.
Twitter Dialect Datasets and Classifiers (EG Arabic Corpus)
Dialect Identification in Indic Languages
Web interface for the far-speech demo to be presented at INTERSPEECH 2019
ITDI shared task @ VarDial2022 9th Workshop on NLP for Similar Languages, Varieties and Dialects.
Log-MFSC-based classification of British English dialects from the IViE (Intonational Variation in English) corpus
Arabic Dialect Identification on NADI 2020 and QADI datasets
[Interspeech19] Computational Paralinguistics ChallengE (ComParE)
Binary dialect classification: Standard vs Kathiawadi Gujarati.