63 results for “topic:african-languages”
Machine Translation for Africa
A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages.
Syllable-aware BPE tokenizer for the Amharic language (አማርኛ) – fast, accurate, trainable.
Yorùbá language training text for NLP, ASR and TTS tasks
SemEval2024-task 11: Bridging the Gap in Text-Based Emotion Detection
AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African languages: https://afrisenti-semeval.github.io/
Masakhane Web is a translation web application solely for African languages.
AfroLID, a powerful neural toolkit for African language identification that covers 517 African languages.
Automatic Diacritic Restoration of Yorùbá language Text
First comprehensive survey of NLP work carried out in Senegalese languages covering various tasks + Applications in the social sciences.
Cross-lingual Language Model (XLM) pretraining and Model-Agnostic Meta-Learning (MAML) for fast adaptation of deep networks
Ìrànlọ́wọ́ is a utility library for analysis & (pre)processing of Yorùbá text → https://pypi.org/project/iranlowo
Stoplists for African languages generated from the ASP corpus
Introduction to "Tencent’s Multilingual Machine Translation System for WMT22 Large-Scale African Languages".
Website that hosts the African Voices projects. Users can download datasets and synthesizers, and synthesize speech in African languages
A Python-based utility for Swahili and English number-to-words conversion.
This repo contains the LUO corpus for Named Entity Recognition. The text comes from the news domain and was scraped from Radio Ramogi.
Sankofa Display is a typeface that draws inspiration from African art styles, with a focus on straight-line geometric designs.
Official DSFSI Public Datasets Registry - Comprehensive catalog of 50+ datasets for South African & African languages. Includes speech recognition, NLP, terminology, health, legal & financial data across HuggingFace, GitHub, Zenodo & more.
🇧🇮 The first large-scale, open-source speech and text dataset for Kirundi language. Building AI models for 12M+ Kirundi speakers through community collaboration. Includes ASR, TTS, and MT capabilities.
Adinkra Symbols API - meanings of adinkra symbols, symbol images and synopsis around them
A RoBERTa-based language model specially designed for Setswana, using the new PuoData dataset.
Cleaned Somali Wikipedia corpus (~9,500 articles) for NLP, LLM training, and linguistic research
Code + data for the EMNLP'20 publication "Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages"
AfriNLLB: Efficient Translation Models for African Languages
AfricanWordNet: Implementation of WordNets for African languages. Citation paper "Practical Approach on Implementation of WordNets for South African Languages" https://www.aclweb.org/anthology/2021.gwc-1.3.pdf
URH-DIGITS is a connected digits speech recognition task
Curated corpora for Setswana. Used to train PuoBERTa.
Fur language (poór'íŋ belé) [iso 639-3: fvr] resources, and computer aids.
Plain Swahili dataset, sourced from public repositories.