"topic:african-languages" — Search

Official DSFSI Public Datasets Registry - Comprehensive catalog of 50+ datasets for South African & African languages. Includes speech recognition, NLP, terminology, health, legal & financial data across HuggingFace, GitHub, Zenodo & more.

Jupyter Notebook · 65 · Updated 2 weeks ago

african-languages, data-catalog, data-science, dataset-registry, datasets, huggingface-datasets, low-resource-languages, machine-learning, multilingual, natural-language-processing, nlp, open-data, research-data, south-africa, speech-recognition

Ijwi-ry-Ikirundi-AI/Kirundi_Dataset

🇧🇮 The first large-scale, open-source speech and text dataset for the Kirundi language. Building AI models for 12M+ Kirundi speakers through community collaboration. Includes ASR, TTS, and MT capabilities.

Jupyter Notebook · 62 · Updated 5 days ago

african-languages, ai, asr, burundi, community-driven, kirundi, low-resource-language, machine-learning, machine-translation, nlp, open-dataset, speech-dataset, speech-recognition, text-to-speech, tts

Skywalker427/adinkra

Adinkra Symbols API - meanings of adinkra symbols, symbol images and synopsis around them

JavaScript · 61 · Updated 3 weeks ago

accra, accra-ghana, adinkra, adinkra-symbols, africa, african-countries, african-culture, african-languages, asante, ashante, ashanti, culture, ghana, javascript, kumasi, nodeapi, nodejs, symbology, symbols, west-africa

dsfsi/PuoBERTa

A RoBERTa-based language model specially designed for Setswana, using the new PuoData dataset.

Makefile · 50 · Updated 2 months ago

african-languages, africannlp, dsfsi-datasets, nlproc, setswana, tn, tsn

rashiedomar/somali-wikipedia-corpus

Cleaned Somali Wikipedia corpus (~9,500 articles) for NLP, LLM training, and linguistic research

51 · Updated 2 months ago

african-languages, corpus, dataset, llm-training, low-resource-languages, nlp, somali, text-corpus, wikipedia

uds-lsv/transfer-distant-transformer-african

Code + data for the EMNLP'20 publication "Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages"

Python · 55 · Updated 1 year ago

african-languages, low-resource, low-resource-languages, ner, topic-classification, transformer-models

AfriNLP/AfriNLLB

AfriNLLB: Efficient Translation Models for African Languages

50 · Updated 4 days ago

african-languages, efficient-inference, machine-translation, nllb-200

JosephSefara/AfricanWordNet

AfricanWordNet: Implementation of WordNets for African languages. Citation paper "Practical Approach on Implementation of WordNets for South African Languages" https://www.aclweb.org/anthology/2021.gwc-1.3.pdf

HTML · 50 · Updated 3 years ago

african-languages, isixhosa, isizulu, nltk, python, sepedi, setswana, tshivenda, wordnet, wordnets

Niger-Volta-LTI/urhobo-asr-spoken-digits

URH-DIGITS is a connected digits speech recognition task

41 · Updated 1 year ago

african-languages, asr, digit-recognition, speech-recognition, spoken-digits-recognition, urhobo, urhobo-spoken-digits

dsfsi/PuoData

Curated corpora for Setswana. Used to train PuoBERTa.

30 · Updated 6 months ago

african-languages, african-nlp, corpora, dsfsi-datasets, natural-language-processing, setswana, south-africa, tn, tsn

oyd11/fur-language

Fur language (poór'íŋ belé) [ISO 639-3: fvr] resources and computer aids.

30 · Updated 11 months ago

africa, african-languages, fur, keyboard, keyboard-layout, macos, macosx, windows

AidaLog/Plain-Swahili-Dataset

Plain Swahili dataset, sourced from public repositories.

21 · Updated 2 years ago

african-languages, dataset, language-classification, machine-learning, nlp, open-data, swahili, swahili-sentences, swahili-speaking
