pemagrg1/Nepali-Datasets
A list of Nepali Dataset sources. (Hoping that it will encourage everyone to research more on Nepali language)
Nepali-Datasets: Comprehensive NLP Resource Collection
A thoroughly verified and curated collection of Nepali datasets for NLP research, development, and benchmarking. This resource aggregates 100+ datasets across 20+ categories to encourage and support research on Nepali, a low-resource language.
NOTE: We hope this list encourages more research on the Nepali language. You are welcome to add a source if it is not listed here.
Benchmarks & Standards
Comprehensive evaluation frameworks and shared tasks for Nepali NLP.
- NLUE (Nepali Language Understanding Evaluation) - 9 classification + 3 structural prediction tasks (sentiment, hate speech, toxicity, QA, NER). arXiv: 2411.19244
- Nep-gLUE Benchmark - Official Nepali GLUE-style benchmark (7 NLU tasks). Limited direct access; see NLUE for comprehensive alternatives.
- FLORES-101 Evaluation Benchmark - Machine translation evaluation across 101 languages including Nepali. GitHub: facebookresearch/flores
- IndicBench - Benchmark for 11 Indic languages including Nepali (13 tasks). New 2025 addition.
- SemEval 2026 Task 9 - Polarization type classification with Nepali data. Codabench New 2026.
Nepali Text Corpus
Large-scale text collections for language modeling, pre-training, and linguistic analysis.
Ultra-Large Corpora (>1GB)
- Nepali-Text-Corpus (IRIISNEPAL) - 6.4M articles, 10.1 GB - Largest Nepali corpus from 99 news websites. State-of-the-art pre-training resource. HF: IRIISNEPAL/nepali-text-corpus | arXiv: 2411.15734
- OSCAR Corpus Nepali - 3.8 GB, 100M+ sentences from Common Crawl. Kaggle: hsebarp/oscar-corpus-nepali
- CC100-Nepali - Common Crawl 2019 subset, 200 GB uncompressed. Foundation data for multilingual models. MetaText: cc100-nepali
- Lamsal (2020) Corpus - 12M+ words professionally compiled. Note: The original DOI returns a 404; consider IRIISNEPAL as the primary substitute.
Large Curated Collections (100MB-1GB)
- Nepali News Dataset - 6,800+ articles with metadata. Kaggle: lotusacharya/nepalinewsdataset
- Nepali Wikipedia Articles - 39,000+ articles from Wikipedia dump. Kaggle: disisbig/nepali-wikipedia-articles
- np20ng (20 Newsgroup) - 200,000+ news documents across 20 categories. Adapted from English 20NG. HF: Suyogyart/np20ng New addition.
- Nepali News Dataset (Large) - 25,000+ articles across 10+ categories. Kaggle: ashokpant/nepali-news-dataset-large
Specialized Text Collections
- Nepali Unigrams Cleaned (FineWeb) - 200k+ unique Nepali words with frequency. Kaggle: thenepaliguy/nepali-unigrams-cleaned
- Setopati News Dataset - 10,000+ articles from Setopati portal. News domain-specific. Kaggle: living0world/setopati-news-dataset
- Nepali Raw Text Data - Raw text batches for preprocessing. Kaggle: rajanghimire/nepali-raw-text-data-batch1
- Nepali Lyrics Dataset - 5,000+ song lyrics with metadata. Music domain. Kaggle: sanjay05kc/nepali-lyrics
- Digitized Nepali Textbooks - OCR'd school textbooks (formal register). HF: dineshkarki/nepali-textbooks-corpus
Classification Datasets
News classification, topic modeling, and text categorization.
- iNLTK Nepali News Dataset - 8,000+ articles across 5 categories. Kaggle: disisbig/nepali-news-dataset
- 16NepaliNews Corpus - ~14,364 documents across 16 categories. Most comprehensive category coverage. GitHub: sndsabin/Nepali-News-Classifier
- Nepali News Datasets (Small) - 3,000+ articles. Good for quick prototyping. Kaggle: tejshahi/20nepalinews
- Prasta Dataset - Question type classification for QA systems. Kaggle: sangamthapa/prasta
- Nepali Factoid Questions Intent Classified - 500+ samples for intent detection. Kaggle: sushiltimilsina/nepali-factoid-questions-intent-classified-dataset
Named Entity Recognition (NER) Datasets
Annotated datasets for entity recognition (person, organization, location, etc.).
- EverestNER - 50,000+ annotated sentences, 8 entity types. Largest NER dataset. Named after Mt. Everest. Kaggle: jeevanchapagain/everestner
- DanfeNER - 25,000+ sentences covering Nepali geographical & cultural entities. Kaggle: jeevanchapagain/danfener
- Nepali NER (Ebiquity v2) - Benchmark dataset with 3 entity types (PER, ORG, LOC). GitHub: oya163/nepali-ner/data/ebiquity_v2
- Nepali NER Dataset (dadelani) - Annotated for multi-token entities. GitHub: dadelani/nepali-ner New addition.
- Nepali Offensive Language NER and Sentiment - 5,000+ samples with dual annotations (NER + sentiment). Kaggle: merishnasuwal/offensive-language-ner-and-sentiment-analysis-data
Sentiment Analysis & Hate Speech Datasets
Social media, news, and online text with sentiment/toxicity annotations.
Sentiment Analysis
- NepaliSentiment - GitHub corpus with preprocessing & baselines. GitHub: rockerritesh/NepaliSentiment
- Nepali Sentiment Analysis - Binary classification (positive/negative). Updated link. Kaggle: aayamoza/nepali-sentiment-analysis
- Nepali Language Sentiment Analysis - Movie Reviews - 2,500+ reviews with star ratings. Domain-specific (film). Kaggle: shikharghimire/nepali-language-sentiment-analysis-movie-reviews
- Nepali Luxury Hotel Reviews - 4,000+ reviews with aspect-based sentiment. Hotel domain. Kaggle: suprapandey/nepali-luxury-hotel-reviews-2024
- XLSum-Nepali - Summarization + sentiment. HF: sanjeev-bhandari01/XLSum-nepali New.
Hate Speech & Offensive Language
- Nepali Hate Speech Collection - 5,000+ annotated samples from social media. Kaggle: mohanbhandari/nepali-hate-speech-collection
- Nepali Offensive Language Detection and Sentiment Analysis - Offensive language detection tooling. GitHub: merishnaSuwal/nep-off-langdetect New.
- Nepali Abusive Language NER and Sentiment Analysis - Multi-task dataset (NER + sentiment on abusive text). Kaggle: merishnasuwal/offensive-language-ner-and-sentiment-analysis-data
- NepCov19Tweets - 10,000+ COVID-19 tweets with emotion labels. Social media (Twitter). Kaggle: mathew11111/nepcov19tweets
- Mpox Instagram Sentiment and Hate Analysis - 3,000+ Instagram posts with dual sentiment + hate labels. Health + social media. Kaggle: thakurnirmalya/mpox-instagram-dataset-sentiment-and-hate-analysis
Question Answering (QA) Datasets
Extractive, generative, and domain-specific QA datasets.
- Nepali Health Q&A Corpus - 3,000+ Q&A pairs from health forums (medical domain). Kaggle: thedevastator/nepali-health-q-a-corpus
- Pregnancy Related Question Answer - 1,500+ pairs on maternal health (specialty medical). Kaggle: poudelsujan03/pregnancy-related-question-answer-nepali-dataset
- Nepali Health Forum Corpus - 2,500+ Q&A from health forums with user interactions. Kaggle: rxnach/nepali-health-forum-corpus-questions-and-answers
- Nepali QA Dataset (Yunika) - 266 extractive QA pairs with passage context. HuggingFace format. HF: Yunika/Nepali-QA
Summarization Datasets
Abstractive & extractive summarization, headline generation.
- Nepali text summarization - 1,000+ document-summary pairs. Abstractive task. Kaggle: imageinfo/nepali-text-summarization
- Nepali News Article with Summary - 286,000+ news headlines + articles. Largest summarization resource (headline generation). Kaggle: adarsh203/nepali-news-article-with-summary
- Sentence Compression Nepali - 5,000+ sentence pairs for text compression (extractive). Kaggle: sbastola73/sentence-compression-nepali
- Policy Documents and Summaries - 500+ policy documents with professional summaries (domain-specific). Kaggle: greenspaghetti/policy-documents-and-summaries
Speech Datasets (ASR & TTS)
Audio data for automatic speech recognition and text-to-speech synthesis.
Large-Scale ASR
- OpenSLR-54 (Large Nepali ASR) - 157,000 utterances, 400+ hours. Google-supported, professional quality. openslr.org/54
- Mozilla Common Voice (Nepali) - Crowdsourced speech, 100k+ clips available. Diverse speakers. commonvoice.mozilla.org/en/datasets. Note: A direct Nepali link may not be available; navigate from the main datasets page, which confirms availability.
- Nepali Speech to Text Dataset (Parliamentary) - 1,000+ utterances from Parliament sessions (formal speech). Kaggle: ishworsubedii/nepali-speech-to-text-dataset
- Nepali Automatic Speech Recognition (HF) - Combined ASR dataset for transcription. HF: amitpant7/Nepali-Automatic-Speech-Recognition New.
- ASR Nepali 1 Large - 50,000+ audio files with transcriptions. Kaggle: sonismaharjan/asr-nepali-1-large
TTS & Synthesized Speech
- OpenSLR-43 (High quality TTS) - High-quality single-speaker TTS data. Professional recording. openslr.org/43
- Nepali Singing Voice Data - Audio + lyrics for singing voice synthesis (music domain). Kaggle: pujancozu/nepali-singing-voice-data
Speech Analysis & Emotion
- Nepali Speech Emotion Detection - 3,000+ speech samples with 6 emotion labels. Kaggle: ashalupreti/nepali-speech-emotion-detection-dataset
- Newari Music Classification - Audio classification for Newari (related language) music. Kaggle: pujancozu/newari-music
Multilingual Benchmarks
- Google FLEURS - Multilingual benchmark including Nepali (102 languages). HF: google/fleurs
Image & Video Datasets (Computer Vision)
Datasets for image/video captioning, object detection, and multimodal learning.
Sign Language & Gesture
- Nepali Sign Language Character Dataset - 36 characters × 1,000 images = 36,000 total. Sign language recognition. Kaggle: biratpoudelrocks/nepali-sign-language-character-dataset
- Nepali Sign Language Video Dataset (Zenodo) - 630 professional videos (1,205 gestures with frame annotations). Research-grade. Zenodo: 10478554
Image Captioning & Multimodal
- Flickr8k Nepali Captioning - 8,000 images × 5 Nepali captions = 40,000 captions. Adapted from Flickr8k English. GitHub: bipeshrajsubedi/Flickr8k_Nepali_Dataset
- Nepali Video Captioning (MSVD) - 1,500+ videos with Nepali descriptions. Video captioning task. Kaggle: kabitaparajuli/video-captioning-in-nepali-msvd-dataset
Face Recognition & Emotion
- Nepali Celeb Localized Face Dataset - 500+ Nepali celebrities with face bounding boxes. Face detection & recognition. GitHub: amitpant7/Nepali-Celeb-Localized-Face-Dataset
- Facial Emotion Detection for Nepali Ethnic Groups - 6,000+ facial images with 7 emotion labels. Culturally-specific dataset. Kaggle: suchanasubedi/facial-emotion-detection-for-nepali-ethnic-groups
Domain-Specific Objects
- Nepali Currency Dataset - 5,000+ currency note images. Banknote denomination classification. Kaggle: uashutoshk/nepali-currency-dataset
- Nepali Food Images - 3,000+ images of traditional Nepali dishes. Food recognition domain. Kaggle: saurabkunwar/nepali-food-images
- Nepali Cultural Dress and Ornaments - 2,000+ images of traditional clothing & artifacts. Cultural heritage. Kaggle: bimarshakhanal/nepali-cultural-dress-and-ornaments
OCR & Handwriting Datasets
Character recognition, document digitization, and license plate detection.
Handwriting & Character Recognition
- Nepali Handwriting Characters - Handwritten character images for OCR training. Kaggle: mohanbhandari/nepali-handwriting-characters
- Handwritten Devanagari Character Dataset - 10,500+ images of Devanagari script (applicable to Nepali). Kaggle: sa9arr/handwritten-devanagari-character-dataset
- Nepali Handwritten Images for Text Detection - Document-level handwritten images for text detection. Kaggle: sweekardahal/nepali-handwritten-images-for-text-detection
License Plate & Vehicle Recognition
- Nepali License Plate (ALPR) V2 - 2,000+ license plate images for automatic license plate recognition. Kaggle: ishworsubedii/alpr-v2
- Nepali Motorbike Backplate Labeled - 1,500+ motorcycle plate images with bounding boxes. Kaggle: saugat111/nepali-moterbike-backplate-lbled
Academic OCR Research
- Nepali Handwritten Character Recognition (Zenodo) - Research dataset with detailed annotations. Zenodo: 7472398
- Improving Tesseract-OCR for Nepali (Zenodo) - 5,000+ images with preprocessing techniques (DOI: 10.5281/zenodo.4361896). Zenodo: 4361896
Translation Datasets
Parallel corpora for machine translation and low-resource language pairs.
Large-Scale Parallel Corpora
- English-Nepali Parallel Corpus (Kathmandu University) - 1,800,000 sentence pairs; gold standard for EN-NE MT. Largest parallel resource. ELRA: W0077
- Kathmandu University English-Nepali Corpus - 1.8M sentence pairs (direct source confirmation). AI4Bharat: indicnlp_catalog
Medium-Scale Corpora
- Nepali-English language pair - 40,000+ parallel sentence pairs with preprocessing code. GitHub: sharad461/nepali-translator
- Hindi-Nepali Parallel Corpus (Noisy) - 500,000+ sentence pairs (unfiltered). Kaggle: thenepaliguy/final-hi-ne
- Hindi-Nepali Evaluation Corpus (Clean) - 50,000+ high-quality sentence pairs (manually validated). Kaggle: thenepaliguy/cleanhindinepali
- Urdu-Nepali Parallel Corpus - 100,000+ sentence pairs. Underrepresented language pair. Kaggle: rtatman/urdunepali-parallel-corpus
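Unfiltered parallel corpora such as the noisy Hindi-Nepali set above usually need cleaning before training. One common first pass is a token length-ratio filter. The sketch below is only an illustration: the `keep_pair` helper, its thresholds, and the sample sentence pairs are assumptions, not something shipped with any of the listed datasets.

```python
def keep_pair(src, tgt, max_ratio=2.0, min_tokens=1, max_tokens=100):
    """Return True if a sentence pair passes simple length sanity checks.

    The thresholds are illustrative defaults (assumptions), not tuned values.
    """
    n_src, n_tgt = len(src.split()), len(tgt.split())
    # Reject empty or overly long sides.
    if not (min_tokens <= n_src <= max_tokens and min_tokens <= n_tgt <= max_tokens):
        return False
    # Reject pairs whose lengths differ too much to be likely translations.
    return max(n_src, n_tgt) / min(n_src, n_tgt) <= max_ratio


# Hand-written sample pairs (Hindi, Nepali), invented for this example.
pairs = [
    ("मैं घर जा रहा हूँ", "म घर जाँदैछु"),
    ("नमस्ते", "यो एउटा धेरै लामो र असम्बन्धित वाक्य हो"),
    ("", "खाली"),
]
clean = [pair for pair in pairs if keep_pair(*pair)]
print(len(clean), "of", len(pairs), "pairs kept")
```

Whitespace token counts are a rough proxy; a real pipeline would typically add language identification and deduplication on top of this.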
Multilingual & Specialized
- Trilingual Hindi-English-Nepali - 200,000+ aligned triples. Multilingual MT resource. Kaggle: sundeepdawadi/cleaned-word2word-en-hi-ne
- English-Nepali Translation (HF) - Instruction-tuned format for LLM fine-tuning. HF: ashokpoudel/nepali-english-translation-dataset
- Bidirectional English-Nepali MT for Legal Domain - 125,000 legal sentences. Domain-specific (legal). ACL: 2024.sigul-1.7 New 2024.
- CLE Parallel Corpus (AI4Bharat) - English-Nepali-Urdu triplets. Multilingual training. GitHub: AI4Bharat/indicnlp_catalog
Historical & Shared Tasks
- WMT19 Parallel Corpus - Shared task corpus with filtering challenge. statmt.org/wmt19
- English - Nepali translated strings - UI/software localization strings. Note: The original link returns a 503 and the TDIL-DC alternative is not a direct link; use the ELRA entry above.
Word Embeddings & Pre-trained Models
Pre-computed word vectors and language models with training datasets.
Word Embeddings
- Nepali Word2Vec from scratch - Custom-trained 300D vectors with training scripts. Educational resource. GitHub: R4j4n/Nepali-Word2Vec-from-scratch
- 300D Word2Vec Embeddings for Nepali Language - Pre-computed 300D vectors, 20k+ words. Ready-to-use. GitHub: rabindralamsal/Word2Vec-Embeddings-for-Nepali-Language
- Nepali FastText Word Vectors - Official FastText vectors (Meta/Facebook). Trained on Common Crawl + Wikipedia. fastText: crawl-vectors
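Pre-computed vectors like those above are often distributed as plain text: a header line with the vocabulary size and dimension, then one word and its floats per line. Assuming that layout, a minimal reader and cosine-similarity check might look like the sketch below. The `load_vec` and `cosine` helpers and the toy 3-dimensional vectors are invented for illustration; real files are 300D and far larger.

```python
import io
import math


def load_vec(handle):
    """Parse text-format word vectors: 'count dim' header, then 'word f1 f2 ...'."""
    _count, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed rows
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors made up for this example; they carry no real semantics.
SAMPLE = "3 3\nनेपाल 1.0 0.0 0.0\nकाठमाडौं 0.8 0.6 0.0\nपानी 0.0 0.0 1.0\n"
vectors = load_vec(io.StringIO(SAMPLE))
print(cosine(vectors["नेपाल"], vectors["काठमाडौं"]))
```

For a real file, the same function would be called on `open(path, encoding="utf-8")`; for full-size embeddings a dedicated library is likely a better fit than this pure-Python reader.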
Large Language Models & Transformers
- IRIISNEPAL RoBERTa (110M params) - 27.5 GB training corpus from 99 news sites. State-of-the-art Nepali BERT-style model. HF: IRIISNEPAL/RoBERTa_Nepali_110M | arXiv: 2411.15734
- NepaliBERT - 4.6 GB training corpus, 85k+ articles. Masked language model baseline. HF: Shushant/nepaliBERT
- DistilGPT2-Nepali - 13M Nepali text sequences (OSCAR + CC100 + Wikipedia). Text generation model. HF: Sakonii/distilgpt2-nepali
- Nepali Text Generation (Transformer) - Custom transformer for generation & spelling correction. GitHub: NirajanBekoju/Transformer-Based-Nepali-Language-Model
- NepBERTa - Official Nepali BERT baseline for GLUE benchmark. nepberta.github.io
Lexicons, Linguistics & Resources
Linguistic resources, dictionaries, and instruction-tuned datasets.
Dictionaries & Word Lists
- Sabdabikash Synonym Word List - 50,000+ Nepali words with synonyms (thesaurus). Kaggle: thenepaliguy/sabdabikash-synonym-nepali-word-list
- Nepali Dictionary - 25,000+ entries with definitions & examples. Kaggle: sangamthapa/nepali-dictionary
- Nepali Stopwords - 400+ common words for filtering. Kaggle: sangamthapa/nepali-stopwords
- Nepali Brihat Sabdakosh JSON - 122,000 words from comprehensive Nepali dictionary (JSON format). GitHub: bikashpadhikari/nepali-brihat-sabdakosh-json
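As a rough illustration of how a stopword list like the one above is typically used, the sketch below drops stopwords from whitespace-split text. The `remove_stopwords` helper and the five-word stopword set are assumptions for the example; a real run would load the full list from the dataset file.

```python
def remove_stopwords(text, stopwords):
    """Drop whitespace-separated tokens that appear in the stopword set."""
    return [tok for tok in text.split() if tok not in stopwords]


# A tiny hand-picked subset for illustration, not the dataset's actual list.
STOPWORDS = {"र", "पनि", "तर", "हो", "छ"}

tokens = remove_stopwords("राम र सीता पनि विद्यालय गए", STOPWORDS)
print(tokens)
```

Whitespace splitting is a simplification: Nepali postpositions usually attach to the preceding word, so a proper tokenizer or stemmer would be needed to catch those.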
Morphology & Syntax
- Nepali POS Data (UPOS Mapped) - POS tags following Universal Dependencies standard, 3,000+ tagged sentences. Kaggle: thenepaliguy/nepali-pos
- Nepali Word-Lemma Gold Data - Manual lemmatization annotations, 5,000+ words. GitHub: dpakpdl/NepaliLemmatizer
- Universal Dependencies (UD) Nepali - 17,500+ tokens with full syntactic dependency annotations (official UD project). GitHub: UniversalDependencies/UD_Nepali-NPP
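Universal Dependencies treebanks are distributed in the CoNLL-U format: ten tab-separated columns per token, `#` comment lines, and a blank line between sentences. A minimal reader, assuming that standard layout, could look like this. The `parse_conllu` helper is hypothetical, and the sample sentence and its annotations are hand-written for illustration rather than taken from the treebank.

```python
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if len(cols) == len(FIELDS):
                current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written rows; the tags and relations are illustrative guesses.
ROWS = [
    ("1", "राम", "राम", "PROPN", "_", "_", "3", "nsubj", "_", "_"),
    ("2", "घर", "घर", "NOUN", "_", "_", "3", "obl", "_", "_"),
    ("3", "जान्छ", "जानु", "VERB", "_", "_", "0", "root", "_", "_"),
]
SAMPLE = "# text = राम घर जान्छ\n" + "\n".join("\t".join(r) for r in ROWS) + "\n"

sentences = parse_conllu(SAMPLE)
print([(tok["form"], tok["upos"]) for tok in sentences[0]])
```

This sketch ignores multiword-token ranges and empty nodes, which a full CoNLL-U parser would need to handle.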
Instruction Tuning & Multilingual
- Bactrian-X (Instruction Tuning) - Nepali included in multilingual instruction-tuning dataset (50+ languages). HF: MBZUAI/Bactrian-X
- Aya Dataset (Instruction Tuning) - Nepali included in community-driven instruction dataset (101 languages). HF: cohere/aya_dataset
Code-Mixed & Multilingual NLP Datasets
Datasets for code-mixing, cross-lingual learning, and low-resource adaptation.
- Code-Mixed Nepali-English Abuse Detection - 5,000 Nepali-English code-mixed comments. Social media. arXiv: 2504.21026 New 2025.
- Nepali-English Code-Switched LID, POS, NER, Sentiment - Complete NLP pipeline for code-mixed data. GitHub: sagorbrur/codeswitch
- CLE Parallel Corpus (AI4Bharat) - English-Nepali-Urdu parallel data. Multilingual. GitHub: AI4Bharat/indicnlp_catalog
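When exploring code-mixed data like the sets above, a quick first look is to tag each token by script using the Devanagari Unicode block (U+0900 to U+097F). The sketch below is a simple heuristic, not the method used by any listed tool; the `script_of` helper and the sample sentence are invented for illustration.

```python
def script_of(token):
    """Label a token as 'devanagari', 'latin', 'mixed', or 'other' by its characters."""
    has_deva = any("\u0900" <= ch <= "\u097F" for ch in token)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in token)
    if has_deva and has_latin:
        return "mixed"
    if has_deva:
        return "devanagari"
    if has_latin:
        return "latin"
    return "other"


# Made-up code-mixed sentence with romanized Nepali, English, and Devanagari.
sentence = "yo movie एकदम ramro छ !"
tags = [(tok, script_of(tok)) for tok in sentence.split()]
print(tags)
```

Script is not the same as language: romanized Nepali ("yo", "ramro") and English ("movie") both come out as `latin`, so real language identification needs a trained model on top of this.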
Specialized Collections & Aggregators
One-stop resources for finding related Nepali datasets.
- Comprehensive Nepali Datasets (IOST-ASCOL) - Aggregated NLP, speech, image, geospatial datasets. One-stop resource. GitHub: IOST-ASCOL/nepali-datasets
- Curated Nepali NLP Resources - Comprehensive resource list with papers & tools. GitHub: ghimiresunil/Curated-List-of-Nepali-NLP-Resources
- Nepali NLP Resources (rameshhpathak) - Tool & dataset aggregator with descriptions. GitHub: rameshhpathak/nepali-nlp-resources
- Nepali NLP Progress - Research papers & datasets tracker (regularly updated). GitHub: divyamani1/Nepali-NLP-Progress
- IndicNLP Catalog (AI4Bharat) - Official Indic language resources (11 languages including Nepali). ai4bharat.github.io/indicnlp_catalog
- ML Datasets for Nepal - Curated ML resources including Laxmi Prasad Devkota Poems (119k characters) & Brihat Sabdakosh. GitHub: amitness/ml-datasets
Open Data & Government Resources
Official government datasets and open data portals.
- Open Data Nepal - Official open data portal with 500+ government datasets (health, education, infrastructure). opendatanepal.com
- Census Nepal - Official census data from Central Bureau of Statistics (demographic, geographic, economic). censusnepal.cbs.gov.np/results
- Local Government of Nepal - Municipal & district government data (federal structure). Note: The original link is insufficient; use Open Data Nepal instead.
Tools & NLP Frameworks
Complete NLP toolkits and utilities for Nepali processing.
- Nepali Lemmatizer - Rule-based lemmatization with training data. GitHub: dpakpdl/NepaliLemmatizer
- Nepali Transliteration - Script conversion dataset for transliteration tasks. Kaggle: saugatkafley/nepali-transliteration
- Audinp (Data Collector) - Tool for collecting speech data (contributed to OpenSLR-54). GitHub: SUBOdhar/audinp
- BISH-100 (AI Anchor) - Synthetic video dataset with AI-generated Nepali anchor. Kaggle: bisheshworneupane/bish-100-nepali-text-driven-ai-anchor
- Fine-tuned DistilBERT on 16 Newsgroup Dataset - Ready-to-use classifier for news categorization. HF: Suyogyart/nepali-16-newsgroups-classification
Research Papers & Benchmarks
Peer-reviewed publications on Nepali NLP and related work.
Recent & High-Impact (2024-2026)
- NepaliGPT: A Generative Language Model for the Nepali Language - Recent LLM research. arXiv: 2506.16399
- NLUE (Nepali Language Understanding Evaluation) - 9 NLU tasks with comprehensive benchmark. arXiv: 2411.19244
- IRIISNEPAL RoBERTa: State-of-the-art Nepali LM - 27.5 GB training corpus from 99 news sites. arXiv: 2411.15734
- Code-Mixed Nepali-English Abuse Detection - 5k annotated code-mixed dataset. arXiv: 2504.21026
- Nepali Transformers@NLU of Devanagari Script Languages 2025 - Transformer architectures for Devanagari. ACL: 2025.chipsal-1.36
Sentiment Analysis & Classification
- Aspect Based Sentiment Analysis of Nepali Text Using SVM and Naive Bayes - Comparative ML approach. ResearchGate
- An Analysis of Classification Algorithms for Nepali News - Benchmark of various classifiers. ResearchGate
- Nepali Text Document Classification Using Deep Neural Network - Deep learning approaches. NEPJOL
- Application of Nepali Large Language Models to Improve Sentiment - LLM applications. ACM New 2024.
NLP Tasks & Applications
- A Machine Learning Approach to Anaphora Resolution in Nepali Language - Pronoun resolution task. IEEE
- Nepali Image Captioning - Vision-language multimodal task. IEEE: 8947436
- Named-Entity Based Sentiment Analysis of Nepali News Media Texts - NER + sentiment joint modeling. ACL Anthology
- Topic Modeling for Nepali Political News - Topic analysis in news domain. IEEE: 11004776 New.
- NepKanun: A RAG-Based Nepali Legal Assistant - RAG systems for legal domain. OpenReview New 2025.
- Exploring NLP Challenges for Nepali - Overview of remaining challenges. Preprints: 202409.1229 New 2024.
Linguistic & Historical
- Natural language processing for Nepali text: a review - Comprehensive NLP review. Springer
- A Descriptive Grammar of Nepali and an Analyzed Corpus - Linguistic grammar reference. Google Books
- Nepali Spell Checker 1.1 and the Thesaurus - Early spell checking research. Wayback: NEP05.pdf
- Nepali Spell Checker - Earlier spell checking work. Wayback: NEP04.pdf
Research Aggregators
- List of more Nepali NLP papers - Comprehensive tracker (maintained). GitHub: RayGone/Nepali-NLP-Progress
- Nepali NLP Progress (divyamani1) - Community-maintained research tracker. GitHub: divyamani1/Nepali-NLP-Progress
Ethical Considerations
- Sentiment/Hate Speech Data: Contains potentially offensive language; bias mitigation recommended for model training
- Social Media Data (Tweets, Instagram): May contain personal information; handle it in compliance with GDPR and other applicable privacy rules
- Copyright: Wikipedia, news articles sourced responsibly; attribution recommended
- Multilingual Data: Code-mixed datasets reflect real-world language use; social biases may be present
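For the social media point above, one small precaution is to mask obvious identifiers before sharing or training on the text. The sketch below replaces e-mail addresses, URLs, and @-mentions with placeholder tokens. The `scrub` helper, the regular expressions, and the sample post are assumptions for illustration; this is a starting point and not a substitute for a proper privacy review.

```python
import re

# Simple patterns (assumptions); they will miss many kinds of personal data.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def scrub(text):
    """Mask e-mail addresses, URLs, and @-mentions with placeholder tokens."""
    # E-mails go first so their '@' is not mistaken for a mention.
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = URL_RE.sub("<URL>", text)
    text = MENTION_RE.sub("<USER>", text)
    return text


# Invented example post; the handle, URL, and address are placeholders.
post = "@ram_np ले भन्यो: हेर्नुहोस् https://example.com/post ram@example.com"
print(scrub(post))
```

Names, phone numbers, and locations inside the text itself are not covered here and would need separate handling.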
How to Contribute
- Verify Link: Test that dataset is publicly accessible
- Document Metadata: Include: name, size, domain, language(s), annotation scheme
- Format Entry: Follow category structure with title, description, link
- Submit PR: To pemagrg1/Nepali-Datasets
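The steps above could be checked with a small helper before opening a PR. This is only a sketch: the field names, the `format_entry` function, and the example dataset are hypothetical and not part of the repository.

```python
# Metadata fields suggested by the contribution steps, plus the link itself.
REQUIRED = ("name", "size", "domain", "languages", "annotation", "link")


def format_entry(meta):
    """Validate required metadata and return a list entry in the README's style."""
    missing = [key for key in REQUIRED if not meta.get(key)]
    if missing:
        raise ValueError("missing metadata: " + ", ".join(missing))
    langs = ", ".join(meta["languages"])
    return (
        f"- {meta['name']} - {meta['size']}, {meta['domain']} domain "
        f"({langs}); {meta['annotation']}. {meta['link']}"
    )


# Placeholder values only; this dataset does not exist.
example = {
    "name": "Example Nepali Corpus",
    "size": "1,000 sentences",
    "domain": "news",
    "languages": ["Nepali"],
    "annotation": "unannotated",
    "link": "Kaggle: example-user/example-nepali-corpus",
}
print(format_entry(example))
```

Checking that the link is publicly accessible still has to be done by hand or with a separate tool; the helper only guards against missing fields.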
Additional Resources
- IndicNLP Catalog (AI4Bharat): ai4bharat.github.io - Comprehensive Indic language resources
- Hugging Face Nepali Datasets: huggingface.co - Growing collection of Nepali datasets
- GitHub Nepali NLP: github.com/search?q=nepali+nlp - Discover new projects and datasets
- ACL Anthology (Nepali Papers): aclanthology.org - Academic papers on Nepali NLP
- arXiv (Nepali Research): arxiv.org - Preprints and recent research