119 results for “topic:multilingual-nlp”
MTEB: Massive Text Embedding Benchmark
Crosslingual Generalization through Multitask Finetuning
EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. ⭐ support NLP!
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023
[EMNLP 2023] The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
This repo supports various cross-lingual transfer learning & multilingual NLP models.
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
This repository contains the code, data, and models of the paper titled "CrossSum: Beyond English-Centric Cross-Lingual Summarization for 1,500+ Language Pairs" published in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23), July 9-14, 2023.
WorkRB: Work Research Benchmark
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages
Official codebase for the ACL 2025 Findings paper: Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval.
Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.
Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization [arXiv 2025]
Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages." Paper accepted at Findings of EMNLP 2024
M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis
This repository provides the official resources for the EMNLP 2025 paper "Grounding Multilingual Multimodal LLMs With Cultural Knowledge".
🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
On Bilingual Lexicon Induction with Large Language Models (EMNLP 2023). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.
Cross-lingual language models for building search engines for the Holy Quran and Sahih Hadiths
Codebase for CLINIC, a multilingual trustworthiness benchmark for Healthcare
MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing
[EMNLP 2022] Discovering Language-neutral Sub-networks in Multilingual Language Models.
Winning Solution for the Bangla Complex Named Entity Recognition Challenge - BDOSN NLP Hackathon 2023
Official repository of FEVER@ACL 2025 paper "When Scale Meets Diversity: Evaluating Language Models on Fine-Grained Multilingual Claim Verification"
ConLID: Supervised Contrastive Learning for Low-Resource Language Identification [EACL 2026]
CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning
MaLA-500: Massive Language Adaptation of Large Language Models
Chaii (Challenge in AI for India) Multilingual QnA - Google Research India
Mechanistic interpretability of cross-lingual concept representations in Tiny Aya — rise, peak, collapse.
AQUILIGN is a multilingual alignment and collation tool for 📜 medieval texts. It uses ✂️ clause-level segmentation and 🔗 contextual alignment based on BERT models, with applications in 🌍 historical linguistics, 📖 philology, and 🤖 premodern NLP.