17 results for “topic:low-resource-machine-translation”
This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.
Curated list of publicly available parallel corpora for Indian languages
Improving Word Translation via Two-Stage Contrastive Learning (ACL 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-Lingual Word Embeddings.
A 2024 Reading List for Bilingual Lexicon Induction (BLI) / Word Translation. Frequently Updated.
Code for "Cross-Lingual Word Embedding Refinement by ℓ1 Norm Optimisation" (NAACL 2021)
Improving Bilingual Lexicon Induction with Cross-Encoder Reranking (Findings of EMNLP 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-Lingual Word Embeddings.
On Bilingual Lexicon Induction with Large Language Models (EMNLP 2023). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.
[ACL 2021, Findings] Cognate Prediction Per Machine Translation
Code for the EMNLP 2021 Paper "AfroMT: Pretraining Strategies and Reproducible Benchmarks for Translation of 8 African Languages" by Machel Reid, Junjie Hu, Graham Neubig, Yutaka Matsuo
Multilingual finetuning of Machine Translation model on low-resource languages. Project for Deep Natural Language Processing course.
This repository is an open-source collection of various low-resource machine translation experiments.
LoResMT@ACL 2024: Learning-From-Mistakes Prompting for Indigenous Language Translation – A feedback-driven approach to enhance low-resource translation.
Self-Augmented In-Context Learning for Unsupervised Word Translation (ACL 2024). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.
Low-Resource OCR
Learning from Wrong Predictions in Low-Resource Neural Machine Translation. Basic implementation of the USKI (Unaligned Sentences Keytokens pre-training) method for Neural Machine Translation
Results and code for the paper "Efficient Architectures for Low-resource Machine Translation" (Workshop on Advancing NLP for Low-Resource Languages at RANLP 2025 (Varna, Bulgaria), Sep 13)
In-development bitext mining tool for low-resource languages