rmaacario/LLMs-vs.NMT-spatial-semantics-translation

Code and data from the master’s thesis “Decoding Spatial Semantics”. Analyzes and compares open-source LLMs and NMT systems in translating spatial prepositions from English to Brazilian Portuguese. Includes preprocessing scripts, datasets, and evaluation metrics.

Decoding Spatial Semantics

This repository contains code and resources from the master’s thesis “Decoding Spatial Semantics: A Comparative Analysis of the Performance of Open-source LLMs against NMT Systems in Translating EN-PT-br”.

Overview

This study explores the challenges of translating spatial language with open-source Large Language Models (LLMs) and traditional Neural Machine Translation (NMT) systems. It focuses on how the spatial prepositions ACROSS, INTO, ONTO, and THROUGH are translated from English into Brazilian Portuguese (PT-br).
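To make the focus on these four prepositions concrete, here is a minimal sketch of how sentence pairs containing them could be selected from a parallel corpus. It is not taken from the repository: the function names `find_prepositions` and `filter_pairs` and the example sentences are hypothetical, and the thesis may select its data differently.

```python
import re

# The four prepositions named in the overview.
PREPOSITIONS = ("across", "into", "onto", "through")

# Whole-word, case-insensitive match, so "through" does not match "throughout".
_PATTERN = re.compile(r"\b(" + "|".join(PREPOSITIONS) + r")\b", re.IGNORECASE)


def find_prepositions(sentence):
    """Return the target prepositions found in a sentence, lowercased, in order."""
    return [m.group(1).lower() for m in _PATTERN.finditer(sentence)]


def filter_pairs(pairs):
    """Keep (en, pt) pairs whose English side has at least one target preposition."""
    return [(en, pt) for en, pt in pairs if find_prepositions(en)]


# Hypothetical example pairs, not taken from the repository's dataset.
pairs = [
    ("She walked across the bridge.", "Ela atravessou a ponte."),
    ("He is happy throughout the year.", "Ele é feliz durante o ano todo."),
    ("The cat jumped onto the table.", "O gato pulou para cima da mesa."),
]
selected = filter_pairs(pairs)
```

Matching on word boundaries is a deliberate simplification: it finds the surface forms but cannot tell spatial uses from non-spatial ones (for example "look into the matter"), which would need manual or syntactic filtering.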

Contents

  • Code: Includes scripts for data preprocessing, running experiments, and evaluating results.
  • Datasets: Bilingual dataset of TED Talks subtitles focusing on spatial prepositions.
  • Evaluation Metrics: Scripts for computing BLEU, METEOR, BERTScore, COMET, and TER.
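As an illustration of what an edit-based metric such as TER measures, the sketch below computes a simplified, shift-free variant: the word-level edit distance between a hypothesis and a reference, divided by the reference length. This is an assumption-laden approximation, not the repository's evaluation code. Real TER also counts block shifts, and the names `word_edit_distance` and `simplified_ter` are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over token lists (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis token
                            curr[j - 1] + 1,      # insert a reference token
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis, reference):
    """Edits per reference word; lower is better. Block shifts are NOT modeled."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not ref:
        raise ValueError("reference must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)


# Hypothetical example: two of the four reference words must be substituted.
score = simplified_ter("ela andou pela ponte", "ela atravessou a ponte")
```

For reported results, an established implementation of each metric is preferable to a hand-rolled one, since tokenization and normalization choices change the scores.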

Languages

  • Jupyter Notebook: 54.4%
  • Python: 45.2%
  • Batchfile: 0.1%
  • Makefile: 0.1%
  • CSS: 0.1%

Created August 30, 2024
Updated September 15, 2024