26 results for “topic:bertscore”
CodeBERTScore: an automatic metric for code generation, based on BERTScore
EMNLP'2022: BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation
LLM Evaluation and Observability System for Football Content
MAchine Translation Evaluation Online (MATEO)
ViAG: A Novel Framework for Fine-tuning Answer Generation models utilizing Encoder-Decoder and Decoder-only Transformer architectures
Fine-tuning GPT-3.5 and Llama3 LLMs for enhanced persona consistency in chatbots using Google's Synthetic Persona Chat dataset
AUTOMATIC ROMANIAN TEXT GENERATION USING GPT-2
Medical Question Answering System using the PubMed dataset.
About BertScore
Work developed during an internship as Natural Language Generation researchers at the Insid&s Lab in Milan-Bicocca. It builds a framework for assessing how the quality of the input datasets affects the quality of the text generated by NLG models, specifically: creation of the "Concept-Based" and "Entity-Based" versions of the WebNLG dataset; evaluation of the quality of the datasets created; training of LSTM and Transformer models using the OpenNMT tool; natural language text generation by the LSTM and Transformer models; evaluation of the quality of the generated text; final analysis.
Experimenting with changing the loss function in T5 to BERTScore
Implementation of a task-specific QLoRA supervised fine-tuning pipeline for LLaMA-2-7B-Chat, developed for an independent study on structured cover letter generation.
An LLM-powered application that summarizes scientific research papers, extracts tables, and provides real-time evaluation using BERTScore (F1) and ROUGE. Built using Meta’s LLaMA 3–8B via Groq, with table extraction powered by pdfplumber and pandas.
Lightweight evaluation framework for large language model outputs using BLEU, ROUGE, and BERTScore. Loads reference and generated texts and computes similarity metrics to illustrate how the different metrics are used.
Code and data from the master’s thesis “Decoding Spatial Semantics”. Analyzes and compares open-source LLMs and NMT systems in translating spatial prepositions from English to Brazilian Portuguese. Includes preprocessing scripts, datasets, and evaluation metrics.
Summarization, clustering and characterization of text categories using an LLM
Detecting and Mitigating Self-Contradictory Hallucinations in LLMs using a Multi-Agent System and Stepback Prompting
No description provided.
Agent Orchestration - LLM for Legal Metadata Extraction: A Comparative Analysis of Efficiency and Precision (paper 161 PROPOR)
Embedded AI chat widget for the Naver Smart Store UI. Uses product review data to produce comprehensive summaries. Built for the NAVER Vietnam AI Hackathon.
RAG system for querying arXiv papers: hybrid retrieval, LoRA fine-tuning, BERTScore evaluation, FastAPI + Streamlit
BleuMacaw: GPT-2 and SentenceTransformers for Paraphrase Generation
Local chatbot (no API) designed to answer questions in Spanish using your own Q&A dataset. Model evaluation with BERTScore.
Testing framework for LLM prompts. Started as a weekend project after getting tired of manually testing prompts in ChatGPT. Async experiment runner with BLEU/ROUGE/BERTScore metrics.
Modular Python framework for benchmarking and reproducible analysis of LLMs, with execution via APIs, structured response collection, automatic metrics (BLEU, ROUGE, BERTScore, MMLU, HellaSwag), rankings, and consolidated reports.
A local evaluation suite for Luxembourgish machine translation.