"topic:bertscore" — Search

This work was developed during an internship as researchers in Natural Language Generation at the Insid&s Lab in Milan-Bicocca. It builds a framework for assessing how the quality of the input datasets affects the quality of the text generated by NLG models. Specifically: creation of the "Concept-Based" and "Entity-Based" versions of the WebNLG dataset; evaluation of the quality of the created datasets; training of LSTM and Transformer models using the OpenNMT tool; natural language text generation by the LSTM and Transformer models; evaluation of the quality of the generated text; final analysis.

Jupyter Notebook · 1 · 0 · Updated 2 months ago

artificial-intelligence bertscore bleu-score deep-learning lstm natural-language-generation natural-language-processing nlg python rdf-triples rouge-metric torch transformer-architecture

liux2/BERT_score_T5

Experimenting with changing the loss function in T5 to BERTScore

Jupyter Notebook · 1 · 0 · Updated 2 years ago

bert bertscore t5 webnlg

soniatyburczy/llama2-qlora-sft-coverletter-project

Implementation of a task-specific QLoRA supervised fine-tuning pipeline for LLaMA-2-7B-Chat, developed for an independent study on structured cover letter generation.

Python · 1 · 0 · Updated 2 months ago

bertscore fine-tuning huggingface llama2 llm lora model-evaluation natural-language-processing nlp parameter-efficient-fine-tuning peft pytorch qlora rouge structured-generation supervised-finetuning text-generation transformers

TSS-sniper/Research-paper-Summarizer-with-Realtime-Eval

An LLM-powered application that summarizes scientific research papers, extracts tables, and provides real-time evaluation using BERTScore (F1) and ROUGE. Built using Meta’s LLaMA 3–8B via Groq, with table extraction powered by pdfplumber and pandas.

Python · 1 · 0 · Updated 7 months ago

bertscore llm meta-llama3 research-paper rouge-metric text-summarization

Joe-Naz01/llm_evaluate

Lightweight evaluation framework for large language model outputs using BLEU, ROUGE, and BERTScore. Loads reference and generated texts and computes similarity metrics to illustrate how the different metrics are used.

Jupyter Notebook · 0 · 0 · Updated 4 months ago

artificial-intelligence bertscore bleu-score data-science evaluation-metrics machine-learning rouge-metric

rmaacario/LLMs-vs.NMT-spatial-semantics-translation

Code and data from the master’s thesis “Decoding Spatial Semantics”. Analyzes and compares open-source LLMs and NMT systems in translating spatial prepositions from English to Brazilian Portuguese. Includes preprocessing scripts, datasets, and evaluation metrics.

Jupyter Notebook · 0 · 0 · Updated 1 year ago

bertscore bleu-score comet deep-learning machine-translation nlp python transformers

a-iceberg/clustering_and_naming_categories

Summarization, clustering, and characterization of text categories using an LLM

Jupyter Notebook · 0 · 0 · Updated 1 year ago

bertscore clustering data-analysis data-science deep-learning gpt llm mssqlserver nlp openai prompt-engineering python summarization transformers

Anushkaghei/Hallucination-Detection-In-LLMs

Detecting and Mitigating Self-Contradictory Hallucinations in LLMs using a Multi-Agent System and Stepback Prompting

Jupyter Notebook · 0 · 0 · Updated 1 year ago

bertscore bleu-score detection llms mitigation multi-agent-systems rouge stepback-prompting

dvd125/NLG-The-impact-of-data-quality-on-automatic-text-generation-from-RDF-data

No description provided.

Jupyter Notebook · 0 · 0 · Updated 3 years ago

bertscore bleu-score deep-learning lstm natural-language-generation natural-language-processing python rouge-metric torch transformer

luizanisio/agent-orchestration-2026

Agent Orchestration - LLM for Legal Metadata Extraction: A Comparative Analysis of Efficiency and Precision (paper 161 PROPOR)

Python · 0 · 0 · Updated 5 days ago

agent-orchestration bertscore data-extraction data-science llm python rouge-metric slm

wheevu/naver-lens

Embedded AI chat widget for the Naver Smart Store UI. Uses product review data to produce comprehensive summaries. Built for the NAVER Vietnam AI Hackathon.

TypeScript · 0 · 0 · Updated 1 week ago

bertscore docker genai-application hackathon-project model-evaluation rag

choeyunbeom/arxiv_rag_system

RAG system for querying arXiv papers: hybrid retrieval, LoRA fine-tuning, BERTScore evaluation, FastAPI + Streamlit

Python · 0 · 0 · Updated 6 days ago

bertscore chromadb docker fastapi fine-tuning llm lora machine-learning nlp ollama portfolio-project python rag streamlit

mmichall/bleu-macaw

BleuMacaw: GPT-2 and SentenceTransformers for Paraphrase Generation

Python · 0 · 0 · Updated 2 months ago

bertscore bleu bleu-score gpt-2 huggingface huggingface-transformers paraphrase-generation paraphrasing parrot sentence-transformers

hordiales/llm-rag-assistant-localhost

Local chatbot (no API) designed to answer questions in Spanish using your own Q&A dataset. Model evaluation with BERTScore.

Python · 0 · 0 · Updated 5 months ago

bertscore llm rag rag-chatbot

vihanga/prompt-sandbox

Testing framework for LLM prompts. Started as a weekend project after getting tired of manually testing prompts in ChatGPT. Async experiment runner with BLEU/ROUGE/BERTScore metrics.

Python · 0 · 0 · Updated 3 months ago

benchmarking bertscore bleu chatgpt evaluation-metrics llm prompt-engineering prompt-testing rouge

caiocezarq/llm-comparison-benchmark

Modular Python framework for reproducible benchmarking and analysis of LLMs, with execution via APIs, structured response collection, automatic metrics (BLEU, ROUGE, BERTScore, MMLU, HellaSwag), rankings, and consolidated reports.

HTML · 0 · 0 · Updated 6 days ago

ai bertscore bleu-score evidently-ai hellaswag llm llms-benchmarking mmlu python rouge rouge-metric

greenirvavril/lux-eval

A local evaluation suite for Luxembourgish machine translation.

Python · 0 · 0 · Updated 3 weeks ago

bertscore bleu bleu-metric bleu-score bleurt bleurt-20 chrf chrf-score luxembourg luxembourgish machine-translation machine-translation-evaluation machine-translation-metrics mt sacrebleu-evaluation