20 results for “topic:self-consistency”
Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasoning elevation 🍓 and hallucination alleviation 🍄.
Awesome LLM Self-Consistency: a curated list of self-consistency methods in Large Language Models
CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning
The official PyTorch implementation of Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
Package for solving generalized BdG mean field theory of interacting systems.
KG-RAG + ToT + multi-agent LLMs for evidence-grounded QA with Neo4j and fine-tuning; reproducible medical case study & evaluation.
Perl implementation of a Markov chain for the course BIO331
Fixed-point solver for generic functions
GSM8K-Consistency is a benchmark database for analyzing the consistency of Arithmetic Reasoning on GSM8K.
Electronic-structure coursework for my master's degree
An evaluation of prompting techniques (Zero-Shot CoT, Few-Shot, Self-Consistency) on the Mistral-7B model for mathematical reasoning. This project systematically benchmarks 7 distinct methods on the GSM8K dataset.
Advanced prompt engineering techniques: Chain-of-Thought, Tree-of-Thoughts, ReAct, Self-Consistency
Self-consistent, model-based filter design for 3-phase PLLs.
Evaluation framework for self-hosted LLMs. Systematic prompt ablation (baseline, CoT, few-shot, self-consistency voting) on Llama 3.1 8B via lm-evaluation-harness, with Wilson CI statistical analysis, determinism validation, and load testing under concurrency. Found that chain-of-thought degrades accuracy by 25 percentage points at this small scale.
A consistency-based firewall for high-stakes Retrieval Augmented Generation (RAG). Queries the model multiple times and incinerates the output if entropy is high (divergent answers), preferring silence over hallucination.
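The mechanism this repository describes — sample the model several times, answer only when the samples agree, abstain when they diverge — can be sketched in a few lines. This is a minimal illustration, not the repository's actual code; the function name `entropy_gate` and the 1.0-bit threshold are hypothetical choices for the example.

```python
import math
from collections import Counter

def entropy_gate(answers, max_entropy_bits=1.0):
    """Majority-vote over sampled answers; return None (abstain) when the
    empirical answer distribution is too high-entropy, i.e. the samples
    diverge. `max_entropy_bits` is an illustrative threshold to tune.
    """
    counts = Counter(answers)
    n = len(answers)
    # Shannon entropy (in bits) of the empirical answer distribution.
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    if h > max_entropy_bits:
        return None  # divergent answers: prefer silence over hallucination
    return counts.most_common(1)[0][0]

# Near-unanimous samples pass the gate; divergent samples are suppressed.
print(entropy_gate(["42", "42", "42", "42", "41"]))  # -> 42
print(entropy_gate(["42", "17", "9", "42", "3"]))    # -> None
```

In a real RAG pipeline, `answers` would be the parsed final answers from N independent samples of the same query; the threshold trades coverage against hallucination risk.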
10 stochastic parrots are better than 1 🦜
Developing an autonomous system for prompt selection for Large Language Models (LLMs), enhancing performance across tasks by balancing generality and specificity. This project automates diverse, high-quality prompt creation and selection, reducing manual intervention and maximizing LLM utility across applications.
Interactive Streamlit application that benchmarks direct prompting, chain-of-thought, self-consistency, tree-of-thought and reflexion techniques across OpenAI GPT-3.5 and Groq Gemma-9B-IT.
KERNEL v4.2 is an adaptive cognitive scheduler for LLMs integrating LATS, Reflexion, and Meta-Prompting. It dynamically optimizes reasoning routing according to task complexity (1-5) to maximize accuracy and reduce hallucinations. Powered by the Lichen-Collectives vision.
Demos and walkthroughs of published Machine Learning research papers