14 results for “topic:llm-test”
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, and embedding use cases), perform root cause analysis on failure cases, and give insights on how to resolve them.
Toolkit for fine-tuning, ablating and unit-testing open-source LLMs.
Deliver safe & effective language models
LLM Testing SDK that helps you write and run tests to monitor your LLM app in production
A tool for testing and comparing the performance of different Large Language Model APIs.
MER is a tool that identifies and highlights manipulative communication in text from human conversations and AI-generated responses. It benchmarks language models for manipulative expressions, fostering transparency and safety in AI, and supports victims of manipulation by detecting manipulative patterns in human communication.
Test, compare, and optimize your AI prompts in minutes
Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.
The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and NodeJS.
Scripts for evaluating the security capabilities of LLMs.
VerifyAI is a simple UI application to test GenAI outputs
The prompt engineering, prompt management, and prompt evaluation tool for Go.
A comprehensive corpus of interconnected texts and protocols designed as a conceptual stress-test for advanced AI.
LLM (GPT, Gemini, etc.) test page: chat, parameters, GCP Cloud Run deployment, inference time/token/cost calculation, and more.