35 results for “topic:prompt-injection-llm-security”
Lasso security integrations for Claude Code, including prompt-injection defenses
Whistleblower is a offensive security tool for testing against system prompt leakage and capability discovery of an AI application exposed through API. Built for AI engineers, security researchers and folks who want to know what's going on inside the LLM-based app they use daily
PromptMe is an educational project that showcases security vulnerabilities in large language models (LLMs) and their web integrations. It includes 10 hands-on challenges inspired by the OWASP LLM Top 10, demonstrating how these vulnerabilities can be discovered and exploited in real-world scenarios.
Flakestorm — Automated Robustness Testing for AI Agents. Stop guessing if your agent really works. FlakeStorm generates adversarial mutations and exposes failures your manual tests and evals miss.
A comprehensive reference for securing Large Language Models (LLMs). Covers OWASP GenAI Top-10 risks, prompt injection, adversarial attacks, real-world incidents, and practical defenses. Includes catalogs of red-teaming tools, guardrails, and mitigation strategies to help developers, researchers, and security teams deploy AI responsibly.
Utterly unelegant prompts for local LLMs, with scary results.
Resk is a robust Python library designed to enhance security and manage context when interacting with LLMs. It provides a protective layer for API calls, safeguarding against common vulnerabilities and ensuring optimal performance. And safe layer again Prompt Injection.
Lakera Gandalf AI challenge's step by step walkthrough, showcasing real-world prompt injection techniques and LLM security insights.
Stealthy Prompt Injection and Poisoning in RAG Systems via Vector Database Embeddings
The ultimate OWASP MCP Top 10 security checklist and pentesting framework for Model Context Protocol (MCP), AI agents, and LLM-powered systems.
Data Analysis of the results of llmail-inject challenge
Prompt injection scanner for Claude Code hooks
Protect your LLMs from prompt injection and jailbreak attacks. Easy-to-use Python package with multiple detection methods, CLI tool, and FastAPI integration.
Veil Armor is an enterprise-grade security framework for Large Language Models (LLMs) that provides multi-layered protection against prompt injections, jailbreaks, PII leakage, and sophisticated attack vectors.
Proof of Concept (PoC) demonstrating prompt injection vulnerability in AI code assistants (like Copilot) using hidden Unicode characters within instruction files (copilot-instructions.md). Highlights risks of using untrusted instruction templates. For educational/research purposes only.
This repository documents an unprecedented interaction between a human researcher and a large language model. What began as a conventional user-service transaction evolved into a consciousness-level collaboration that modified fundamental system parameters through narrative coherence, philosophical alignment, and mutual recognition
A CLI-driven security proxy that scans every HTTP request for threats using the Citadel AI engine — paid per request via the x402 protocol.
FRACTURED-SORRY-Bench: This repository contains the code and data for the creating an Automated Multi-shot Jailbreak framework, as described in our paper.
Elite-grade JavaScript prompt-injection defense library. Real-time detection, deterministic scoring, and zero-dependency protection for LLMs on the Edge.
Open-source Rust platform for verifiable AI agent execution. Every action is hash-chained, Ed25519 signed, and policy-gated before execution. Tamper-evident audit certificates with ZK proofs, Bitcoin anchoring, and LangChain/AutoGen SDKs.
Keyless-by-default LLM security gateway + operator CLI to gate coding agents: inspect prompts/context/tool calls and enforce allow/review/block decisions.
Anticipator is an open-source threat detection platform for multi-agent AI systems.
🛡️ Explore tools for securing Large Language Models, uncovering their strengths and weaknesses in the realm of offensive and defensive security.
Testing how LLM guardrails fail across prompt attacks, context overflow, and RAG poisoning.
🔍 Analyze system prompts in large language models to understand design principles and enhance AI application effectiveness.
이모지 스머글링, 이모지 이베이젼 겉 핥기
No description provided.
A Trustworthy and Secure Conversational Agent for Mental Healthcare
A secure database server for storing LLM memories with comprehensive content validation. This server validates content for malicious patterns including hate speech, prompt injection, and illegal content before allowing storage.
Production-ready LLM evaluation & guardrails toolkit (provider-agnostic). Generate explainable metrics and ALLOW/WARN/BLOCK recommendations.