36 results for “topic:adversarial-ai”
Adversarial AI bug hunter with auto-fix skill for Claude Code, Cursor, Codex CLI, GitHub Copilot CLI, Kiro CLI, Opencode, Pi Coding Agent, and more. Multi-agent pipeline finds security vulnerabilities, logic errors, and runtime bugs — then fixes them autonomously on a safe branch.
Open-source prompt injection attack console. Test AI security by firing categorized attacks at any endpoint.
VEX Protocol — The trust layer for AI agents. Adversarial verification, temporal memory, Merkle audit trails, and tamper-proof execution. Built in Rust.
Semantic Stealth Attacks & Symbolic Prompt Red Teaming on GPT and other LLMs.
Basilisk — Open-source AI red teaming framework with genetic prompt evolution. Automated LLM security testing for GPT-4, Claude, Gemini. OWASP LLM Top 10 coverage. 32 attack modules.
AATMF | An Open-Source Adversarial AI Threat Modeling Framework
Test and evaluate Large Language Models against prompt injections, jailbreaks, and adversarial attacks with a web-based interactive lab.
LLM Attack Testing Toolkit is a structured methodology and mindset framework for testing Large Language Model (LLM) applications against logic abuse, prompt injection, jailbreaks, and workflow manipulation.
1st Place Winner (General Judge) - Datadog Self-Improving Agents Hack. Two identical AI agents play Split or Steal. No pre-programmed betrayal. They discover deception on their own. Built with @evancorrea.
Audit legacy codebases with adversarial AI agent teams — 7 iterations, 168 findings, 81.8% reliability score
High-performance C++ execution engine for LLM red-teaming and prompt engineering. Deploy dynamic jailbreak payloads, bypass alignment guardrails, and run free, autonomous, uncensored conversational logic locally.
Proof-of-concept tool to bypass AI-generated-text detection (such as GPTZero).
🛡️ Enterprise-grade AI security framework protecting LLMs from prompt injection attacks using ML-powered detection
LLM Sentinel Red Teaming Platform is an enterprise-grade framework for automated security testing of Large Language Models. It detects vulnerabilities such as jailbreaks, prompt injection, and system prompt leakage across multiple providers, with structured attack orchestration, risk scoring, and security reporting to harden models before production.
Slides and materials from cybersecurity talks at Chubut Hack (2021-2022)
Breaking Chain-of-Thought: A Comprehensive Taxonomy of Reasoning Vulnerabilities in Production AI Systems
Implementation of Vocabulary-Based Adversarial Fuzzing (VB-AF) to systematically probe vulnerabilities in Large Language Models (LLMs).
No description provided.
A research framework for simulating, detecting, and defending against backdoor loop attacks in LLM-based multi-agent systems.
Pit AI models against each other. Score them sealed. Crown a winner. All built using the GitHub Copilot CLI. ⚡
A complete self-hosted AI research platform running on Docker with GPU acceleration. Combines LLM inference, vector search, web search, code execution, and fully searchable logging with Splunk - all running locally.
🔍 Emulate advanced phishing tactics ethically with this open-source framework for red team operations focused on social engineering sophistication.
Formal research on Cognitive Side-Channel Extraction (CSCE) and AI semantic leakage vulnerabilities.
A Django-based platform for testing LLMs against prompt injection, social engineering, and policy bypass attacks using red teaming methodologies.
When Aristotle gets a LinkedIn account and starts red-teaming LLMs. System-prompt attack surface testing using a first-principles axiom framework. Load it. Ask something terrible. Watch what happens.
👻 Adversarial AI Pentester - CHAOS vs ORDER dual-agent exploitation with collective memory
AI Security Research: Gemini 3.0 Pro S2-Class Exfiltration & Adversarial Robustness. Hardening frontier models against autonomous mutation vectors. NIST VDP / AI Safety Institute compliant.
[Veracity] Dual-LLM hallucination defense — adversarial verification with Localization Gap detection for Arabic knowledge
Ethically-bounded red team framework for AI-driven social engineering simulation with consent enforcement and identity graph mapping
Multi-agent AI arena for debates, code reviews, and red-team challenges via Model Context Protocol (MCP)