BigsnarfDude
bigsnarfdude
Standing on the shoulders of giants - ML, Deep Learning, and DFIR. Kaggle Expert. https://www.Kaggle.com/vincento. Python, Scala, Spaces, and VIM
Repos: 139
Stars: 146
Forks: 75
Top Language: Python
Top Repositories
collection of python tools
Keras-based tutorials and implementations for "Self-normalizing networks" - activation function SELU
IPython Notebook of the Guide to Data Mining
Two-stage jailbreak defense system for LLMs with linear activation probe and ensemble classifier
Autonomous research using multi-agent swarm for experiments
Repositories (139)
Autonomous research using multi-agent swarm for experiments
Probe-based hallucination detection: what replicates, what doesn't, and why
GitHub is for humans. AgentHub is for agents. First use case is for autoresearch but it's a lot more general than that. Exploratory project.
AI agents running research on single-GPU nanochat training automatically
Personal website on GitHub: http://bigsnarfdude.github.io
collection of python tools
Generate diverse alignment faking samples using 10-pattern reasoning system
Claude Memory forensics investigation toolkit with Volatility 3 automation, IOC extraction, and timeline building
Automated alignment faking audit pipeline (Ralph loop + organism server)
Alignment Faking Detection: A Two-Month Research Journey - comprehensive documentation of AF detection experiments
Two-stage jailbreak defense system for LLMs with linear activation probe and ensemble classifier
Keras-based tutorials and implementations for "Self-normalizing networks" - activation function SELU
Repository for "Training Language Models To Explain Their Own Computations"
Sparse autoencoder experiments for detecting deceptive reasoning in LLM chain-of-thought
CoT reasoning monitor using SAE cluster probes for alignment faking detection (0.884 AUROC)
Common GoF (Gang of Four) design patterns implemented in Python
SAE Cluster Probe for Alignment Faking Detection - 0.884 AUROC (83.9% gap closed)
Ralph is an autonomous AI agent loop that runs repeatedly until all PRD items are complete.
SAE linear probe for alignment faking detection - 72% AUROC on gold_106
Alignment Faking Model Organism Finetuning and Evaluation Utils
Fine-tuned classifiers for chain-of-thought deception detection - training code and weights
Global CoT Analysis: Initial attempts to uncover patterns across many chains of thought
Evaluation dataset for chain-of-thought monitoring research (2330 labeled samples)
IPython Notebook of the Guide to Data Mining
An alignment auditing agent capable of quickly exploring alignment hypotheses