SrEntropy/nanoGPT-Transformer
Mastering every concept from the seminal 2017 paper "Attention Is All You Need." This project aims to internalize why each component exists by deriving the mathematics by hand.
Motivation: Toward a Spiking Transformer Architecture
- Read the paper: "Attention Is All You Need"
- Understand the fundamentals of LLMs
Recent advances in artificial intelligence, particularly Large Language Models (LLMs), are largely driven by architectural innovations such as the Transformer introduced in 2017. While these systems demonstrate impressive empirical performance, their widespread adoption has often outpaced a rigorous understanding of the mathematical and computational principles that govern their behavior.
This project is motivated by the need to study modern neural architectures from first principles. Rather than treating contemporary models as black boxes, the goal is to examine how mathematical concepts such as linear algebra, optimization, probability, information theory, and dynamical systems shape model design, learning dynamics, and limitations.
In an academic landscape increasingly influenced by rapid iteration and abstraction, this repository emphasizes foundational understanding as a prerequisite for innovation. By reconstructing and analyzing core components of state-of-the-art architectures, this project aims to bridge the gap between theory and practice.
Objectives
- Analyze Transformer Architectures from First Principles:
  Study how attention mechanisms address the limitations of sequential models (e.g., RNNs and LSTMs) by enabling parallel computation, improved gradient flow, and long-range dependency modeling.
- Investigate Spiking Neural Computation:
  Explore how spiking neuron models and event-based computation can be integrated with Transformer-like architectures to improve biological plausibility, energy efficiency, and temporal representation.
- Examine Failure Modes of Modern Models:
  Analyze known weaknesses of LLMs, including hallucination, brittleness under distribution shift, and limitations in reasoning, through a theoretical and architectural lens.
- Develop a Conceptual Path Toward a Spiking LLM:
  Establish the mathematical and architectural groundwork required for constructing a spiking-based large language model, informed by both deep learning theory and computational neuroscience.
- Academic Preparation:
  Serve as a rigorous foundation for graduate-level study in Applied Mathematics, Machine Learning, and Artificial Intelligence, with an emphasis on theory-driven system design.
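As a first concrete step toward the objectives above, the scaled dot-product attention at the heart of the Transformer can be sketched from first principles in NumPy. This is a minimal illustration, not the repository's implementation; the function and variable names are chosen here for clarity.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    # Similarity scores between all query/key pairs, scaled by sqrt(d_k)
    # to keep the softmax inputs well-conditioned as d_k grows.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable row-wise softmax: each query position gets a
    # probability distribution over key positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: attention-weighted combination of the value vectors.
    return weights @ V

# Every position attends to every other position in a single matrix
# product; this is the parallelism that sequential RNN updates lack.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 positions, key dimension d_k = 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Note that with all-zero queries and keys the weights become uniform, so the output is simply the mean of the value vectors, which is a useful sanity check when deriving the mechanism by hand.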
Scope and Philosophy
This repository prioritizes understanding over performance. Implementations are intentionally minimal, interpretable, and theory-driven, focusing on:
- Explicit derivations and design rationale
- Architectural comparisons grounded in mathematics
- Clear links between theory, implementation, and observed behavior
The ultimate objective is not to reproduce state-of-the-art benchmarks, but to cultivate the conceptual tools necessary to design and reason about future architectures.