28 results for “topic:decoder-only”
[ICCV 2025] DONUT: A Decoder-Only Model for Trajectory Prediction
Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and vision-language capabilities
Time series forecasting with a decoder-only Transformer, including SwiGLU and RoPE (Rotary Positional Embedding); see the RoPE sketch after this list
Code for paper "Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI"
🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
Minimal decoder-only seq2seq pipeline with proper causal masking (see the causal-mask sketch after this list), teacher forcing, Ignite training loop, and checkpointed inference
SAMPO: Scale-wise Autoregression with Motion Prompt for Generative World Models
ViAG: A Novel Framework for Fine-tuning Answer Generation models utilizing Encoder-Decoder and Decoder-only Transformer architectures
A from-scratch implementation of a scaled-down GPT-2 model in PyTorch, trained on the Snappfood dataset for sentiment-controlled Persian text generation.
No description provided.
Developed and pre-trained a 20.39M-parameter Punjabi GPT-style base model from scratch, including corpus preparation, tokenizer training, benchmark evaluation, and text generation, using a cleaned Punjabi corpus and local Apple Silicon GPU acceleration.
Clean-room GPT-2/GPT-3 implementation: tokenizers, architecture blocks, training loop with AdamW + cosine decay, CLI scripts, inference tools, and pytest suite. Covers OpenWebText-10k & WikiText-103 workflows. Designed as an academic reference for understanding and scaling decoder-only transformers
A Chinese-to-English sequence transduction model: a rigorous from-scratch reproduction of the complete "Attention Is All You Need" pipeline for Chinese→English (Zh→En) machine translation.
A compilation of exercises for learning how to implement a transformer model
A mini version of GPT trained on Shakespeare using BPE
Decoder-only transformer, simplest character-level tokenization, training and text generation.
Autoregressive text generation application using a decoder-only transformer
A decoder-only approach to image reconstruction, inspired by adversarial machine learning, implemented in Keras/TensorFlow 2
A compact, readable GPT-style decoder-only Transformer implemented in pure PyTorch. The goal is to expose the essential architectural pieces with minimal scaffolding so you can train and tinker quickly.
Implementation of the GPT-2 architecture using PyTorch, trained on the TinyStories dataset. Features custom training pipelines on Modal (cloud computing) and integration with the Hugging Face ecosystem.
This project is my PyTorch reproduction of PaliGemma, a compact 3B vision–language model that integrates SigLIP vision features with a Gemma decoder. I implemented the full multimodal pipeline from vision encoding to autoregressive text generation to study modern VLM architectures from a research perspective.
🧸 A fully custom GPT-style language model built from scratch in PyTorch and trained on Winnie-the-Pooh! Explored the core mechanics of self-attention, autoregressive text generation, and modular model training, all without relying on any external libraries beyond PyTorch.
This study examines the effectiveness of transformer-based models for financial time series forecasting, specifically focusing on log returns derived from daily closing prices of the DAX40 index. We propose a decoder-only transformer model designed for immediate-term financial time series forecasting: The PatternDecoder.
This repository contains the implementation and experiments for comparing gradual growth methods, specifically the G_stack approach, with naive models trained from scratch. The project focuses on addressing catastrophic forgetting and improving model performance in continuous learning scenarios.
Decoder-only language model trained to generate texts that look like Shakespeare's plays
Decoder-only transformer model for answering short questions using causal self-attention.
Implement a decoder-only Transformer in PyTorch to reverse character sequences using causal masking and cross-entropy loss with Ignite training support
No description provided.
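Several entries above (the Ignite seq2seq pipeline, the character-level and Shakespeare GPTs, the sequence-reversal exercise) share the same core mechanism: causal self-attention. Below is a minimal PyTorch sketch of that pattern; the class name MiniCausalSelfAttention and all hyperparameters are illustrative and not drawn from any listed repository.

```python
# Minimal sketch of causal self-attention, the shared core of the
# decoder-only repositories above. All names/sizes are illustrative.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniCausalSelfAttention(nn.Module):
    def __init__(self, n_embd: int = 64, n_head: int = 4, block_size: int = 128):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # fused Q, K, V projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        # Lower-triangular mask: position t may only attend to positions <= t.
        self.register_buffer(
            "mask",
            torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape to (B, n_head, T, head_dim) for per-head attention.
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        # Future positions get -inf so softmax assigns them zero weight.
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

# Quick check: output shape matches input for a random batch.
x = torch.randn(2, 16, 64)
print(MiniCausalSelfAttention()(x).shape)  # torch.Size([2, 16, 64])
```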
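The SwiGLU/RoPE time-series entry above also names Rotary Positional Embedding. Here is a minimal sketch of the rotation RoPE applies to query/key features, assuming the standard base-10000 formulation; apply_rope and its signature are illustrative, not that repository's API.

```python
# Minimal sketch of RoPE (Rotary Positional Embedding), assuming the
# standard base-10000 formulation; names are illustrative.
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate query/key features pairwise by position-dependent angles.

    x: (batch, seq_len, head_dim) with head_dim even.
    """
    B, T, D = x.shape
    assert D % 2 == 0
    # One frequency per feature pair: theta_i = base^(-2i/D).
    inv_freq = base ** (-torch.arange(0, D, 2, dtype=torch.float32) / D)
    angles = torch.arange(T, dtype=torch.float32)[:, None] * inv_freq[None, :]  # (T, D/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]  # split features into pairs
    # Apply a 2D rotation to each (x1, x2) pair.
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(2, 8, 16)
print(apply_rope(q).shape)  # torch.Size([2, 8, 16])
```

Because the rotation angle depends only on token position, the dot product between rotated queries and keys depends only on their relative offset, which is what makes RoPE attractive for both language modeling and time-series forecasting.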