Awais-Asghar/5-Stage-Pipelined-RISC-V-Processor-on-FPGA

A 5-stage pipelined RISC-V processor designed and implemented on FPGA (Artix-7, Nexys A7). Supports the RV32I instruction set (R, I, S, B, U, J types) with an ALU, control unit, hazard detection, forwarding, and pipeline registers. Verified through simulation and hardware testing at 100 MHz, with a reported throughput gain of about 4× over the single-cycle design.

5-Stage Pipelined RISC-V Processor on FPGA

Tool: Vivado | FPGA: Nexys A7 | Language: SystemVerilog | License: MIT

Project Overview

This project implements a 5-stage pipelined RISC-V processor in SystemVerilog on a Nexys A7 (Artix-7 FPGA). It extends the earlier single-cycle processor by introducing instruction-level parallelism through pipelining, dividing instruction execution into five stages: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB). The processor supports the complete RV32I instruction set (R, I, S, B, U, J types) and integrates pipeline registers, forwarding logic, and a hazard detection unit to resolve data and control hazards. The reported result is roughly 4× the throughput of the single-cycle design.

Tools & Technologies

| Type | Tool / Technology |
| --- | --- |
| Hardware | Nexys A7 FPGA (Artix-7, Digilent) |
| Language | SystemVerilog (HDL) |
| Software | Xilinx Vivado (Design, Simulation, Synthesis) |
| Extras | RISC-V Assembly, .mem files for memory initialization |
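
As a rough illustration of how a .mem file could be produced, the sketch below formats 32-bit instruction words as one 8-digit hexadecimal value per line, which is the layout SystemVerilog's `$readmemh` accepts. Whether this project uses `$readmemh` or `$readmemb`, and whether its memories are word- or byte-organized, is not stated in this README, so treat the format as an assumption. The function name `words_to_mem` and the three-instruction program are hypothetical.

```python
def words_to_mem(words):
    """Format 32-bit words as one 8-digit hex value per line.

    Assumes a word-organized memory loaded with $readmemh (an assumption,
    not something this README confirms).
    """
    lines = []
    for w in words:
        if not 0 <= w <= 0xFFFFFFFF:
            raise ValueError(f"not a 32-bit word: {w!r}")
        lines.append(f"{w:08x}")
    return "\n".join(lines) + "\n"


# Hypothetical test program (standard RV32I encodings):
program = [
    0x00500093,  # addi x1, x0, 5
    0x00700113,  # addi x2, x0, 7
    0x002081B3,  # add  x3, x1, x2
]
mem_text = words_to_mem(program)
```

Writing `mem_text` to a file with `open(path, "w").write(mem_text)` would give a file that can be passed to the memory model at elaboration time.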

System Architecture

Each instruction progresses through five sequential stages, enabling instruction overlap for parallel execution:

  1. IF – Instruction Fetch: Fetch instruction from instruction memory using PC.
  2. ID – Instruction Decode: Decode instruction, read registers, generate control signals, and compute immediate.
  3. EX – Execute: Perform arithmetic/logic operations and calculate branch or memory addresses.
  4. MEM – Memory Access: Read/write data memory as directed by control signals.
  5. WB – Write Back: Write results to destination register.
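
The overlap described by the five stages above can be pictured with a small occupancy table. This is a minimal sketch of an ideal pipeline with no stalls or flushes; the function names are hypothetical and nothing here is taken from the project's RTL.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_timeline(num_instructions):
    """Return one row per instruction; row[c] is the stage occupied in
    cycle c, or "" when the instruction is not in the pipeline.

    Ideal case only: instruction i enters IF in cycle i and advances one
    stage per cycle.
    """
    if num_instructions < 1:
        raise ValueError("need at least one instruction")
    total_cycles = len(STAGES) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for offset, stage in enumerate(STAGES):
            row[i + offset] = stage
        rows.append(row)
    return rows


def format_timeline(rows):
    """Render the timeline as fixed-width text, one line per instruction."""
    header = " " * 6 + " ".join(f"c{c + 1:<3}" for c in range(len(rows[0])))
    lines = [header.rstrip()]
    for i, row in enumerate(rows):
        cells = " ".join(f"{cell:<4}" for cell in row)
        lines.append((f"i{i + 1:<4} " + cells).rstrip())
    return "\n".join(lines)


print(format_timeline(pipeline_timeline(3)))
```

With three instructions the table spans seven cycles, and in the third cycle the first instruction is in EX while the second is in ID and the third is in IF.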

Comparison: Single-Cycle vs Pipelined Architecture

| Feature | Single-Cycle | Pipelined |
| --- | --- | --- |
| Execution Time / Instruction | One long cycle | Five shorter cycles |
| Clock Period | Determined by slowest operation | Reduced (stage-based) |
| Throughput | One instruction at a time | One per cycle (after fill) |
| Latency | 1 cycle / instruction | 5 cycles / instruction |
| Hazard Handling | Not required | Forwarding, stalls, flushes |
| Performance | Moderate | 3–5× higher throughput |
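
To see where an upper bound of 5× comes from, the following sketch computes the ideal speedup of a k-stage pipeline over a single-cycle design. It assumes perfectly balanced stages (pipelined clock period equal to the single-cycle period divided by k), no pipeline-register overhead, and no stalls; these are textbook idealizations, not measurements from this project. The function name is hypothetical.

```python
def ideal_speedup(num_instructions, num_stages=5):
    """Ideal speedup of a pipelined design over a single-cycle design.

    Single-cycle time : n * T
    Pipelined time    : (k + n - 1) * (T / k)   (k - 1 cycles to fill)
    Speedup           : n * k / (k + n - 1)
    """
    pipelined_cycles = num_stages + num_instructions - 1
    return num_instructions * num_stages / pipelined_cycles


for n in (1, 5, 100, 10_000):
    print(n, round(ideal_speedup(n), 3))
```

For a single instruction there is no gain, and the value approaches the stage count only for long instruction streams, which is why the table qualifies the pipelined throughput with "after fill".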

Key Components

  • Program Counter (PC): Holds current instruction address and updates sequentially or to branch/jump target.
  • Instruction Memory: Stores compiled RISC-V machine code and supplies 32-bit instructions each cycle.
  • Control Unit: Decodes opcodes and generates synchronized control signals for all stages.
  • Immediate Generator: Extracts and sign-extends immediates for all instruction types.
  • Register File: Contains 32 general-purpose registers (x0–x31); supports 2 reads and 1 write per cycle.
  • ALU (Arithmetic Logic Unit): Executes arithmetic and logical operations as per ALUOp signals.
  • Data Memory: Handles 32-bit load/store operations with aligned access.
  • Branch Comparator: Compares register values for conditional branches.
  • Pipeline Registers (IF/ID, ID/EX, EX/MEM, MEM/WB): Store intermediate data and control signals between stages.
  • Forwarding Unit: Bypasses results from EX/MEM or MEM/WB to resolve read-after-write (RAW) dependencies.
  • Hazard Detection Unit: Inserts stalls on load-use hazards and flushes the pipeline on branch hazards.
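
The forwarding and load-use checks in the list above are commonly expressed as a few register-number comparisons. The sketch below is a behavioral model of the textbook conditions, not a transcription of this project's SystemVerilog; the signal names, the select encoding, and both function names are assumptions made for illustration.

```python
# Hypothetical encoding of the forwarding mux select.
NO_FORWARD, FROM_MEM_WB, FROM_EX_MEM = 0, 1, 2


def forward_select(rs, ex_mem_rd, ex_mem_reg_write, mem_wb_rd, mem_wb_reg_write):
    """Choose the source for one ALU operand whose register number is rs.

    The EX/MEM result is checked first because it is the more recent
    write. x0 is never forwarded since it always reads as zero.
    """
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == rs:
        return FROM_EX_MEM
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == rs:
        return FROM_MEM_WB
    return NO_FORWARD


def load_use_stall(id_ex_mem_read, id_ex_rd, if_id_rs1, if_id_rs2):
    """Return True when the instruction in ID needs a load result that is
    still in EX, so one stall cycle is required before forwarding works.

    Simplification: rs2 is compared even for instructions that do not
    read it, which can cause an unnecessary (but harmless) stall.
    """
    return bool(
        id_ex_mem_read
        and id_ex_rd != 0
        and id_ex_rd in (if_id_rs1, if_id_rs2)
    )
```

For example, `lw x3, 0(x1)` followed immediately by `add x4, x3, x2` makes `load_use_stall(True, 3, 3, 2)` return True; after the stall, the loaded value can be forwarded from MEM/WB.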

Implementation

The design follows a modular approach, allowing each component (ALU, Register File, Control Unit, etc.) to be independently tested using SystemVerilog testbenches before integration.
All modules are synthesized in Vivado and integrated in the top.sv module, which handles global clock, reset, and data flow.
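
Module-level testbenches are often checked against a software reference model. As a hedged example of that approach, here is a small ALU model covering the R-type operations named in this README (add, sub, and, or, slt). The string operation names are hypothetical; the actual ALUOp encoding used by the project is not given here.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu_ref(op, a, b):
    """Reference result for a 32-bit ALU operation.

    Inputs and the result are unsigned 32-bit patterns; slt compares the
    operands as signed values, as RV32I specifies.
    """
    a &= MASK32
    b &= MASK32
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "slt":
        return 1 if to_signed(a) < to_signed(b) else 0
    raise ValueError(f"unsupported op: {op}")
```

Expected values from such a model can be dumped alongside the stimulus and compared with the simulated ALU output, for instance checking that `alu_ref("slt", 0xFFFFFFFF, 0)` yields 1 because the first operand is -1 when read as signed.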

Testing & Results

Testing was performed through simulation and FPGA implementation:

  • Module-Level Testing: Each unit (ALU, Hazard Unit, Forwarding Logic, etc.) verified individually.
  • Integration Testing: Pipeline registers and control paths validated for signal synchronization.
  • System-Level Testing: Complete processor executed RISC-V programs for end-to-end verification.
  • FPGA Verification: Design successfully implemented on the Nexys A7 with a 100 MHz clock.

Instruction Testing

| Instruction Type | Examples | Status |
| --- | --- | --- |
| R-Type | add, sub, and, or, slt | Passed |
| I-Type | addi, andi, ori, lw, jalr | Passed |
| S-Type | sw | Passed |
| B-Type | beq, bne, blt, bge | Passed |
| U-Type | lui, auipc | Passed |
| J-Type | jal | Passed |
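
The type labels in this table correspond to the RV32I encoding formats, which are selected by the low seven opcode bits. The sketch below classifies an instruction word and extracts its sign-extended immediate, following the bit layouts in the RISC-V unprivileged specification. Note that jalr, although a jump, uses the I-type encoding. The function names are hypothetical and this is a checking aid, not the project's control unit or immediate generator.

```python
OPCODE_FORMAT = {
    0x33: "R",  # register-register ALU (add, sub, and, or, slt, ...)
    0x13: "I",  # register-immediate ALU (addi, andi, ori, ...)
    0x03: "I",  # loads (lw, ...)
    0x67: "I",  # jalr
    0x23: "S",  # stores (sw, ...)
    0x63: "B",  # conditional branches
    0x37: "U",  # lui
    0x17: "U",  # auipc
    0x6F: "J",  # jal
}


def instruction_format(word):
    """Return the encoding format letter for a 32-bit instruction word."""
    return OPCODE_FORMAT.get(word & 0x7F, "unknown")


def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def immediate(word):
    """Return the sign-extended immediate, or None for R-type/unknown."""
    fmt = instruction_format(word)
    if fmt == "I":
        return sign_extend(word >> 20, 12)
    if fmt == "S":
        return sign_extend(((word >> 25) << 5) | ((word >> 7) & 0x1F), 12)
    if fmt == "B":
        imm = (((word >> 31) & 1) << 12) | (((word >> 7) & 1) << 11) \
            | (((word >> 25) & 0x3F) << 5) | (((word >> 8) & 0xF) << 1)
        return sign_extend(imm, 13)
    if fmt == "U":
        return sign_extend(word & 0xFFFFF000, 32)
    if fmt == "J":
        imm = (((word >> 31) & 1) << 20) | (((word >> 12) & 0xFF) << 12) \
            | (((word >> 20) & 1) << 11) | (((word >> 21) & 0x3FF) << 1)
        return sign_extend(imm, 21)
    return None
```

As a usage example, `0x0020A423` encodes `sw x2, 8(x1)`, so `instruction_format` reports S and `immediate` returns 8.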

Performance Analysis

| Metric | Single-Cycle Processor | 5-Stage Pipelined Processor |
| --- | --- | --- |
| Execution Flow | One instruction at a time | Five instructions in parallel |
| Clock Period | Long (slowest path) | Shorter (stage-based) |
| Throughput | 1 instruction per long cycle | 1 instruction per short cycle (after fill) |
| Hazard Handling | None required | Forwarding + Stalls + Flush |
| Performance Gain | Baseline | ≈ 4× improvement |
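
One way a five-stage pipeline can land near 4× rather than the ideal 5× is through stall and flush cycles. The sketch below applies the usual steady-state estimate, assuming balanced stages and a long instruction stream. The stall rate of 0.25 cycles per instruction is purely illustrative; this README does not report the measured CPI, and the function name is hypothetical.

```python
def effective_speedup(num_stages, stall_cycles_per_instruction):
    """Steady-state speedup over a single-cycle design.

    Assumes the pipelined clock is num_stages times faster and that each
    instruction costs 1 cycle plus the average stall/flush penalty.
    """
    cpi = 1.0 + stall_cycles_per_instruction
    return num_stages / cpi


print(effective_speedup(5, 0.0))   # ideal case, no hazards
print(effective_speedup(5, 0.25))  # illustrative stall rate, not measured
```

Under these assumptions an average penalty of a quarter cycle per instruction is enough to reduce the ideal factor of 5 to 4.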

RTL Diagrams

These RTL (Register-Transfer-Level) views were auto-generated in Vivado to visualize structural connectivity among modules.

RTL schematics are provided for the following modules:

  • Top Module
  • Control Unit
  • Instruction Memory
  • Branch Comparator
  • Immediate Generator
  • Register File
  • Program Counter
  • ALU Logic
  • Data Memory
  • Pipelined Register
  • Forwarding Unit
  • Hazard Detection

Timing Diagrams

Timing waveforms confirm correct overlap of instructions, data forwarding, and stall behavior across pipeline stages.

Timing diagrams are provided for the following modules:

  • Top Module
  • Control Unit
  • Instruction Memory
  • Branch Comparator
  • Immediate Generator
  • Register File
  • Program Counter
  • ALU Logic
  • Data Memory
  • Pipelined Register
  • Forwarding Unit
  • Hazard Detection

Conclusion

The 5-Stage Pipelined RISC-V Processor successfully demonstrates a modern pipelined architecture implemented entirely in SystemVerilog and deployed on the Nexys A7 FPGA.
Through instruction-level parallelism, forwarding, and hazard management, the processor achieves an approximately 4× increase in throughput over the single-cycle design while maintaining functional correctness and meeting timing at 100 MHz.

Future Enhancements

  • Branch Prediction Unit – reduce control hazard penalties.
  • Instruction & Data Caches – improve memory latency.
  • Out-of-Order Execution – further boost parallelism.
  • Exception/Interrupt Handling – add system-level robustness.
  • RISC-V Extensions – support the M and F extensions and CSR instructions (Zicsr) for advanced features.

License

This project is licensed under the MIT License.

Author

Awais Asghar
NUST Chip Design Centre (NCDC)
