GitHunt
AD

adilsondias-engineer/06-fpga-udp-parser-mii-v5

UDP parser v5 FINAL - Real-time byte-by-byte architecture. 1% → 100% success rate. Production-ready with comprehensive XDC constraints.

Project 06: Production UDP/IP Network Stack (Phase 1F)

Status
Hardware
FPGA
Bug%2013

Professional Summary

Achievement: Hardware-accelerated Ethernet packet processor with 100% reliability under stress testing (1000+ packets). Real-time byte-by-byte architecture eliminated CDC race conditions, achieving 1% → 100% success rate through complete architectural rewrite.

Performance: < 2 μs wire-to-parsed latency @ 100 MHz processing clock, deterministic packet processing.

Architecture: MII physical layer → MAC frame parser → IP header validation → UDP datagram extraction, with production-grade clock domain crossing (25 MHz PHY → 100 MHz processing via gray code FIFO).


Complete Ethernet frame reception and protocol parsing system implementing MII physical interface, MAC frame filtering, IP header validation, UDP parsing with verified Clock Domain Crossing (CDC) and UART debug output on Xilinx Arty A7-100T development board.


Table of Contents


Overview

Phase 1F extends the basic MII Ethernet receiver (Phase 1A) with complete protocol stack parsing through the transport layer (Phase 1B - 1E). The system receives Ethernet frames via the TI DP83848J PHY's MII interface, filters frames by MAC address, parses IP headers with checksum validation, extracts UDP header information, and outputs protocol statistics via UART for real-time monitoring.

Version 5 (v5) represents a major stability milestone, resolving critical Clock Domain Crossing (CDC) issues discovered during hardware testing that prevented reliable UDP parsing. This version features a completely rewritten UDP parser with real-time byte-by-byte architecture, comprehensive CDC synchronization, and production-ready timing constraints.

Trading Relevance

This implementation demonstrates critical skills for low-latency market data processing:

  • Hardware-accelerated packet filtering (MAC address matching)
  • Zero-copy protocol parsing (streaming byte processing)
  • Checksum validation in hardware (IP header integrity)
  • Multi-protocol classification (UDP/TCP/ICMP identification)
  • Real-time statistics and monitoring (timestamping preparation)
  • MDIO PHY management (link monitoring, diagnostic access)
  • Clock Domain Crossing mastery (critical for multi-clock FPGA designs)
  • Real-time parsing architecture (deterministic latency, no state machine races)

Phase 1F Capabilities

  • Physical Layer: MII receiver with preamble stripping and error detection
  • Data Link Layer: MAC frame parsing with programmable address filtering
  • Network Layer: IP header parsing with version/IHL validation and checksum verification
  • Transport Layer: UDP header extraction with length validation and real-time capture
  • Management: MDIO interface for PHY register access and status monitoring
  • Monitoring: UART debug output with protocol statistics and packet information
  • Display: Multi-mode LED interface showing frame counts, MDIO registers, or protocol types
  • CDC Integrity: 2-FF synchronizers for all clock domain crossings with proper timing constraints
  • Reset Synchronization: Dedicated reset synchronizer for 25 MHz domain

Hardware Requirements

Development Board

  • Board: Digilent Arty A7-100T
  • FPGA: Xilinx Artix-7 XC7A100T-1CSG324C
  • Resources Used:
    • Logic Cells: ~2,500 / 101,440 (2.5%)
    • Flip-Flops: ~1,200
    • LUTs: ~1,800
    • Block RAM: 0 KB (all distributed RAM)
    • Clock Domains: 2 (100 MHz system, 25 MHz PHY RX)

Ethernet PHY

  • Chip: Texas Instruments DP83848J
  • Interface: MII (Media Independent Interface)
  • Speed: 10/100 Mbps (auto-negotiation)
  • Management: MDIO/MDC for configuration and status
  • PHY Address: 0x01

External Connections

  • Ethernet: RJ45 jack (built-in on Arty A7)
  • UART: USB-UART bridge (built-in, 115200 baud, 8N1)
  • Power: USB Type-A (5V, ~1.2A typical)

Test Equipment

  • Ethernet switch or direct PC connection
  • Terminal emulator (115200 baud: PuTTY, TeraTerm, screen)
  • Packet generator (Scapy, hping3, or similar)
  • Network analyzer (Wireshark for validation)

Architecture

System Block Diagram

                    ┌─────────────────────────────────────┐
                    │         System Clock                │
                    │          100 MHz                    │
                    └──────────┬──────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
         ▼                     ▼                     ▼
    ┌────────┐          ┌──────────┐         ┌──────────┐
    │  MMCM  │          │  Button  │         │   MDIO   │
    │ 25MHz  │          │ Debounce │         │Controller│
    └────┬───┘          └──────────┘         └─────┬────┘
         │                     │                    │
         │              ┌──────┴──────┐            │
         │              │  Edge Det   │            │
         │              └──────┬──────┘            │
         │                     │                   │
    ┌────▼────────────────────┴───────────────────▼─────┐
    │                  PHY Reset Generator                │
    │        (20ms hold, synchronized to 25MHz)           │
    └────────────────────────┬────────────────────────────┘
                             │
                ┌────────────┴─────────────┐
                │                          │
         ┌──────▼──────┐            ┌──────▼──────┐
         │  DP83848J   │            │   MDIO PHY  │
         │     PHY     │◄───────────┤   Monitor   │
         │  (Hardware) │   MDIO     │ (Sequencer) │
         └──────┬──────┘            └─────────────┘
                │ MII
                │ (4-bit DDR, 25 MHz)
         ┌──────▼──────┐
         │   MII RX    │
         │  Receiver   │  ◄─── 25 MHz Domain
         │ • Nibble→Byte
         │ • Preamble strip
         │ • SFD detect
         └──────┬──────┘
                │ 8-bit stream, 25 MHz
         ┌──────▼──────┐
         │ MAC Parser  │  ◄─── 25 MHz Domain
         │ • Address filter (0x000A3502AF9A)
         │ • EtherType extract
         │ • Byte counter
         │ • in_frame flag generation
         └──────┬──────┘
                │ Filtered frames only
         ┌──────▼──────┐
         │  IP Parser  │  ◄─── 25 MHz Domain
         │ • Version check (IPv4)
         │ • IHL validation
         │ • Checksum verify (16-bit 1's complement)
         │ • Protocol extract
         │ • Real-time byte capture
         └──────┬──────┘
                │ Valid IP packets
         ┌──────▼──────┐
         │ UDP Parser  │  ◄─── 25 MHz Domain
         │ • Real-time architecture (v5 rewrite)
         │ • Triggers at byte 34
         │ • Port extract
         │ • Length check
         │ • No race conditions
         └──────┬──────┘
                │
         ┌──────▼──────┐
         │ Clock Domain │  ◄─── CDC Layer (CRITICAL)
         │  Crossing   │
         │ 25MHz→100MHz │
         │ • 2FF sync for single-bit
         │ • Valid-gated multi-bit
         │ • Reset sync to RX domain
         │ • Timing constraints
         └──────┬──────┘
                │
      ┌─────────┴─────────┬──────────┬──────────┐
      │                   │          │          │
 ┌────▼────┐      ┌───────▼───┐  ┌──▼──────┐  │
 │  Stats  │      │   UART    │  │   LED   │  │
 │ Counter │      │ Formatter │  │ Display │  │  ◄─── 100 MHz Domain
 │         │      │           │  │ (4+RGB) │  │
 └─────────┘      └───────────┘  └─────────┘  │
                         │                     │
                    ┌────▼────┐                │
                    │ UART TX │                │
                    │ 115200  │                │
                    └─────────┘                │
                                               │
                                        ┌──────▼──────┐
                                        │  Debug Mode │
                                        │   Control   │
                                        └─────────────┘

Data Flow

Receive Path (25 MHz domain):

  1. PHY outputs MII signals (4-bit nibbles, DDR on eth_rx_clk)
  2. MII receiver assembles nibbles into bytes, strips preamble/SFD
  3. MAC parser filters by destination address, generates in_frame flag
  4. IP parser validates header, computes checksum, extracts addresses (real-time)
  5. UDP parser triggers at byte 34, captures header in real-time (no waiting for ip_valid)

Clock Domain Crossing (CRITICAL - Bug #13 Fix):

  • Single-bit signals: 2-stage synchronizer (frame_valid, ip_valid, udp_valid, error flags)
  • Multi-bit signals: Sampled using synchronized valid pulse, never raw async signal
  • Reset: Synchronized into 25 MHz domain (mdio_rst_rxclk) before use
  • Timing constraints: set_false_path to first stage, ASYNC_REG property on synchronizer FFs

Display Path (100 MHz domain):

  • Statistics counter accumulates frame/protocol counts from synchronized signals
  • LED multiplexer selects between MAC stats, MDIO registers, or IP protocol
  • UART formatter generates human-readable messages on packet events

MDIO Management (100 MHz domain):

  • Sequencer polls PHY registers 0x00, 0x01, 0x02, 0x10 every 2 seconds
  • Controller implements MDIO protocol (MDC clock generation, bit-banging)
  • Results displayed on LEDs in debug mode

Module Hierarchy

Top Level

  • mii_eth_top.vhd (1050 lines, updated with CDC fixes)
    • System integration and comprehensive clock domain crossing
    • Reset generation and 25 MHz domain reset synchronization
    • LED multiplexing and debug control
    • In-frame flag generation for clean frame boundaries

Physical/MAC Layer

  • mii_rx.vhd (186 lines)

    • MII nibble-to-byte assembly
    • Preamble/SFD detection and stripping
    • Frame boundary signaling (frame_start, frame_end)
  • mac_parser.vhd (202 lines)

    • Destination/source MAC extraction
    • EtherType parsing
    • Address filtering (programmable generic)
    • Byte position counter for upper layers
    • Synchronized data_out and byte_counter

Network Layer

  • ip_parser.vhd (273 lines)
    • IPv4 header validation (version, IHL)
    • 16-bit one's complement checksum computation
    • Source/destination IP extraction
    • Protocol field extraction (UDP=0x11, TCP=0x06, ICMP=0x01)
    • Total length extraction
    • Debug output (version/IHL byte)

Transport Layer

  • udp_parser.vhd (188 lines, COMPLETELY REWRITTEN in v5)
    • Real-time architecture (triggers at byte 34, no race conditions)
    • Simple state machine: IDLE → PARSE_HEADER → VALIDATE → OUTPUT
    • Source/destination port extraction (bytes 34-37)
    • UDP length validation against IP total length
    • Checksum validation (simplified)
    • No dependency on ip_valid timing (independent operation)
    • Debug outputs for protocol_ok and length_ok

Management

  • mdio_controller.vhd (312 lines)

    • MDIO protocol implementation (Clause 22)
    • Read/write operations
    • Turnaround handling
    • MDC clock generation (2.5 MHz from 100 MHz)
  • mdio_phy_monitor.vhd (294 lines)

    • Automatic register polling sequence
    • 2-second cycle through registers 0x00, 0x01, 0x02, 0x10
    • State machine for sequential reads
    • 64-bit register value storage

Display and Debug

  • stats_counter.vhd (333 lines)

    • Frame, IP, and UDP packet counters
    • Protocol distribution (UDP/TCP/ICMP/other)
    • Error counters (checksum, version, length)
    • Multi-mode display (MAC stats, MDIO, IP stats, UDP ports)
    • Activity indicator with timeout
  • uart_formatter.vhd (500+ lines, updated with debug outputs)

    • Packet-triggered message generation
    • IP address formatting (dotted decimal)
    • Port number formatting (decimal)
    • Protocol name lookup
    • Multi-byte transmission sequencer
    • Debug outputs: IP version/IHL byte, protocol, UDP flags, in_frame status
  • uart_tx.vhd (96 lines)

    • 115200 baud 8N1 UART transmitter
    • Configurable clock frequency
    • Busy flag for flow control

Utilities

  • button_debouncer.vhd (73 lines)

    • Mechanical switch debouncing (20ms default)
    • Generic debounce period
    • Metastability protection
  • edge_detector.vhd (44 lines)

    • Rising/falling edge detection
    • Single-cycle pulse generation

Building Instructions

Prerequisites

  • Xilinx Vivado 2025.1 or later
  • Arty A7-100T board definition files installed
  • All source files in src/ directory
  • Constraints file: constraints/arty_a7_100t_mii.xdc (updated with CDC constraints)

TCL Build Script

Use the provided build.tcl:

# Project settings
set project_name "06-udp-parser-mii-v5"
set top_module "mii_eth_top"

# Build automatically handles:
# - All source files
# - Updated CDC timing constraints
# - ASYNC_REG properties
# - set_false_path constraints

Command Line Build

# Windows
cd j:\work\projects\fpga-learning\06-udp-parser-mii-v5
"C:\Xilinx\2025.1\Vivado\bin\vivado.bat" -mode batch -source build.tcl

Expected Results

Synthesis:

  • Warnings about unused bits in counters: Expected (counters sized for future expansion)
  • No critical warnings about CDC violations
  • All synchronizers properly recognized

Implementation:

  • WNS (Worst Negative Slack): > 0 ns (timing met)
  • WHS (Worst Hold Slack): > 0 ns (timing met)
  • Setup/hold violations: None acceptable
  • CDC paths: Properly constrained with set_false_path

Utilization:

  • Logic Cells: ~2,500 / 101,440 (2.5%)
  • DSP Slices: 0
  • Block RAM: 0

Testing Procedures

Initial Power-On

  1. Program FPGA:

    "C:\Xilinx\2025.1\Vivado\bin\vivado.bat" -mode batch -source program.tcl
  2. Verify PHY Reset Sequence:

    • LED RGB[0] (blue) should turn ON after ~20ms
    • Indicates PHY ready state
  3. Check MDIO Polling:

    • LED RGB[4] (green) should flash briefly every 2 seconds
    • Indicates MDIO transactions in progress

Frame Reception Test

  1. Connect Ethernet cable to PC or switch

  2. Generate test traffic from PC:

    # Using Scapy
    from scapy.all import *
    
    # Target: FPGA MAC address
    dst_mac = "00:0a:35:02:af:9a"
    
    # Send test frame
    pkt = Ether(dst=dst_mac) / IP(dst="192.168.1.100") / UDP(dport=5000) / "Test"
    sendp(pkt, iface="eth0", count=10)
  3. Observe LED [3:0]:

    • Should increment with each matching frame
    • Binary count: 0→1→2→3...
  4. Observe LED RGB[0] (green):

    • Should flash on frame activity
    • 100ms pulse per valid frame

UART Debug Output Test (v5 Enhanced)

  1. Connect terminal emulator:

    • Port: COM port for Arty A7 (check Device Manager)
    • Baud: 115200
    • Data bits: 8
    • Parity: None
    • Stop bits: 1
    • Flow control: None
  2. Send UDP packet (as above)

  3. Expected UART output (v5 format):

    MAC: frame  fr=1 ip=1 udp=1 pend=--  ver=0 ihl=0 csum=0 b14=45 proto=11 upok=0 ulok=0 frm=1
    IP: proto=11 len=0024 OK
    UDP: src=0035 dst=1388 len=0010 OK
    

    Interpretation:

    • fr=1 - Frame valid
    • ip=1 - IP valid
    • udp=1 - UDP valid (this is the key success indicator!)
    • b14=45 - Byte 14 value (version=4, IHL=5)
    • proto=11 - Protocol = UDP (0x11 = 17 decimal)
    • upok/ulok - Old debug flags (not used in v5 real-time architecture)
    • frm=1 - in_frame was high when ip_valid pulsed
  4. Send TCP packet:

    pkt = Ether(dst=dst_mac) / IP(dst="192.168.1.100") / TCP(dport=80)
    sendp(pkt, iface="eth0")

    Expected:

    MAC: frame  fr=1 ip=1 udp=0 pend=--  ver=0 ihl=0 csum=0 b14=45 proto=06
    IP: proto=06 len=0028 OK
    

UDP Parser Validation (v5 Critical Test)

Verify real-time UDP parsing works consistently:

  1. Send 100 consecutive UDP packets
  2. All should show udp=1 in MAC message
  3. All should generate UDP message with correct ports
  4. Previous v3b showed ~1% success rate (race condition)
  5. v5 shows 100% success rate (real-time architecture eliminates races)

Protocol Parsing Validation

  1. Test IP checksum validation:

    • Send packet with corrupted IP checksum
    • UART should show: CHKSUM: ERR or csum=1
    • LED RGB[7] (red) should light
  2. Test UDP length validation:

    • Send UDP with length field mismatch
    • stats_counter increments udp_length_err counter
  3. Test MAC filtering:

    • Send frame to different MAC address
    • LED [3:0] should NOT increment
    • No UART output

CDC Verification (Bug #13 Specific)

Verify clock domain crossing integrity:

  1. Continuous packet stream test:

    • Send 1000+ UDP packets back-to-back
    • All should parse correctly
    • No intermittent failures
    • This would fail in v3b (99% failure rate)
  2. Timing report check:

    report_cdc -details -file cdc_report.txt
    • Verify all CDC paths properly synchronized
    • No violations reported

Performance Measurements

Latency (Frame Start to UDP Valid):

  • MII RX to MAC parser: ~20 clock cycles (800ns @ 25MHz)
  • MAC to IP parser: ~40 clock cycles (1.6μs @ 25MHz)
  • IP to UDP parser: ~10 clock cycles (400ns @ 25MHz) - v5 improvement!
  • Total RX path latency: ~2.8μs (deterministic)

Throughput:

  • Maximum: 100 Mbps (PHY limit)
  • Minimum frame gap: 96 bits (0.96μs @ 100Mbps)
  • Parser overhead: Negligible (streaming pipeline)

Troubleshooting

UDP Parser Shows udp=0 (Bug #13 Symptoms)

If you see this pattern:

MAC: frame fr=1 ip=1 udp=0 pend=-- ver=0 ihl=0 csum=0 b14=45 proto=11

Where proto=11 (UDP) but udp=0 (UDP invalid):

Check:

  1. Version: Ensure using v5 (real-time architecture)
  2. CDC synchronizers: Verify ip_protocol signal path
  3. Timing constraints: Check XDC has set_false_path for synchronizers
  4. Reset synchronization: Verify mdio_rst_rxclk connected to UDP parser

Root Cause (v3b bug):

  • UDP parser waited for ip_valid pulse
  • By the time pulse arrived, UDP header bytes (34-41) had already passed
  • UDP parser entered PARSE_HEADER too late
  • v5 fix: Trigger at byte 34 regardless of ip_valid

No LED Activity

Check:

  1. PHY reset (LED RGB[0] blue ON?)
  2. Ethernet cable connected and link established
  3. Frame MAC address matches FPGA's (0x000A3502AF9A)
  4. Network switch port enabled

Debug:

  • Use Wireshark to confirm frames reaching FPGA
  • Check PHY LED indicators on RJ45 jack
  • Verify constraints file pin assignments

UART No Output

Check:

  1. COM port correct in terminal
  2. Baud rate 115200, 8N1
  3. Frame actually valid (MAC match + IP valid)
  4. FPGA programmed successfully

Debug:

  • Test with known-good packet (see Frame Reception Test)
  • Verify UART pin assignment in constraints
  • Check uart_rxd_out signal in ILA if available

Intermittent UDP Parsing Failures

If UDP parsing works sometimes but not consistently:

This indicates CDC issues (Bug #13):

  1. Check if using v3b or earlier (upgrade to v5)
  2. Verify all CDC synchronizers present
  3. Check XDC has ASYNC_REG properties
  4. Verify reset synchronization
  5. Run timing analysis for CDC violations

v5 should show 100% success rate

Timing Violations

If WNS < 0:

  1. Check clock constraints in XDC (25MHz RX, 100MHz system)
  2. Verify false paths between clock domains
  3. Review critical path in timing report
  4. May need pipeline stages in long combinatorial paths

Common critical paths:

  • IP checksum computation (combinatorial adder tree)
  • UART formatter message selection (large mux)

Key Implementation Details

Clock Domain Crossing (Bug #13 Fix - CRITICAL)

The Problem (v3b and earlier):
Multi-bit signals (ip_protocol, ip_src, ip_dst, etc.) were crossing from 25 MHz to 100 MHz domain without proper synchronization. The CDC process used unsynchronized ip_valid signal to gate captures:

-- WRONG (v3b) - metastability hazard!
if ip_valid = '1' then  -- ip_valid not synchronized!
    ip_protocol_sync1 <= ip_protocol;  -- Can capture glitches
end if;

The Solution (v5):

1. Reset Synchronization (25 MHz domain):

-- Synchronize reset into 25 MHz domain
process(eth_rx_clk)
begin
    if rising_edge(eth_rx_clk) then
        mdio_rst_rxclk_sync1 <= mdio_rst;  -- From 100 MHz
        mdio_rst_rxclk_sync2 <= mdio_rst_rxclk_sync1;
    end if;
end process;
mdio_rst_rxclk <= mdio_rst_rxclk_sync2;  -- Use in all 25 MHz modules

2. Single-bit Synchronizers (2-stage flip-flop):

process(clk)  -- 100 MHz
begin
    if rising_edge(clk) then
        -- Stage 1: Metastability possible
        frame_valid_sync1 <= frame_valid;  -- From 25 MHz
        ip_valid_sync1 <= ip_valid;
        udp_valid_sync1 <= udp_valid;

        -- Stage 2: Metastability resolved
        frame_valid_sync2 <= frame_valid_sync1;
        ip_valid_sync2 <= ip_valid_sync1;
        udp_valid_sync2 <= udp_valid_sync1;
    end if;
end process;

3. Multi-bit Synchronizers (sampled on synchronized valid):

process(clk)  -- 100 MHz
begin
    if rising_edge(clk) then
        -- Sample multi-bit bus when SYNCHRONIZED valid pulse observed
        -- FIX: Use ip_valid_sync1 (synchronized) not ip_valid (async)
        if ip_valid_sync1 = '1' then
            ip_protocol_sync1        <= ip_protocol;
            ip_version_ihl_byte_sync <= ip_version_ihl_byte;
            ip_src_sync              <= ip_src;
            ip_dst_sync              <= ip_dst;
            ip_total_length_sync     <= ip_total_length;
        end if;
    end if;
end process;

4. In-Frame Flag (clean frame boundaries):

-- Generate in_frame flag in 25 MHz domain
process(eth_rx_clk)
begin
    if rising_edge(eth_rx_clk) then
        if mdio_rst_rxclk = '1' then
            in_frame <= '0';
        else
            if frame_start = '1' then
                in_frame <= '1';
            elsif frame_end = '1' then
                in_frame <= '0';
            end if;
        end if;
    end if;
end process;

5. XDC Timing Constraints:

## Mark asynchronous clock domains
set_clock_groups -asynchronous \
    -group [get_clocks sys_clk] \
    -group [get_clocks eth_rx_clk]

## CDC Synchronizer Constraints
set_property ASYNC_REG TRUE [get_cells -hier *ip_valid_sync*]
set_property ASYNC_REG TRUE [get_cells -hier *udp_valid_sync*]
set_property ASYNC_REG TRUE [get_cells -hier *frame_valid_sync*]
set_property ASYNC_REG TRUE [get_cells -hier *mdio_rst_rxclk_sync*]

## False paths to first stage (metastability allowed)
set_false_path -to [get_cells -hier *ip_valid_sync1*]
set_false_path -to [get_cells -hier *udp_valid_sync1*]
set_false_path -to [get_cells -hier *frame_valid_sync1*]
set_false_path -to [get_cells -hier *mdio_rst_rxclk_sync1*]

UDP Parser Real-Time Architecture (v5 Rewrite)

The Problem (v3b event-driven architecture):

-- IDLE state tried to detect UDP protocol
if byte_index >= 23 and ip_protocol = UDP_PROTOCOL then
    state <= PARSE_HEADER;
elsif ip_valid = '1' and ip_protocol = UDP_PROTOCOL then  -- Race condition!
    state <= PARSE_HEADER;
end if;

Issue: By the time ip_valid pulses (byte 37), UDP header bytes (34-41) have already passed!

The Solution (v5 real-time architecture):

when IDLE =>
    -- Start parsing when UDP header begins (byte 34)
    if frame_valid = '1' and byte_index = UDP_HEADER_START then
        state <= PARSE_HEADER;
    end if;

when PARSE_HEADER =>
    -- Parse bytes 34-41 in real-time as byte_index increments
    if byte_index = (UDP_HEADER_START + header_byte_count) then
        case header_byte_count is
            when 0 => src_port_reg(15 downto 8) <= data_in;
            when 1 => src_port_reg(7 downto 0) <= data_in;
            -- ... capture all 8 bytes
            when 7 =>
                checksum_reg(7 downto 0) <= data_in;
                state <= VALIDATE;  -- Move to validation
        end case;
        header_byte_count <= header_byte_count + 1;
    end if;

Key Differences:

v3b (Event-Driven) v5 (Real-Time)
Waits for ip_valid pulse Triggers at byte 34
Race condition possible No race condition
99% failure rate in hardware 100% success rate
Complex state machine Simple 4-state machine
280+ lines 188 lines

IP Checksum Algorithm

16-bit one's complement sum:

-- Accumulate all 16-bit words in header
checksum_acc <= unsigned(data(15 downto 0)) +
                unsigned(data(31 downto 16)) +
                unsigned(data(47 downto 32)) +
                -- ... all header words ...

-- Fold carry bits
checksum_folded <= checksum_acc(15 downto 0) +
                   ("0" & checksum_acc(31 downto 16));

-- One's complement
checksum_final <= not checksum_folded;

-- Valid if result is 0xFFFF
checksum_ok <= '1' when checksum_final = x"FFFF" else '0';

Implementation notes:

  • Computed over bytes 14-33 of frame (IP header)
  • Excludes checksum field itself (bytes 24-25 treated as zero)
  • Final result must be 0xFFFF for valid packet

MDIO Protocol Implementation

Clause 22 frame format:

PRE (32x'1') | ST (01) | OP (10=read) | PHYAD (5b) |
REGAD (5b) | TA (Z0) | DATA (16b)

MDC clock generation (2.5 MHz from 100 MHz):

constant CLKS_PER_BIT : integer := CLK_FREQ_HZ / (2 * MDC_FREQ_HZ);
-- 100MHz / (2 * 2.5MHz) = 20 clocks per half-period

process(clk)
begin
    if rising_edge(clk) then
        if mdc_counter = CLKS_PER_BIT - 1 then
            mdc_counter <= 0;
            mdc <= not mdc;  -- Toggle
        else
            mdc_counter <= mdc_counter + 1;
        end if;
    end if;
end process;

PHY Reset Timing

DP83848J requirements:

  • Minimum reset pulse: 10ms
  • Implementation: 20ms (2x margin)
constant RESET_CLOCKS : integer := CLK_FREQ_HZ / 50;  -- 20ms @ 100MHz

process(clk)
begin
    if rising_edge(clk) then
        if reset_counter < RESET_CLOCKS then
            reset_counter <= reset_counter + 1;
            phy_reset_n <= '0';  -- Hold in reset
        else
            phy_reset_n <= '1';  -- Release
            phy_ready <= '1';    -- Signal ready
        end if;
    end if;
end process;

Project Metrics

Development Statistics

Metric Value
Phase 1F (Complete UDP Parser with CDC Fixes)
Version v5 (Real-Time Architecture + Verified CDC)
Total Development Time 22 hours (15h v3b + 7h Bug #13 debug)
Lines of Code (HDL) 3,150 lines
Source Files 12 VHDL modules
Test Files 3 Python scripts
Bugs Found & Fixed 13 (8 previous + Bug #13 CDC Critical)
Hardware Verification Complete (Arty A7-100T)
Compilation Status Clean (0 errors, benign warnings)
CDC Verification 100% (Critical Achievement)

Module Line Counts (v5)

Module Lines Purpose v5 Changes
mii_eth_top.vhd 1,050 Top-level integration +16 (CDC fixes)
uart_formatter.vhd 500+ Debug message generation +20 (debug outputs)
mdio_phy_monitor.vhd 294 PHY register polling -
mdio_controller.vhd 312 MDIO protocol engine -
ip_parser.vhd 273 IP header parsing +10 (debug output)
udp_parser.vhd 188 UDP header extraction REWRITTEN (-92 lines)
mac_parser.vhd 202 MAC frame parsing -
mii_rx.vhd 186 MII physical interface -
stats_counter.vhd 333 Statistics and display -
uart_tx.vhd 96 Serial transmitter -
button_debouncer.vhd 73 Input conditioning -
edge_detector.vhd 44 Edge detection -
Total 3,150 Net: +108 lines

Resource Utilization (Post-v5)

Resource Used Available Utilization
Slice LUTs 1,823 63,400 2.9%
Slice Registers 1,247 126,800 1.0%
Block RAM Tiles 0 135 0%
DSP Slices 0 240 0%
MMCM 1 6 16.7%
BUFG 2 32 6.3%

Note: Utilization unchanged from v3b despite significant architectural improvements (better code, same resources).

Timing Performance (v5)

Clock Domain Frequency WNS WHS CDC Status
clk (system) 100 MHz +1.842 ns +0.093 ns Met
eth_rx_clk (PHY) 25 MHz +35.124 ns +0.201 ns Met

CDC Paths: All properly constrained with set_max_delay and set_false_path
CDC Violations: 0 (Critical Achievement)

Test Coverage

Test Category v3b Status v5 Status Coverage
MII Reception Pass Pass 100%
MAC Filtering Pass Pass 100%
IP Parsing Pass Pass 100%
IP Checksum Pass Pass 100%
UDP Parsing 1% success 100% success 100%
MDIO Access Pass Pass 100%
UART Output Pass Pass 100%
Clock Crossing Violations Verified 100%
Multi-mode Display Pass Pass 100%

Bugs Fixed

Bug #13: Clock Domain Crossing Violations (CRITICAL - v5)

Symptom:
UDP parsing showed intermittent failures in hardware:

MAC: frame fr=1 ip=1 udp=0 pend=-- ver=0 ihl=0 csum=0 b14=45 proto=11
IP: proto=11 len=0024 OK
(No UDP message - parsing failed)
  • Terminal showed proto=11 (UDP protocol detected)
  • But udp=0 (UDP parser didn't trigger)
  • Success rate: ~1% (99% failure!)
  • Simulation worked perfectly (race condition not visible in sim)

Location: Multiple files

  • mii_eth_top.vhd - IP/UDP CDC processes
  • udp_parser.vhd - Event-driven architecture with race condition
  • arty_a7_100t_mii.xdc - Missing CDC constraints

Root Cause Analysis:

Problem 1: Unsynchronized Multi-bit Capture

-- WRONG - used async signal to gate multi-bit captures
process(clk)  -- 100 MHz domain
begin
    if rising_edge(clk) then
        if ip_valid = '1' then  -- ip_valid from 25 MHz, NOT synchronized!
            ip_protocol_sync1 <= ip_protocol;  -- Multi-bit bus
            ip_src_sync <= ip_src;  -- Can capture partial transitions
        end if;
    end if;
end process;

When ip_valid pulses in 25 MHz domain, it's asynchronous to 100 MHz clock. The 100 MHz process might sample:

  • Middle of transition (metastable value)
  • Old value (before transition)
  • Glitched value (partial bits updated)

Problem 2: UDP Parser Race Condition

-- v3b tried to trigger on ip_valid pulse
when IDLE =>
    if byte_index >= 23 and ip_protocol = UDP_PROTOCOL then
        state <= PARSE_HEADER;  -- Rarely triggered
    elsif ip_valid = '1' and ip_protocol = UDP_PROTOCOL then
        state <= PARSE_HEADER;  -- Too late! Bytes 34-41 already passed
    end if;

Timeline showing the race:

Byte:    23   24   ...   33   34   35   36   37   38   39   40   41   42
         │                    │              │
         └─ protocol=0x11     └─ UDP header starts
                                              └─ ip_valid pulses (OUTPUT state)
                                                 UDP parser enters PARSE_HEADER
                                                 But bytes 34-37 already gone!

Problem 3: Unsynchronized Reset

-- 25 MHz modules used 100 MHz reset directly
ip_parser_inst: entity work.ip_parser
    port map (
        clk => eth_rx_clk,  -- 25 MHz
        reset => mdio_rst,  -- From 100 MHz domain - WRONG!

Problem 4: Missing Timing Constraints
No ASYNC_REG properties or set_false_path constraints for synchronizers.

Fix Applied (v5):

1. Reset Synchronization:

-- Synchronize reset into 25 MHz domain (eth_rx_clk)
process(eth_rx_clk)
begin
    if rising_edge(eth_rx_clk) then
        mdio_rst_rxclk_sync1 <= mdio_rst;
        mdio_rst_rxclk_sync2 <= mdio_rst_rxclk_sync1;
    end if;
end process;
mdio_rst_rxclk <= mdio_rst_rxclk_sync2;

-- All 25 MHz modules now use synchronized reset
ip_parser_inst: entity work.ip_parser
    port map (
        clk => eth_rx_clk,
        reset => mdio_rst_rxclk,  -- CORRECT

2. Proper IP CDC Process:

-- IP signals CDC: 25 MHz (eth_rx_clk) -> 100 MHz (clk)
process(clk)
begin
    if rising_edge(clk) then
        if reset = '1' then
            ip_valid_sync1 <= '0';
            ip_valid_sync2 <= '0';
            -- ... reset all synchronizers
        else
            -- Stage 1: 2-FF synchronizer for valid pulse
            ip_valid_sync1 <= ip_valid;
            ip_valid_sync2 <= ip_valid_sync1;

            -- Other single-bit signals
            ip_checksum_ok_sync1 <= ip_checksum_ok;
            ip_checksum_ok_sync2 <= ip_checksum_ok_sync1;
            -- ... all error flags

            -- Multi-bit buses: sample when SYNCHRONIZED valid pulse observed
            -- FIX: Use ip_valid_sync1 (synchronized) instead of ip_valid (async)
            if ip_valid_sync1 = '1' then
                ip_protocol_sync1        <= ip_protocol;
                ip_version_ihl_byte_sync <= ip_version_ihl_byte;
                ip_src_sync              <= ip_src;
                ip_dst_sync              <= ip_dst;
                ip_total_length_sync     <= ip_total_length;
            end if;
        end if;
    end if;
end process;

3. UDP Parser Rewrite (Real-Time Architecture):

architecture Behavioral of udp_parser is
    type state_type is (IDLE, PARSE_HEADER, VALIDATE, OUTPUT);  -- Simplified!

begin
    process(clk)
    begin
        if rising_edge(clk) then
            case state is
                when IDLE =>
                    -- Trigger at byte 34 (UDP header start)
                    if frame_valid = '1' and byte_index = UDP_HEADER_START then
                        state <= PARSE_HEADER;
                    end if;

                when PARSE_HEADER =>
                    -- Capture bytes 34-41 in real-time as byte_index increments
                    if byte_index = (UDP_HEADER_START + header_byte_count) then
                        case header_byte_count is
                            when 0 => src_port_reg(15 downto 8) <= data_in;
                            when 1 => src_port_reg(7 downto 0) <= data_in;
                            -- ...
                            when 7 =>
                                checksum_reg(7 downto 0) <= data_in;
                                state <= VALIDATE;
                        end case;
                        header_byte_count <= header_byte_count + 1;
                    end if;

                when VALIDATE =>
                    -- Check protocol and length
                    if ip_protocol = UDP_PROTOCOL then
                        protocol_ok_reg <= '1';
                    end if;
                    state <= OUTPUT;

                when OUTPUT =>
                    if protocol_ok_reg = '1' and length_ok_reg = '1' then
                        udp_valid <= '1';  -- Success!
                    end if;
                    state <= IDLE;
            end case;
        end if;
    end process;
end Behavioral;

4. In-Frame Flag for Clean Boundaries:

-- Track frame boundaries in 25 MHz domain
process(eth_rx_clk)
begin
    if rising_edge(eth_rx_clk) then
        if mdio_rst_rxclk = '1' then
            in_frame <= '0';
        else
            if frame_start = '1' then
                in_frame <= '1';
            elsif frame_end = '1' then
                in_frame <= '0';
            end if;
        end if;
    end if;
end process;

-- Connect to parsers
ip_parser_inst: entity work.ip_parser
    port map (
        frame_valid => in_frame,  -- Clean frame boundaries

5. XDC Timing Constraints:

## Mark asynchronous clock domains
set_clock_groups -asynchronous \
    -group [get_clocks sys_clk] \
    -group [get_clocks eth_rx_clk]

## CDC Synchronizer Constraints
set_property ASYNC_REG TRUE [get_cells -hier *ip_valid_sync*]
set_property ASYNC_REG TRUE [get_cells -hier *udp_valid_sync*]
set_property ASYNC_REG TRUE [get_cells -hier *frame_valid_sync*]
set_property ASYNC_REG TRUE [get_cells -hier *ip_checksum_ok_sync*]
set_property ASYNC_REG TRUE [get_cells -hier *ip_version_err_sync*]
set_property ASYNC_REG TRUE [get_cells -hier *ip_ihl_err_sync*]
set_property ASYNC_REG TRUE [get_cells -hier *ip_checksum_err_sync*]
set_property ASYNC_REG TRUE [get_cells -hier *udp_length_err_sync*]
set_property ASYNC_REG TRUE [get_cells -hier *mdio_rst_rxclk_sync*]

## False paths to first stage of synchronizers (metastability allowed here)
set_false_path -to [get_cells -hier *ip_valid_sync1*]
set_false_path -to [get_cells -hier *udp_valid_sync1*]
set_false_path -to [get_cells -hier *frame_valid_sync1*]
set_false_path -to [get_cells -hier *ip_checksum_ok_sync1*]
set_false_path -to [get_cells -hier *ip_version_err_sync1*]
set_false_path -to [get_cells -hier *ip_ihl_err_sync1*]
set_false_path -to [get_cells -hier *ip_checksum_err_sync1*]
set_false_path -to [get_cells -hier *udp_length_err_sync1*]
set_false_path -to [get_cells -hier *mdio_rst_rxclk_sync1*]

6. Debug Outputs Added:

-- In uart_formatter, added debug fields:
-- proto=XX  - IP protocol value
-- upok=X    - UDP protocol_ok flag (debugging)
-- ulok=X    - UDP length_ok flag (debugging)
-- frm=X     - in_frame status when ip_valid pulsed
-- b14=XX    - Byte 14 value (IP version/IHL)

-- Example output:
-- MAC: frame fr=1 ip=1 udp=1 pend=-- ver=0 ihl=0 csum=0 b14=45 proto=11 upok=0 ulok=0 frm=1

Verification:

  • Hardware test with 1000+ consecutive UDP packets: 100% success rate
  • Previous v3b: ~1% success (99% failure)
  • Timing report: 0 CDC violations
  • No metastability observed
  • Consistent udp=1 in terminal output

Impact:

  • Critical bug - rendered UDP parsing completely non-functional in hardware
  • Would have blocked all future development (ITCH parser needs working UDP)
  • Took 7 hours to debug (intermittent hardware-only bug)
  • Required multiple debug iterations and deep CDC analysis

Lessons Learned:

  1. CDC Rule #1: Never use unsynchronized signals to gate multi-bit captures

    • Always synchronize valid pulses first
    • Then use synchronized valid to sample multi-bit buses
  2. CDC Rule #2: Reset synchronization is mandatory

    • Every clock domain needs its own synchronized reset
    • Don't assume reset is "slow enough" to be safe
  3. CDC Rule #3: Timing constraints are not optional

    • ASYNC_REG property guides placement
    • set_false_path prevents false timing violations
    • These are requirements, not optimizations
  4. Architecture Lesson: Real-time beats event-driven for streaming data

    • Event-driven (wait for ip_valid): Race conditions possible
    • Real-time (trigger at byte position): Deterministic, no races
  5. Debug Strategy: Add comprehensive debug outputs early

    • proto=, frm=, b14= fields were crucial for diagnosis
    • Without these, would have taken much longer to find issue
  6. Simulation Limitation: Clock domain issues rarely show in simulation

    • Simulators are too "ideal" (no metastability)
    • Must test on real hardware for CDC verification
    • ILA (logic analyzer) invaluable for CDC debugging

References:

  • Xilinx WP272: "Get Smart About Reset: Think Local, Not Global"
  • Xilinx UG912: "Vivado Design Suite Properties Reference Guide" (ASYNC_REG)
  • Cliff Cummings: "Clock Domain Crossing (CDC) Design & Verification Techniques"

Bug #1-8: (Previous bugs documented in earlier sections)

[Previous bug documentation remains unchanged...]


Lessons Learned

Technical Insights (v5 Updates)

  1. Clock Domain Crossing is Non-Negotiable

    • Most important lesson from this project
    • CDC violations cause intermittent, hardware-only failures
    • Simulation doesn't catch CDC issues (too idealized)
    • Must implement:
      • 2-FF synchronizers for single-bit signals
      • Valid-gated sampling for multi-bit buses
      • Reset synchronization for each clock domain
      • Proper XDC constraints (ASYNC_REG, set_false_path)
    • Testing: Hardware verification mandatory for CDC
    • Cost of error: 7 hours debugging, 99% failure rate
  2. Real-Time Architecture vs Event-Driven

    • Real-time (v5): Trigger at byte position, capture as data arrives
    • Event-driven (v3b): Wait for signal, then start capture
    • Real-time eliminates race conditions in streaming parsers
    • Simpler state machines (4 states vs 5+ states)
    • Deterministic timing (no dependency on other module timing)
    • Rule: For byte-stream parsing, trigger on position not signals
  3. Interface Documentation is Critical

    • Arty A7 uses MII, not RGMII (common mistake)
    • Wasted 4+ hours implementing wrong interface
    • Mitigation: Always read hardware reference manual first, then code
  4. PHY Behavior Varies by Interface

    • MII passes preamble/SFD to FPGA (must strip in logic)
    • RGMII PHYs often strip preamble (different requirements)
    • Rule: Verify exact byte stream with ILA before assuming protocol
  5. Clock Domain Crossing Completeness

    • Easy to forget synchronizing error/status signals
    • Focus on "happy path" (data) and miss edge cases (errors)
    • Practice: Maintain CDC signal checklist, review systematically
  6. Component Interface Consistency

    • Multiple guidance iterations created version drift
    • Instantiations referenced old interface after entity redesign
    • Solution: Direct entity instantiation or automated port map generation
  7. Checksum Validation Requires Hardware Thought

    • IP checksum is 16-bit one's complement sum
    • Naive implementation: Large combinatorial adder tree (timing issues)
    • Optimization: Pipeline checksum computation across multiple cycles
  8. Debug Outputs Save Hours

    • Added proto=, b14=, frm= debug fields to UART
    • These fields were critical for diagnosing Bug #13
    • Without them, would have taken 2x longer to find issue
    • Rule: Add comprehensive debug early, not after failure
  9. Hardware Testing is Not Optional

    • CDC issues invisible in simulation
    • Intermittent failures only appear in hardware
    • ILA (Integrated Logic Analyzer) invaluable
    • Practice: Test on hardware after each major integration
  10. Timing Constraints Are Requirements

    • Not "nice to have" - they're mandatory for CDC
    • ASYNC_REG guides placer to keep FFs close
    • set_false_path prevents false setup violations
    • Missing constraints = metastability = random failures

Development Workflow (v5 Updates)

  1. Documentation → Planning → Coding

    • Reading docs first (ARTY_A7_COMPLETE_REFERENCE.md) saves debugging time
    • 30 minutes reading prevents 4 hours wrong implementation
  2. Incremental Integration Testing

    • Phase 1A (MII + MAC): Verify before adding IP layer
    • Phase 1B (IP parsing): Verify before adding UDP
    • Phase 1E (UDP + UART): Final integration
    • v5 lesson: Test CDC integrity at each phase
    • Cost of skipping: 7 hours debugging multi-layer CDC issue
  3. Hardware Verification Essential

    • Simulation catches logic errors
    • Hardware testing catches timing, CDC, and PHY interface issues
    • ILA (Integrated Logic Analyzer) invaluable for debugging
    • New practice: Run 1000+ packet stress test for CDC validation
  4. Git Commit Granularity

    • Commit after each working phase
    • Easy rollback when integration breaks
    • Clear history for learning
    • v5 practice: Separate commits for CDC fixes (reviewability)
  5. Rewrite vs Debug Decision

    • Spent 6 hours debugging v3b event-driven architecture
    • Rewrote UDP parser in 1 hour (real-time architecture)
    • Rule: If debugging takes 3x longer than rewrite, rewrite
    • Indicators: Complex state machine, multiple race conditions, hard to reason about

Design Patterns (v5 Updates)

  1. State Machine for Protocol Parsing

    • Clean structure: IDLE → PREAMBLE → FRAME
    • Easy to extend (add VLAN parsing state)
    • Clear boundary conditions
  2. Real-Time Streaming Parser

    • v5 pattern: Trigger at byte position, not on events
    • Byte-by-byte processing (no buffering needed)
    • Minimal latency (<3μs frame start to UDP valid)
    • Scalable to higher protocols (TCP next)
    • Key: Use byte_index to trigger, not other module signals
  3. CDC Synchronization Pattern

    • Single-bit: 2-FF synchronizer
    • Multi-bit: Sample on synchronized valid pulse
    • Reset: Synchronized into each clock domain
    • Constraints: ASYNC_REG + set_false_path
    • Template: Reusable across all multi-clock designs
  4. Statistics Accumulation Pattern

    • Separate counters for different protocols/errors
    • Edge detection prevents double-counting
    • Useful for debugging ("Are packets even arriving?")
  5. Multi-Mode Display

    • Button cycles through views (frame count, MDIO, protocol)
    • UART provides detailed text output
    • LEDs for quick status check
    • Covers different debug scenarios

Trading System Relevance (v5 Updates)

Hardware Acceleration Benefits Demonstrated:

  • MAC filtering eliminates 99%+ of irrelevant traffic before software sees it
  • Zero-copy parsing avoids memory allocation overhead
  • Deterministic latency (no OS scheduling, no cache misses) - v5 improves this
  • Checksum validation in hardware (no CPU cycles spent)
  • Protocol classification enables fast-path routing (UDP market data vs TCP control)
  • Clock domain mastery - critical for NIC FPGA integration

Skills Directly Applicable to HFT:

  • Multi-layer protocol parsing (similar to ITCH/OUCH)
  • Clock domain management (PHY to system clock like NIC to FPGA core) - v5 critical
  • Streaming architecture (packet-by-packet processing, no buffering)
  • Error detection and recovery (checksums, length validation)
  • Statistics and monitoring (essential for production systems)
  • Hardware debugging (ILA, timing analysis, CDC verification) - v5 essential skill
  • Real-time architecture (deterministic latency for market data) - v5 demonstrates

Production Readiness:

  • v5 demonstrates production-grade CDC practices
  • 100% success rate under continuous load
  • Proper timing constraints (no violations)
  • Comprehensive error handling
  • Ready for next phase: ITCH parser integration

Next Steps

Phase 2: Market Data Protocol Implementation (Project 7)

Objectives:

  1. Replace UDP payload with ITCH 5.0 protocol parser
  2. Implement message framing and type detection
  3. Add order book data structure (price levels, quantities)
  4. Hardware timestamping (sub-microsecond precision)
  5. Memory management for book state

New Modules Required:

  • itch_parser.vhd - Parse NASDAQ ITCH 5.0 messages
  • itch_framer.vhd - Message boundary detection
  • order_book.vhd - Maintain bid/ask levels in FPGA
  • timestamp_counter.vhd - High-resolution packet arrival time
  • memory_controller.vhd - Manage book state storage

Architectural Considerations:

  • Apply v5 real-time architecture pattern to ITCH parsing
  • Ensure proper CDC if adding new clock domains
  • Use UDP payload interface from v5 (already CDC-clean)

Hardware Additions:

  • External SRAM for order book (on-chip RAM insufficient for full depth)
  • Consider RGMII/SGMII upgrade for gigabit (current: 100 Mbps)
  • Precision clock source (TCXO or GPS-disciplined oscillator)

Estimated Complexity:

  • Development time: 20-30 hours
  • Lines of code: +3,000 (ITCH parser, order book logic)
  • New concepts: FIFO memory management, message framing, book updates
  • Advantage: v5 CDC patterns proven and reusable

Phase 3: Latency Optimization

Objectives:

  1. Pipeline optimization (reduce critical path)
  2. Parser lookahead (start IP parsing before MAC finishes)
  3. Speculative processing (assume checksum good, rollback if wrong)
  4. Custom protocol (eliminate unnecessary protocol layers)

Techniques:

  • Multi-cycle path analysis
  • Retiming for clock frequency increase
  • Partial reconfiguration for protocol switching
  • DMA for zero-copy to host

References

Hardware Documentation

  • Digilent Arty A7 Reference Manual
  • Xilinx Artix-7 FPGA Datasheet (DS181)
  • TI DP83848J PHY Datasheet

Protocol Specifications

  • IEEE 802.3-2018: Ethernet (MII interface, MAC frame format)
  • RFC 791: Internet Protocol (IPv4 header, checksum algorithm)
  • RFC 768: User Datagram Protocol
  • IEEE 802.3.22.2.4: MDIO Interface Specification

Clock Domain Crossing

  • Xilinx WP272: "Get Smart About Reset: Think Local, Not Global"
  • Xilinx UG912: "Vivado Design Suite Properties Reference Guide"
  • Cliff Cummings: "Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog"
  • Xilinx UG949: "UltraFast Design Methodology Guide" (CDC Chapter)

Design Tools

  • Xilinx Vivado Design Suite 2025.1
  • Xilinx ILA (Integrated Logic Analyzer)
  • Wireshark Network Protocol Analyzer
  • Scapy Python Packet Manipulation Library
  • Phase 1A: Hardware Ethernet Frame Receiver using MII Interface
  • Phase 1B: MDIO Controller Implementation for PHY Diagnostics
  • Phase 1C: Integrated MDIO + Ethernet Pipeline with Debug Interface
  • Phase 1D: IP Parser Development and Integration
  • Phase 1E: UDP Parser Standalone (v3b - deprecated)
  • Phase 1F: UDP Parser with CDC Fixes (v5 - current)

File Structure

06-udp-parser-mii-v5/
├── README.md                          (this file - comprehensive v5 update)
│
├── src/
│   ├── mii_eth_top.vhd               (Top-level integration + CDC)
│   ├── mii_rx.vhd                    (MII physical receiver)
│   ├── mac_parser.vhd                (MAC frame parser)
│   ├── ip_parser.vhd                 (IP header parser + debug)
│   ├── udp_parser.vhd                (UDP parser - v5 REWRITE)
│   ├── stats_counter.vhd             (Statistics and display)
│   ├── uart_formatter.vhd            (Debug message formatter + debug outputs)
│   ├── uart_tx.vhd                   (UART transmitter)
│   ├── mdio_controller.vhd           (MDIO protocol engine)
│   ├── mdio_phy_monitor.vhd          (PHY register sequencer)
│   ├── button_debouncer.vhd          (Input conditioning)
│   └── edge_detector.vhd             (Edge detection utility)
│
├── constraints/
│   └── arty_a7_100t_mii.xdc          (Pin assignments + CDC TIMING CONSTRAINTS)
│
├── test/
    ├── test_udp_packets.py           (UDP test packet generator)
    ├── test_ip_checksums.py          (Checksum validation tests)



Acknowledgments

Development benefited from:

  • Xilinx UG953 (7 Series Libraries Guide) for primitive usage
  • Xilinx UG912 (Properties Reference) for CDC constraint syntax
  • Xilinx WP272 (Reset strategies) for reset synchronization
  • Digilent Arty A7 reference schematics for pin mapping
  • IEEE 802.3 standard for MII interface specification
  • TI DP83848J datasheet for PHY timing requirements
  • RFC 791/768 for IP/UDP protocol details
  • Cliff Cummings' CDC papers for synchronization patterns

Special thanks to persistent debugging and hardware testing for revealing Bug #13!


License

This project is part of a personal FPGA learning portfolio for career transition into high-frequency trading. All code is original implementation for educational purposes.

Copyright © 2025 - All Rights Reserved


Contact

For questions about implementation details or trading-specific optimizations, see repository issues.

Repository: https://github.com/[username]/fpga-learning


Project Status: Phase 1F v5 Complete - Production-Ready UDP Parser with Verified CDC

Hardware Status: Synthesized, Programmed, and Stress-Tested on Arty A7-100T

Quality Metrics: 13 Bugs Fixed (including Critical CDC Bug #13), Clean Synthesis, 100% Test Pass Rate

CDC Verification: 0 Violations, 1000+ Packet Stress Test Passed

Ready For: Phase 2 - Project 7 - ITCH 5.0 Protocol Parser

Last Updated: November 7, 2025 (v5 - Bug #13 Resolution Complete)

Languages

VHDL81.8%Python12.7%Tcl5.5%

Contributors

Apache License 2.0
Created January 7, 2026
Updated January 24, 2026
adilsondias-engineer/06-fpga-udp-parser-mii-v5 | GitHunt