Single-Issue RISC-V Out-of-Order Pipeline Processor

Overview

A single-issue out-of-order RV32IM core built from scratch in SystemVerilog. Implements a Tomasulo-style five-stage pipeline with full register renaming, a split L1 cache hierarchy, and a BTB + RAT branch predictor. Designed, implemented, and verified end-to-end as part of UIUC’s advanced computer architecture course.

Tech Stack

SystemVerilog · RV32IM · Tomasulo · Harvard cache · BTB / RAT · VCS / Verdi · RVFI · Top-Down analysis

Microarchitecture

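Pipeline: 5-stage Tomasulo: Fetch → Decode/Rename → Issue → Execute → Commit. A reservation station (RVS) holds issued instructions until their operands are ready and drives dynamic scheduling; the common data bus (CDB) broadcasts results to dependent instructions.
Register renaming: the register alias table (RAT) maps architectural to physical registers, which eliminates WAW / WAR hazards without stalls. Only true RAW dependencies remain.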
Cache hierarchy: Harvard L1 with separate I- and D-Cache, both 4-way set-associative, write-back + write-allocate, PLRU replacement.
Branch prediction: BTB + RAT-style predictor. Recovers speculative state on misprediction via the rename map.
Performance counters: 12 counters wired into commit + load/branch paths; used for Top-Down analysis to localize stalls.
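
To make the renaming and recovery behavior described above concrete, here is a minimal behavioral sketch in Python rather than the project's SystemVerilog. The class name RenameTable, the 64-entry physical register file, the free list, and the checkpoint/restore methods are all assumptions made for illustration; none of them are taken from the actual RTL.

```python
from dataclasses import dataclass, field

NUM_ARCH_REGS = 32   # RV32 integer registers x0..x31
NUM_PHYS_REGS = 64   # assumed size, not taken from the design


@dataclass
class RenameTable:
    """Toy RAT: maps each architectural register to a physical register."""
    rat: list = field(default_factory=lambda: list(range(NUM_ARCH_REGS)))
    free_list: list = field(
        default_factory=lambda: list(range(NUM_ARCH_REGS, NUM_PHYS_REGS)))

    def rename(self, rd, rs1, rs2):
        """Rename one instruction and return (pd, ps1, ps2)."""
        # Look up sources before remapping the destination, so that
        # `add x1, x1, x2` still reads the previous mapping of x1.
        ps1, ps2 = self.rat[rs1], self.rat[rs2]
        if rd == 0:
            # x0 is hard-wired to zero and never receives a new mapping.
            return 0, ps1, ps2
        # A real core would stall when the free list is empty and would
        # return registers to it at commit; this sketch does neither.
        pd = self.free_list.pop(0)
        self.rat[rd] = pd
        return pd, ps1, ps2

    def checkpoint(self):
        """Copy the rename state, e.g. when a branch is renamed."""
        return list(self.rat), list(self.free_list)

    def restore(self, snap):
        """Roll back to a checkpoint after a misprediction."""
        self.rat, self.free_list = list(snap[0]), list(snap[1])


rt = RenameTable()
i0 = rt.rename(1, 2, 3)   # add x1, x2, x3
i1 = rt.rename(4, 1, 5)   # sub x4, x1, x5  (RAW on x1: must read i0's result)
i2 = rt.rename(1, 6, 7)   # add x1, x6, x7  (WAW with i0, WAR with i1)
print(i0, i1, i2)
```

In this sketch the two writes to x1 land in different physical registers, and the sub still reads the physical register written by the first add, so the second add can complete in any order relative to the other two. Only the true RAW dependency between the first add and the sub survives.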

Results

88% branch prediction accuracy on the course benchmark suite.
~18% IPC improvement vs. the in-order baseline after Top-Down-guided load/branch latency tuning.
Passes the full ECE 411 functional regression including a RISC-V Formal Interface (RVFI) trace check.

Key Decisions

Tomasulo over scoreboarding. Tomasulo’s CDB broadcast plus register renaming removes WAR/WAW hazards entirely, which matters more than the extra structural cost on a single-issue core. Scoreboarding would have stalled on those false dependencies.
PLRU instead of true-LRU. 4-way true-LRU has to distinguish 4! = 24 possible orderings per set, which takes at least 5 bits/set (8 bits/set as a plain 2-bit-per-way order vector) plus wider update logic; tree PLRU costs 3 bits/set with under 1 pp hit-rate loss on the benchmark suite. The right size/performance tradeoff for a single-issue core.
Performance counters wired in from day one. Adding counters late forces a pipeline re-verification pass; building them into the initial RTL let the Top-Down tuning loop run cleanly across the last two weeks of the project.
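
The 3 bits/set figure for PLRU follows from the usual binary-tree formulation, sketched below in Python as a behavioral model. The class name TreePLRU4 and the bit encoding are assumptions for illustration; the project's RTL may arrange its bits differently.

```python
class TreePLRU4:
    """Tree pseudo-LRU state for one 4-way set: 3 bits total."""

    def __init__(self):
        # bits[0]: root. 0 -> next victim is in ways {0, 1}, 1 -> in ways {2, 3}
        # bits[1]: chooses between way 0 (0) and way 1 (1)
        # bits[2]: chooses between way 2 (0) and way 3 (1)
        self.bits = [0, 0, 0]

    def touch(self, way):
        """On an access, point every node on the path away from `way`."""
        if way < 2:
            self.bits[0] = 1          # next victim comes from the other pair
            self.bits[1] = 1 - way    # point at the sibling within the pair
        else:
            self.bits[0] = 0
            self.bits[2] = 3 - way    # way 2 -> point at 3, way 3 -> point at 2

    def victim(self):
        """Follow the tree bits to the way that should be evicted next."""
        if self.bits[0] == 0:
            return self.bits[1]
        return 2 + self.bits[2]


plru = TreePLRU4()
for way in (0, 1, 2, 3):
    plru.touch(way)
first_victim = plru.victim()    # way 0, same choice as true LRU

plru.touch(0)
second_victim = plru.victim()   # way 2, whereas true LRU would pick way 1
print(first_victim, second_victim)
```

The second case shows where the approximation comes from: after the access order 0, 1, 2, 3, 0 the least recently used way is 1, but the tree only remembers which half and which sibling were touched last, so it evicts way 2 instead.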
