RV32I CPU Datapath — RTL to GDSII (FreePDK 45 nm)

DRC/LVS clean, 500 MHz timing closure

Full RTL → GDSII flow on FreePDK 45 nm — DRC/LVS clean, 500 MHz timing closure.

Role: Course project, UIUC VLSI
Dates: Feb 2025 – Apr 2025
  • RTL
  • Physical Design
  • STA
  • Virtuoso
  • RISC-V

UIUC course project (02–04 / 2025). The flagship project for RTL design, physical design, and full-stack roles.

One-Sentence Summary

Implemented an RV32I 5-stage pipelined CPU through a complete RTL-to-GDSII flow: a full-custom 32×32 Register File (TG-based latch) in Cadence Virtuoso; synthesis, P&R, and STA with DC + Innovus + PrimeTime; 500MHz timing closure (WNS +5ps) with a core area of ~8500 μm².

Résumé Description (Original Text)

RV32I CPU Datapath Design (RTL to GDSII, FreePDK 45nm) | 02/2025 - 04/2025

  • Designed full-custom 32 × 32 Register File in Cadence Virtuoso using TG-based latch storage cells; hand-crafted layout passing DRC/LVS clean with optimized M1–M3 routing.
  • Developed RV32I 5-stage pipelined datapath in SystemVerilog; resolved X-propagation via VCS -xprop and SVA assertions, and replaced tri-state buses with one-hot MUXes.
  • Executed complete RTL-to-GDSII flow with DC Compiler, Innovus, and PrimeTime; authored parameterized Tcl scripts for SDC constraints and multi-corner STA, achieving 500MHz timing closure (WNS: +5ps) with ~8,500 μm² core area.

Key Highlights

  • Complete RTL-to-GDSII flow (many people stop at RTL)
  • Full-custom layout (RegFile hand-drawn in Cadence Virtuoso)
  • DRC/LVS clean + optimized M1–M3 routing
  • X-propagation handling (VCS -xprop + SVA, an industry-grade practice)
  • One-hot MUX replacing tri-state (classic RTL design experience)
  • 500MHz timing closure + multi-corner STA
  • Tcl scripting (SDC + STA automation)

Interview Priority

⭐⭐⭐⭐⭐: Lead project for RTL Designer / Physical Design / Synopsys-Cadence (EDA) roles. It demonstrates command of the full flow and sets it apart from RTL-only projects.

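Progress

  • RegFile full-custom details (TG latch, floorplan, M1-M3 routing)
  • RV32I 5-stage datapath details
  • End-to-end X-propagation story (discovery → -xprop → SVA → fix)
  • One-hot MUX replacement trade-offs
  • DC / Innovus / PrimeTime flow details
  • SDC / STA Tcl script structure
  • 30 deep-dive Q&As
  • Three-tier English STAR versions
  • Layout screenshots collected (after sanitizing)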

Entry Points

  • 01_Technical Details
  • 02_Deep-Dive Q&A
  • 03_Key Decisions
  • 04_Metrics
  • 05_English Walkthrough

Key Decisions

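Decision 1: TG-based Latch for the RegFile

Chose the TG latch

  • vs 6T SRAM: 6T needs dedicated sense amps and a more complex layout; a TG latch is a better fit for a small array with multiple access ports
  • vs Flip-Flop: an FF takes more area; a latch is smaller and makes clock gating easier
  • Cost: hold timing needs care, since data must not race through while the latch is transparent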
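
The hold hazard in the last bullet can be illustrated with a small behavioral model. This is a minimal sketch in Python, not a circuit netlist; the `Latch` class and `shift_two_stages` helper are hypothetical names introduced only for illustration. It shows that when two cascaded level-sensitive latches are transparent at the same time, new data races through both, while opposite-phase enables advance it by one stage only.

```python
class Latch:
    """Level-sensitive latch: transparent while enable is high, holds otherwise."""

    def __init__(self, q=0):
        self.q = q

    def step(self, d, enable):
        # While transparent, the output follows the input immediately.
        if enable:
            self.q = d
        return self.q


def shift_two_stages(first, second, d, en_first, en_second):
    """Evaluate two cascaded latches, in order, for one clock phase."""
    mid = first.step(d, en_first)
    return second.step(mid, en_second)


# Both latches transparent at once: the new value races through both stages.
a, b = Latch(0), Latch(0)
raced = shift_two_stages(a, b, d=1, en_first=True, en_second=True)

# Opposite phases (master-slave style): the value advances one stage only.
c, e = Latch(0), Latch(0)
held = shift_two_stages(c, e, d=1, en_first=True, en_second=False)
```

Here `raced` ends up as the new value while `held` keeps the old one, which is the behavior the write path of a latch-based register file has to guarantee.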

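Decision 2: M1-M3 Routing Layer Assignment

Chose a three-layer plan

  • M1: intra-cell routing + power rails
  • M2: horizontal routing
  • M3: vertical routing + global power
  • Rationale: matches the standard cell library convention, so it is tool-flow friendly
  • Cost: more layers mean more vias, and routing density gets tight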

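Decision 3: Tri-state → One-hot MUX

Chose the One-hot MUX

  • Rationale: controllable synthesis, no bus contention, clean STA
  • Cost: a wide MUX adds latency and needs a balanced tree
  • Industry practice: ASIC designs generally avoid tri-state buses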
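
As a rough illustration of what replaces the tri-state bus, the sketch below models a one-hot MUX as an AND-OR structure in Python. The function names and the 32-bit default width are assumptions made for this example; the project's actual RTL is SystemVerilog and is not reproduced here.

```python
def is_one_hot(sel_bits):
    """True when exactly one select line is asserted."""
    return sum(1 for s in sel_bits if s) == 1


def one_hot_mux(sel_bits, inputs, width=32):
    """AND-OR mux: mask each input with its select line, then OR the results."""
    if len(sel_bits) != len(inputs):
        raise ValueError("one select line per input is required")
    if not is_one_hot(sel_bits):
        raise ValueError("select must be one-hot")
    mask = (1 << width) - 1
    result = 0
    for sel, value in zip(sel_bits, inputs):
        # Replicating the select bit across the word models the AND plane.
        result |= (mask if sel else 0) & value
    return result & mask


selected = one_hot_mux([0, 1, 0, 0], [0x11, 0x22, 0x33, 0x44])
```

Because every source is always driven and merely masked, there is no contention case to analyze. In hardware the OR reduction would be arranged as a balanced tree to limit the latency cost noted above; this model only captures the logic function.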

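Decision 4: X-Propagation Strategy

Chose -xprop + SVA assertions

  • VCS -xprop: makes RTL simulation behave closer to gate-level
  • SVA assertions: continuously check that critical signals are never X
  • Cost: slower simulation (every X is propagated)
  • Alternatives:
    • Naive RTL sim: misses X-related bugs
    • Fully X-pessimistic: may raise false alarms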
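
The difference between the three options can be sketched with a toy three-valued MUX. This is an illustrative Python model under simplifying assumptions, not the simulator's actual algorithm; all function names are hypothetical, and `assert_known` only stands in for the idea behind an SVA "never X" check.

```python
X = "x"  # unknown logic value


def mux_optimistic(sel, a, b):
    """Plain RTL if/else: an unknown select silently falls into the else branch."""
    return a if sel == 1 else b


def mux_merge(sel, a, b):
    """X-aware merge: an unknown select gives X unless both branches agree."""
    if sel == X:
        return a if a == b else X
    return a if sel == 1 else b


def mux_pessimistic(sel, a, b):
    """Fully pessimistic: any unknown select yields X."""
    if sel == X:
        return X
    return a if sel == 1 else b


def assert_known(name, value):
    """Stand-in for an assertion that a critical signal is never X."""
    if value == X:
        raise AssertionError(f"{name} is X")
    return value


hidden = mux_optimistic(X, 1, 0)   # looks like a clean 0: the bug is masked
exposed = mux_merge(X, 1, 0)       # X reaches the output and can be caught
agree = mux_merge(X, 1, 1)         # both branches agree, so the result is known
over = mux_pessimistic(X, 1, 1)    # X even though the output cannot differ
```

The `hidden` case corresponds to the missed bug in naive RTL simulation, and `over` to the false alarm of a fully pessimistic model.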

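Decision 5: 5-stage In-order Pipeline, No OoO

Chose 5-stage in-order

  • Rationale: keeps the focus on the RTL2GDSII flow rather than on microarchitectural complexity
  • Cost: lower IPC than OoO
  • Value: easier timing closure and a more compact layout
  • Complement: the OoO design is showcased in project 02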

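Decision 6: 500MHz Target Frequency

Chose 500MHz

  • Rationale: for 45nm and a simple 5-stage pipeline, 500MHz is a reasonable target
  • Alternatives:
    • Higher (800MHz+): needs a deeper pipeline and an architecture change
    • Lower (250MHz): wastes the available performance and poses no challenge
  • Result: WNS +5ps, about 0.25% of the 2ns period; timing closes with nothing wasted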
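
The numbers behind this decision are simple to check. The snippet below is a small Python sketch of the arithmetic; the helper names are made up for this example.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_fraction(slack_ps, period_ns):
    """Slack expressed as a fraction of the clock period."""
    return (slack_ps / 1000.0) / period_ns


period = clock_period_ns(500)        # 2.0 ns at 500 MHz
margin = slack_fraction(5, period)   # 0.0025, i.e. 0.25% of the period
period_800 = clock_period_ns(800)    # 1.25 ns for the 800 MHz alternative
```

At 800MHz the period shrinks to 1.25 ns, which is why that alternative would call for a deeper pipeline.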

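Decision 7: Which Corners for Multi-corner STA

Chose SS / FF / TT

  • Rationale: three corners is the usual minimum set in industry practice
  • Limitation: real sign-off needs more (SS hot/cold, FF hot/cold, multiple voltages)
  • Next time: add OCV (On-Chip Variation) or AOCV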
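
To make the multi-corner bookkeeping concrete, here is a hedged Python sketch of how WNS and TNS are derived from endpoint slacks and how the limiting corner is picked. The slack values are hypothetical numbers for illustration only, not data from this project, and the function names are assumptions.

```python
def wns(slacks_ns):
    """Worst slack: the minimum endpoint slack (positive means timing is met)."""
    return min(slacks_ns)


def tns(slacks_ns):
    """Total negative slack: sum of the violating endpoint slacks only."""
    return sum(s for s in slacks_ns if s < 0)


def summarize(corner_slacks):
    """Per-corner (WNS, TNS) plus the corner that limits closure."""
    report = {name: (wns(s), tns(s)) for name, s in corner_slacks.items()}
    limiting = min(report, key=lambda name: report[name][0])
    return report, limiting


# Hypothetical setup slacks in ns, for illustration only (not measured data).
example = {
    "SS": [0.005, 0.040, 0.120],
    "TT": [0.310, 0.350, 0.420],
    "FF": [0.520, 0.560, 0.610],
}
report, limiting = summarize(example)
```

Setup checks are usually limited by the slow corner, as in this made-up data, while hold checks are usually examined at the fast corner, which is one reason a single corner is not enough.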

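Decision 8: Parameterized Tcl vs Fixed Values

Chose parameterization (set CLK_PERIOD 2.0)

  • Rationale: changing the frequency means editing only the parameter at the top, not the whole script
  • Cost: slightly lower readability
  • Value: high reuse and fast corner sweeps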

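Decision 9: How Strict to Make the SDC Constraints

Chose "strict but not rigid"

  • clock_uncertainty 0.1: leaves margin for jitter / skew
  • input/output delay at 30% of the period: a typical industry value
  • Too loose: silicon may fail
  • Too tight: timing cannot close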
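
The parameterization and the constraint values above can be tied together in a short sketch. The project scripts themselves are Tcl and are not shown in these notes, so the Python generator below is only an illustration of the idea: one clock-period parameter drives every derived constraint. The clock name `core_clk`, the port name `clk`, and the assumption that all values are in nanoseconds are hypothetical; the emitted commands (`create_clock`, `set_clock_uncertainty`, `set_input_delay`, `set_output_delay`) are standard SDC.

```python
def build_sdc(clk_period_ns=2.0, uncertainty_ns=0.1, io_fraction=0.3,
              clk_port="clk"):
    """Return SDC text derived from a single clock-period parameter."""
    io_delay = round(clk_period_ns * io_fraction, 3)
    lines = [
        f"create_clock -name core_clk -period {clk_period_ns} [get_ports {clk_port}]",
        f"set_clock_uncertainty {uncertainty_ns} [get_clocks core_clk]",
        # Simplification: all_inputs also includes the clock port itself.
        f"set_input_delay {io_delay} -clock core_clk [all_inputs]",
        f"set_output_delay {io_delay} -clock core_clk [all_outputs]",
    ]
    return "\n".join(lines)


sdc_500mhz = build_sdc()                    # 2.0 ns period, 0.6 ns I/O delay
sdc_400mhz = build_sdc(clk_period_ns=2.5)   # retarget by changing one value
```

Retargeting the frequency changes one argument, and the I/O delays follow automatically, which is the reuse benefit described in Decision 8.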

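Decision 10: Sign-off Tool Choice

Chose PrimeTime

  • vs DC's internal STA: PT is the sign-off tool, with higher accuracy
  • vs Tempus (Cadence): the school's tool stack uses Synopsys

TODO

  • Mark for each decision whether it has already been discussed in an interview
  • Prepare follow-ups: "What would have happened if you had not chosen this?"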

Metrics

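Numbers Claimed on the Résumé

Metric       | Value        | Notes
Process      | FreePDK 45nm | Open-source PDK
Frequency    | 500MHz       | Timing closure achieved
WNS          | +5ps         | Positive = setup met
Core area    | ~8500 μm²    | Total area
RegFile size | 32 × 32      | Full-custom layout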

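Follow-ups That Must Be Answerable

500MHz frequency

  • At which corner was it achieved? (SS / TT / FF)
  • How much headroom is there (for example, with a deeper pipeline)?
  • How does it compare with an equivalent SiFive core?

WNS +5ps

  • Which path is the critical path?
  • What type of path is it (reg2reg / in2reg / reg2out)?
  • Why is the margin 5ps and not more?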
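
When answering these questions it helps to have the setup-slack relation at hand. The sketch below is a simplified Python version of the standard reg2reg setup check; the path delay and setup time are hypothetical values chosen only so that the result lands on +5ps, and the function names are assumptions.

```python
def setup_slack_ns(period_ns, data_arrival_ns, setup_ns,
                   uncertainty_ns=0.0, skew_ns=0.0):
    """Setup slack = required time - arrival time for a reg2reg path.

    skew_ns is capture-clock latency minus launch-clock latency.
    """
    required = period_ns + skew_ns - uncertainty_ns - setup_ns
    return required - data_arrival_ns


def budget_used_ns(period_ns, slack_ns):
    """Portion of the period consumed by everything except the slack."""
    return period_ns - slack_ns


# Hypothetical path numbers (ns), chosen only so the result matches +5 ps.
slack = setup_slack_ns(period_ns=2.0, data_arrival_ns=1.845,
                       setup_ns=0.05, uncertainty_ns=0.1)
used = budget_used_ns(2.0, 0.005)   # 1.995 ns of the 2 ns period
```

With a 2 ns period and +5ps of slack, path delay plus setup time plus clock uncertainty add up to 1.995 ns, so the split among those three terms is what the real STA report has to supply.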

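8500 μm² core area

  • How much does the RegFile take?
  • How much does the ALU take?
  • How much does the control logic take?
  • Comparison with a commercial core (SiFive E20, estimated ~5000 μm² @ 45nm)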

Detailed Data Still to Be Filled In

Per-stage timing

  • IF stage critical path: ?ps
  • ID stage: ?ps
  • EX stage (ALU) critical path: ?ps
  • MEM stage: ?ps
  • WB stage: ?ps

Synthesis data (DC)

  • Total cells: ?
  • Sequential cells: ?
  • Combinational cells: ?
  • Buf cells: ?

P&R data (Innovus)

  • Utilization: ? %
  • Total wire length: ?μm
  • Routing layers used: ?

Power (optional, if PrimeTime PX was run)

  • Dynamic power: ?mW
  • Leakage: ?μW
  • Total: ?

Design size

  • Lines of SystemVerilog RTL: ?
  • Number of modules: ?
  • Lines of Tcl script: ?

Templates for Quantified Answers

✅ “On the FreePDK 45nm SS corner at 25°C and 0.9V, the design closed at 500MHz with WNS +5ps. The critical path was RegFile read → ALU → forward MUX → ALU input, which used about 1.995ns of the 2ns budget once setup time and clock uncertainty are counted.”

✅ “Core area is approximately 8500 μm², with the 32×32 full-custom RegFile contributing about X%, ALU about Y%, and the rest in pipeline registers and control logic.”

❌ “Frequency is 500MHz.” (Does not say which corner or what voltage.)

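Comparison with Reference Cores (Optional)

Core                  | Process | Frequency | Area
This design           | 45nm    | 500MHz    | 8500 μm²
SiFive E20 (estimate) | 45nm    | ~400MHz   | ~5000 μm²
Rocket (default)      | 45nm    | ~600MHz   | ~15000 μm²

⚠️ The reference figures must be verified independently; data for commercial cores usually requires a license.

TODO

  • Run STA once to get the raw reports
  • Per-module area breakdown
  • Per-stage timing breakdown
  • Power analysis (if it was done)
  • Fill the reference-core comparison table with real numbers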

STAR Narratives (English)

Key Terminology (Chinese to English)

Chinese | English
全定制 | Full-custom
标准单元 | Standard cell
传输门 | Transmission Gate (TG)
锁存器 | Latch
触发器 | Flip-Flop
综合 | Synthesis
布局布线 | Place and Route (P&R)
静态时序分析 | Static Timing Analysis (STA)
时钟树综合 | Clock Tree Synthesis (CTS)
时序收敛 | Timing Closure
工艺角 | Process Corner
时钟偏斜 / 抖动 | Skew / Jitter
关键路径 | Critical Path
时序余量 | Slack (WNS / TNS)
设计规则检查 | Design Rule Check (DRC)
版图与原理图比对 | Layout vs Schematic (LVS)
X-传播 | X-propagation
三态总线 | Tri-state Bus
一热多路选择器 | One-hot MUX
数据通路 | Datapath
流水线 | Pipeline
旁路 | Forwarding / Bypass
多角时序分析 | Multi-corner STA
约束文件 | SDC (Synopsys Design Constraints)

30-Second Elevator Pitch

For my VLSI Design course at UIUC, I executed a complete RTL-to-GDSII flow on an
RV32I 5-stage pipelined CPU using FreePDK 45nm. I hand-crafted a full-custom 32-by-32
register file in Cadence Virtuoso using transmission-gate latches, with M1 to M3
routing, and it passed DRC and LVS clean. The RTL was written in SystemVerilog, with
X-propagation checked through VCS xprop and SVA assertions. I closed timing at 500MHz
with a worst slack of positive 5 picoseconds, and the core fit in about 8500 square microns.

2-Minute Standard Version (STAR)

Situation

This was the VLSI Design course capstone at UIUC, where the goal was to take a CPU
from SystemVerilog RTL all the way to GDSII layout using industry-standard tools:
Design Compiler for synthesis, Innovus for place and route, PrimeTime for sign-off STA,
and Calibre for DRC and LVS. We targeted FreePDK 45nm, which is an open-source PDK.

Task

The most challenging part for me was hand-crafting the 32-by-32 register file in
Cadence Virtuoso. I also wrote the SystemVerilog RTL for the 5-stage pipelined
datapath, dealt with X-propagation issues, and authored Tcl scripts for the SDC
constraints and multi-corner STA flow.

Action

For the register file, I used transmission-gate-based latch storage cells and laid
out the cells by hand, with M1 for cell-internal and power, M2 horizontal, and M3
vertical. The full layout passed DRC and LVS clean. For the RTL, I made two key
quality improvements: I caught X-propagation issues using VCS xprop and added SVA
assertions to monitor that critical signals never become unknown. I also replaced
all tri-state buses with one-hot MUXes for synthesis predictability and to avoid
contention. For the flow, I wrote parameterized Tcl scripts so that retargeting to a
different frequency or corner only required changing the top-level parameter.

Result

We closed timing at 500MHz on the slow corner with a worst slack (WNS) of positive
5 picoseconds, meaning we met all setup constraints with a small safety margin.
The final core area was about 8500 square microns. The whole flow taught me how
each step in ASIC design influences the next, and gave me hands-on experience with
the full Synopsys and Cadence tool chain.

15-30 Minute Deep-Dive Version

Outline

  1. Project Scope (2 min): full RTL2GDSII flow, tool chain
  2. RV32I 5-Stage Datapath (5 min):
    • Draw the datapath on the whiteboard
    • Forwarding / hazards
  3. Register File Layout (8 min):
    • TG latch cell schematic + layout
    • M1-M3 routing strategy
    • DRC / LVS process
  4. RTL Quality (5 min):
    • X-propagation: the discovery → -xprop → SVA → fix story
    • Tri-state → One-hot conversion
  5. Synthesis & P&R Flow (5 min):
    • DC compile_ultra
    • Innovus floorplan / placement / CTS / routing
  6. STA Closure Story (5 min):
    • Which path was the critical path
    • How it was closed (buffering / cell sizing / RTL changes)
    • What WNS +5ps means
  7. Tools & Lessons (2 min):
    • What was learned
    • What would change in a redo

Frequently Asked Q&A (English)

Q: Walk me through how you went from SystemVerilog to GDSII.

A: (To be filled in; walk through the complete flow.)

Q: Why TG-based latch instead of standard 6T SRAM?

A: (To be filled in)

Q: What was your critical path and how did you close timing?

A: (To be filled in)

Q: How did you handle X-propagation?

A: (To be filled in)

Q: What’s the difference between setup and hold violations?

A: (To be filled in)

Q: Multi-corner STA: which corners did you run and why?

A: (To be filled in)

Q: How did you parameterize your Tcl SDC?

A: (To be filled in)

Recording / Mock Practice Checklist

  • Record the 30-second pitch ≥ 5 times
  • Record the 2-minute STAR ≥ 5 times
  • Draw the datapath + RegFile layout from memory while explaining in English
  • 5-minute English version of the X-prop story
  • English version of the critical path closure story