RV32I CPU Datapath — RTL to GDSII (FreePDK 45 nm)
Full RTL → GDSII flow on FreePDK 45 nm — DRC/LVS clean, 500 MHz timing closure.
- Role: Course project, UIUC VLSI
- Dates: Feb 2025 – Apr 2025
- Tags: RTL, Physical Design, STA, Virtuoso, RISC-V
UIUC course project (02–04 / 2025). The flagship project for RTL design, physical design, and full-stack roles.
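One-Sentence Summary
A complete RTL-to-GDSII flow implementing an RV32I 5-stage pipelined CPU: a full-custom 32×32 register file in Cadence Virtuoso (TG-based latch), synthesis, P&R, and STA with Design Compiler, Innovus, and PrimeTime, reaching 500 MHz timing closure (WNS +5 ps) with a core area of ~8500 μm².
Resume Description (Original)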
RV32I CPU Datapath Design (RTL to GDSII, FreePDK 45nm) | 02/2025 - 04/2025
- Designed full-custom 32 × 32 Register File in Cadence Virtuoso using TG-based latch storage cells; hand-crafted layout passing DRC/LVS clean with optimized M1–M3 routing.
- Developed RV32I 5-stage pipelined datapath in SystemVerilog; resolved X-propagation via VCS -xprop and SVA assertions, and replaced tri-state buses with one-hot MUXes.
- Executed complete RTL-to-GDSII flow with Design Compiler, Innovus, and PrimeTime; authored parameterized Tcl scripts for SDC constraints and multi-corner STA, achieving 500MHz timing closure (WNS: +5ps) with ~8,500 μm² core area.
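Key Highlights
- ✅ Complete RTL-to-GDSII flow (many candidates stop at RTL)
- ✅ Full-custom layout (RegFile hand-drawn in Cadence Virtuoso)
- ✅ DRC/LVS clean, with optimized M1–M3 routing
- ✅ X-propagation handling (VCS -xprop + SVA, an industry-grade practice)
- ✅ One-hot MUXes in place of tri-state buses (a classic RTL design lesson)
- ✅ 500 MHz timing closure with multi-corner STA
- ✅ Tcl scripting (SDC and STA automation)
Interview Priority
⭐⭐⭐⭐⭐: lead with this project for RTL Designer, Physical Design, and Synopsys/Cadence (EDA) roles. It demonstrates command of the whole flow and sets the candidate apart from RTL-only projects.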
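Progress
- RegFile full-custom details (TG latch, layout, M1–M3 routing)
- RV32I 5-stage datapath details
- X-propagation end to end (discovery → -xprop → SVA → fix)
- One-hot MUX replacement trade-offs
- DC / Innovus / PrimeTime flow details
- SDC / STA Tcl script structure
- 30 deep-dive Q&As
- English STAR narratives at three lengths
- Layout screenshots, organized (after sanitizing)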
Entry Points
- 01_Technical Details
- 02_Deep-Dive Q&A
- 03_Key Decisions
- 04_Metrics
- 05_English Walkthrough
Key Decisions
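Decision 1: TG-based Latch for the RegFile
Chose the TG latch
- vs 6T SRAM: 6T needs dedicated sense amps and a more complex layout; a TG latch is a better fit for a small, frequently accessed multi-port array
- vs Flip-Flop: an FF costs more area; a latch is smaller and easier to clock-gate
- Cost: hold timing needs care, because data must not race through while the latch is transparent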
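The hold-time caveat above follows from the latch being level-sensitive. A minimal behavioural sketch in Python (not the Virtuoso cell itself; the class name `TransparentLatch` and its `step` method are hypothetical) shows how any change on the data input passes straight through while the enable is high:

```python
class TransparentLatch:
    """Level-sensitive latch: transparent while enable is high, holds otherwise."""

    def __init__(self, init=0):
        self.q = init

    def step(self, enable, d):
        # While enable is high the output follows d, so a late change on d
        # still reaches q. This is the "transparent phase" hazard.
        if enable:
            self.q = d
        return self.q


latch = TransparentLatch()
latch.step(enable=1, d=1)  # transparent: q follows d
latch.step(enable=1, d=0)  # still transparent: the new value leaks through
latch.step(enable=0, d=1)  # opaque: q holds the last captured value
```

An edge-triggered flip-flop would ignore the second change; the latch does not, which is why the write-enable window has to close before the data input is allowed to move.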
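Decision 2: M1–M3 Routing Layer Assignment
Chose a three-layer plan
- M1: cell-internal routing + power rails
- M2: horizontal routing
- M3: vertical routing + global power
- Rationale: matches standard cell library conventions, so it is tool-flow friendly
- Cost: more layers mean more vias and tighter routing density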
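Decision 3: Tri-state → One-hot MUX
Chose one-hot MUXes
- Rationale: controllable synthesis, no bus contention, unambiguous STA
- Cost: a wide MUX adds latency and needs a balanced tree
- Industry practice: ASIC designs almost never use internal tri-state buses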
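The AND-OR structure behind a one-hot MUX can be sketched in a few lines of Python. This is a behavioural model under assumptions, not the project's SystemVerilog: the function name `one_hot_mux` and the 32-bit default width are chosen for illustration.

```python
def one_hot_mux(select, inputs, width=32):
    """AND-OR one-hot multiplexer: OR together each input masked by its select bit."""
    if len(select) != len(inputs):
        raise ValueError("one select bit per input is required")
    if sum(select) != 1:
        # Plays the role a one-hot assertion on the select bus would play in RTL.
        raise ValueError("select must be one-hot")
    mask = (1 << width) - 1
    out = 0
    for sel, data in zip(select, inputs):
        # sel is 0 or 1; -sel & mask is all ones when selected, zero otherwise.
        out |= data & (-sel & mask)
    return out


result = one_hot_mux([0, 1, 0], [0x11, 0x22, 0x33])  # selects the second input
```

Because exactly one select bit is high, only one AND term is non-zero and the OR cannot see two drivers at once, which is the contention a tri-state bus has to rule out by other means.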
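Decision 4: X-Propagation Strategy
Chose -xprop + SVA assertions
- VCS -xprop: makes RTL simulation behave closer to gate level
- SVA assertions: continuously check that critical signals are never X
- Cost: slower simulation (every X gets propagated)
- Alternatives:
  - Naive RTL sim: misses X-related bugs
  - Fully X-pessimistic: may raise false alarms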
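The gap between naive RTL simulation and a merge-style X treatment can be illustrated with a simplified Python model of an `if`/`else` on an unknown condition. This is a sketch of the idea only: the exact merge rule applied by VCS -xprop depends on how it is configured, and the function names below are hypothetical.

```python
X = "x"  # stand-in for an unknown logic value


def if_else_optimistic(cond, then_val, else_val):
    """Plain Verilog `if` semantics: an X condition is not true, so the else branch runs."""
    return then_val if cond == 1 else else_val


def if_else_merged(cond, then_val, else_val):
    """Merge-style semantics: on an X condition, keep the value only if both branches agree."""
    if cond == X:
        return then_val if then_val == else_val else X
    return then_val if cond == 1 else else_val


hidden = if_else_optimistic(X, 1, 0)  # X silently becomes 0: the bug is masked
exposed = if_else_merged(X, 1, 0)     # X survives and can trip an assertion
benign = if_else_merged(X, 1, 1)      # both branches agree, so no false alarm
```

The middle ground is what makes the approach practical: an unknown select only poisons the result when the two branches would actually differ.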
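Decision 5: 5-stage Pipeline, No OoO
Chose 5-stage in-order
- Rationale: keep the focus on the RTL-to-GDSII flow rather than architectural complexity
- Cost: lower IPC than OoO
- Benefit: easier timing closure and a more compact layout
- Complement: the OoO design is showcased in project 02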
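Decision 6: Target Frequency of 500 MHz
Chose 500 MHz
- Rationale: for 45 nm and a simple 5-stage pipeline, 500 MHz is a reasonable target
- Alternatives:
  - Higher (800 MHz+): needs a deeper pipeline, which means changing the architecture
  - Lower (250 MHz): leaves performance on the table and poses no challenge
- Result: WNS +5 ps (about 0.25% of the 2 ns period), so timing closes without wasted margin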
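The relationship between the 500 MHz target, the 2 ns period, and the +5 ps slack is simple arithmetic. A small Python sketch (helper names are hypothetical, and the zero-slack frequency assumes the critical-path delay does not change with the constraint, which real tools do not guarantee):

```python
def clock_period_ps(freq_mhz):
    """Clock period in picoseconds for a frequency in MHz."""
    return 1e6 / freq_mhz


def slack_fraction(wns_ps, freq_mhz):
    """Worst slack expressed as a fraction of the clock period."""
    return wns_ps / clock_period_ps(freq_mhz)


def zero_slack_freq_mhz(freq_mhz, wns_ps):
    """Frequency at which slack would reach zero, assuming unchanged path delay."""
    return 1e6 / (clock_period_ps(freq_mhz) - wns_ps)


period = clock_period_ps(500)         # 2000 ps
margin = slack_fraction(5, 500)       # 0.0025, i.e. 0.25% of the period
f_max = zero_slack_freq_mhz(500, 5)   # roughly 501 MHz
```

Put this way, +5 ps means the design sits almost exactly on its constraint, which is worth stating explicitly when asked how much headroom is left.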
Decision 7: Which Corners for Multi-corner STA
Chose SS / FF / TT
- Rationale: the minimum three-corner set used in industry
- Limitation: real sign-off needs more (SS hot/cold, FF hot/cold, multiple voltages)
- Next time: add OCV (On-Chip Variation) or AOCV
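Decision 8: Parameterized Tcl vs Fixed Values
Chose parameterization (set CLK_PERIOD 2.0)
- Rationale: changing the frequency means editing one parameter at the top, not the whole script
- Cost: slightly lower readability
- Benefit: highly reusable, and corner sweeps run quickly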
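Decision 9: How Tight to Make the SDC Constraints
Chose "tight but achievable"
- clock_uncertainty 0.1: leaves margin for jitter / skew
- input/output delay at 30% of the period: a typical industry value
- Too loose: silicon may fail
- Too tight: timing cannot close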
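How one top-level period parameter drives every constraint can be sketched as follows. The project's scripts are Tcl; this Python version only generates the equivalent SDC text for illustration. The clock name `core_clk` and port name `clk` are assumptions, and the numeric defaults are the values listed in Decisions 8 and 9.

```python
def make_sdc(clk_period_ns=2.0, uncertainty_ns=0.1, io_fraction=0.30,
             clk_port="clk", clk_name="core_clk"):
    """Return SDC text in which every delay is derived from the clock period."""
    io_delay = clk_period_ns * io_fraction
    # Exclude the clock port itself from the input-delay constraint.
    inputs = f"[remove_from_collection [all_inputs] [get_ports {clk_port}]]"
    return "\n".join([
        f"create_clock -name {clk_name} -period {clk_period_ns:.3f} [get_ports {clk_port}]",
        f"set_clock_uncertainty {uncertainty_ns:.3f} [get_clocks {clk_name}]",
        f"set_input_delay {io_delay:.3f} -clock {clk_name} {inputs}",
        f"set_output_delay {io_delay:.3f} -clock {clk_name} [all_outputs]",
    ])


print(make_sdc())                   # 500 MHz constraints
print(make_sdc(clk_period_ns=4.0))  # retarget to 250 MHz by changing one value
```

Only the period changes between the two calls; the I/O delays scale with it, which is the property that makes frequency and corner sweeps cheap.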
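Decision 10: Sign-off Tool Choice
Chose PrimeTime
- vs DC's built-in STA: PT is the sign-off tool and is more accurate
- vs Tempus (Cadence): the school's tool stack uses Synopsys for STA
TODO
- Mark which decisions have already been discussed in an interview
- Prepare the follow-up: "What would have happened if you had chosen differently?"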
Metrics
Numbers Claimed on the Resume
| Metric | Value | Notes |
|---|---|---|
| Process | FreePDK 45nm | Open-source PDK |
| Frequency | 500MHz | Timing closure achieved |
| WNS | +5ps | Positive = setup met |
| Core area | ~8500 μm² | Total area |
| RegFile size | 32 × 32 | Full-custom layout |
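Follow-ups I Must Be Able to Answer
500 MHz frequency
- At which corner was it achieved? (SS / TT / FF)
- How much headroom is there? (e.g. with a deeper pipeline)
- How does it compare with an equivalent SiFive core?
WNS +5 ps
- Which path is the critical path?
- What kind of path is it (reg2reg / in2reg / reg2out)?
- Why is the margin 5 ps and not more?
8500 μm² core area
- How much is the RegFile?
- How much is the ALU?
- How much is control logic?
- Comparison with a commercial core (SiFive E20, estimated at ~5000 μm² @ 45nm)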
Detailed Data Still to Fill In
Per-stage timing
- IF stage critical path: ? ps
- ID stage: ? ps
- EX stage (ALU) critical path: ? ps
- MEM stage: ? ps
- WB stage: ? ps
Synthesis data (DC)
- Total cells: ?
- Sequential cells: ?
- Combinational cells: ?
- Buf cells: ?
P&R data (Innovus)
- Utilization: ? %
- Total wire length: ? μm
- Routing layers used: ?
Power (optional, if PrimeTime PX was run)
- Dynamic power: ? mW
- Leakage: ? μW
- Total: ?
Design size
- Lines of SystemVerilog RTL: ?
- Number of modules: ?
- Lines of Tcl script: ?
Templates for Quoting Numbers
✅ “On FreePDK 45nm SS corner with 25°C 0.9V, the design closed at 500MHz with WNS +5ps. Critical path was the RegFile read → ALU → forward MUX → ALU input, totaling about 1.95ns of the 2ns budget.”
✅ “Core area is approximately 8500 μm², with the 32×32 full-custom RegFile contributing about X%, ALU about Y%, and the rest in pipeline registers and control logic.”
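❌ "Frequency is 500MHz." (does not say which corner or what voltage)
Comparison with Reference Cores (optional)
| Core | Process | Frequency | Area |
|---|---|---|---|
| Yours | 45nm | 500MHz | 8500 μm² |
| SiFive E20 (estimate) | 45nm | ~400MHz | ~5000 μm² |
| Rocket (default) | 45nm | ~600MHz | ~15000 μm² |
⚠️ The reference figures are unverified and must be checked independently; data for commercial cores usually requires a license.
TODO
- Run STA once more and save the raw reports
- Area breakdown by module
- Timing breakdown by stage
- Power analysis (if it was done)
- Fill the reference-core comparison table with real numbers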
STAR Narratives (English)
Key Terminology (Chinese to English)
| Chinese | English |
|---|---|
| 全定制 | Full-custom |
| 标准单元 | Standard cell |
| 传输门 | Transmission Gate(TG) |
| 锁存器 | Latch |
| 触发器 | Flip-Flop |
| 综合 | Synthesis |
| 布局布线 | Place and Route(P&R) |
| 静态时序分析 | Static Timing Analysis(STA) |
| 时钟树综合 | Clock Tree Synthesis(CTS) |
| 时序收敛 | Timing Closure |
| 工艺角 | Process Corner |
| 时钟偏斜 / 抖动 | Skew / Jitter |
| 关键路径 | Critical Path |
| 时序余量 | Slack(WNS / TNS) |
| 设计规则检查 | Design Rule Check(DRC) |
| 版图与原理图比对 | Layout vs Schematic(LVS) |
| X-传播 | X-propagation |
| 三态总线 | Tri-state Bus |
| 一热多路选择器 | One-hot MUX |
| 数据通路 | Datapath |
| 流水线 | Pipeline |
| 旁路 | Forwarding / Bypass |
| 多角时序分析 | Multi-corner STA |
| 约束文件 | SDC(Synopsys Design Constraints) |
30-Second Elevator Pitch
For my VLSI Design course at UIUC, I executed a complete RTL-to-GDSII flow on an
RV32I 5-stage pipelined CPU using FreePDK 45nm. I hand-crafted a full-custom 32-by-32
register file in Cadence Virtuoso using transmission-gate latches, with M1 to M3
routing and DRC-LVS clean. The RTL was written in SystemVerilog with X-propagation
checked through VCS xprop and SVA assertions. I closed timing at 500MHz with positive
5 picosecond worst negative slack, and the core fit in about 8500 square microns.
2-Minute Standard Version (STAR)
Situation
This was the VLSI Design course capstone at UIUC, where the goal was to take a CPU
from SystemVerilog RTL all the way to GDSII layout, using industry-standard tools —
Design Compiler for synthesis, Innovus for place and route, PrimeTime for sign-off STA,
and Calibre for DRC and LVS. We targeted FreePDK 45nm, which is an open-source PDK.
Task
The most challenging part for me was hand-crafting the 32-by-32 register file in
Cadence Virtuoso. I also wrote the SystemVerilog RTL for the 5-stage pipelined
datapath, dealt with X-propagation issues, and authored Tcl scripts for the SDC
constraints and multi-corner STA flow.
Action
For the register file, I used transmission-gate-based latch storage cells and laid
out the cells by hand, with M1 for cell-internal and power, M2 horizontal, and M3
vertical. The full layout passed DRC and LVS clean. For the RTL, I made two key
quality improvements: I caught X-propagation issues using VCS xprop and added SVA
assertions to monitor that critical signals never become unknown. I also replaced
all tri-state buses with one-hot MUXes for synthesis predictability and to avoid
contention. For the flow, I wrote parameterized Tcl scripts so that retargeting to a
different frequency or corner only required changing the top-level parameter.
Result
We closed timing at 500MHz on the slow corner with positive 5 picosecond worst
negative slack — meaning we met all setup constraints with a small safety margin.
The final core area was about 8500 square microns. The whole flow taught me how
each step in ASIC design influences the next, and gave me hands-on experience with
the full Synopsys and Cadence tool chain.
15-30 Minute Deep-Dive Version
Outline
- Project Scope (2 min): full RTL-to-GDSII flow, tool chain
- RV32I 5-Stage Datapath (5 min):
  - Draw the datapath on the whiteboard
  - Forwarding / hazards
- Register File Layout (8 min):
  - TG latch cell schematic + layout
  - M1–M3 routing strategy
  - DRC / LVS process
- RTL Quality (5 min):
  - X-propagation: the discovery → -xprop → SVA → fix story
  - Tri-state → one-hot conversion
- Synthesis & P&R Flow (5 min):
  - DC compile_ultra
  - Innovus floorplan / placement / CTS / routing
- STA Closure Story (5 min):
  - Which path was critical
  - How it was closed (buffering / cell sizing / RTL changes)
  - What WNS +5 ps means
- Tools & Lessons (2 min):
  - What I learned
  - What I would change if I did it again
Frequently Asked Questions (English)
Q: Walk me through how you went from SystemVerilog to GDSII.
A: (TBD; walk through the complete flow)
Q: Why TG-based latch instead of standard 6T SRAM?
A: (TBD)
Q: What was your critical path and how did you close timing?
A: (TBD)
Q: How did you handle X-propagation?
A: (TBD)
Q: What’s the difference between setup and hold violations?
A: (TBD)
Q: Multi-corner STA: which corners did you run and why?
A: (TBD)
Q: How did you parameterize your Tcl SDC?
A: (TBD)
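Recording / Mock Practice Checklist
- Record the 30-second pitch ≥ 5 times
- Record the 2-minute STAR ≥ 5 times
- Draw the datapath + RegFile layout from memory while explaining in English
- X-prop story, 5-minute English version
- Critical path closure story, English version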