

## Estimating Delays

- Would be nice to have a "back of the envelope" method for sizing gates for speed
- Logical Effort
- Book by Sutherland, Sproull, Harris
- Chapter 1 is on our web page
- Also Chapter 4 in our textbook



## Gate Delay Model

- First, normalize a model of delay to dimensionless units to isolate fabrication effects
- $d_{\text {abs }}=d \tau$
- $\tau$ is the delay of a minimum inverter driving another minimum inverter with no parasitics
- In a 0.6 µm process, this is approximately 40 ps
- Now we can think about delay in terms of d and scale it to whatever process we're using


## Gate Delay

- Delay of a gate d has two components
- A fixed part called parasitic delay p
- A part proportional to the load on the output called the effort delay or stage effort f
- Total delay is measured in units of $\tau$, and is sum of these delays
- $d=f+p$


## Effort Delay

- The effort delay (due to load) can be further broken down into two terms: $f = g h$
- $g$ = logical effort, which captures properties of the gate's structure
- $h$ = electrical effort, which captures properties of the load and transistor sizes
- $h = C_{\text{out}} / C_{\text{in}}$
- $C_{\text{out}}$ is the capacitance that loads the output
- $C_{\text{in}}$ is the capacitance presented at the input
- So, $d = gh + p$


## Logical Effort

- Logical effort normalizes the output drive capability of a gate to match a unit inverter
- How much more input capacitance does a gate need to present to offer the same drive as an inverter?



## Computing Logical Effort

- DEF: Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current.
- Measure from delay vs. fanout plots
- Or estimate by counting transistor widths

- Inverter: $C_{\text{in}} = 3$, $g = 3/3$
- NAND2: $C_{\text{in}} = 4$, $g = 4/3$
- NOR2: $C_{\text{in}} = 5$, $g = 5/3$


## Logical Effort of Other Gates

- Logical effort of common gates, assuming a P/N size ratio of 2; columns give the number of inputs

| Gate Type | 1 | 2 | 3 | 4 | 5 | $n$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Inverter | 1 |  |  |  |  |  |
| NAND |  | $4/3$ | $5/3$ | $6/3$ | $7/3$ | $(n+2)/3$ |
| NOR |  | $5/3$ | $7/3$ | $9/3$ | $11/3$ | $(2n+1)/3$ |
| MUX |  | 2 | 2 | 2 | 2 | 2 |
| XOR |  | 4 | 12 | 32 |  |  |
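The closed-form NAND and NOR columns of the table can be reproduced with a short sketch. The function names `nand_effort` and `nor_effort` are illustrative assumptions, and the formulas assume the same P/N ratio of 2 as the table.

```python
from fractions import Fraction


def nand_effort(n):
    """Logical effort of an n-input NAND, (n + 2) / 3, assuming P/N = 2."""
    return Fraction(n + 2, 3)


def nor_effort(n):
    """Logical effort of an n-input NOR, (2n + 1) / 3, assuming P/N = 2."""
    return Fraction(2 * n + 1, 3)


# Print the 2- to 5-input entries; Fraction reduces 6/3 to 2 and 9/3 to 3.
for n in range(2, 6):
    print(n, nand_effort(n), nor_effort(n))
```

Exact fractions are used here only so that the printed values can be compared with the table without rounding.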

## Electrical Effort

- Value of logical effort g is independent of transistor size
- It's related to the ratios and the topology
- Electrical effort h captures the drive capability of the transistors via sizing
- Electrical effort $\mathrm{h}=\mathrm{C}_{\text {out }} / \mathrm{C}_{\text {in }}$
- Note that as transistor sizes for a gate increase, h decreases because $\mathrm{C}_{\text {in }}$ goes up


## Parasitic Delay

- Parasitic delay p is caused by the internal capacitance of the gate
- It's constant and independent of transistor size
- As you increase the transistor size, you also increase the capacitance of the gate/source/drain areas in proportion, which keeps the parasitic delay constant
- For our purposes, normalize $p_{\text{inv}}$ to 1
- $n$-input NAND: $p = n \, p_{\text{inv}}$
- $n$-input NOR: $p = n \, p_{\text{inv}}$
- $n$-way mux: $p = 2n \, p_{\text{inv}}$
- XOR: $p = 4 \, p_{\text{inv}}$





## Example: Ring Oscillator

- Estimate the frequency of an N -stage ring oscillator


- Logical effort: $g = 1$
- Electrical effort: $h = 1$
- Parasitic delay: $p = 1$
- Stage delay: $d = 2$, so $d_{\text{abs}} = 80$ ps
- Period for $N = 31$: $2 N d_{\text{abs}} = 4.96$ ns, so the frequency is about 200 MHz
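The ring oscillator estimate can be checked numerically with a minimal sketch. The function name `ring_oscillator_freq_hz` is an illustrative assumption, and the default of 40 ps for $\tau$ is the 0.6 µm figure quoted earlier.

```python
def ring_oscillator_freq_hz(n_stages, tau_s=40e-12, g=1.0, h=1.0, p=1.0):
    """Estimate the frequency of an N-stage inverter ring oscillator.

    Each inverter drives one identical inverter, so h = 1 and d = g*h + p.
    A full period needs the edge to travel around the ring twice.
    """
    d = g * h + p                    # normalized stage delay
    d_abs = d * tau_s                # absolute stage delay in seconds
    period = 2 * n_stages * d_abs
    return 1.0 / period


freq = ring_oscillator_freq_hz(31)
print(f"{freq / 1e6:.1f} MHz")  # 201.6 MHz
```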


## Example: FO4 Inverter

- Estimate the delay of a fanout-of-4 (FO4) inverter


- Logical effort: $g = 1$
- Electrical effort: $h = 4$
- Parasitic delay: $p = 1$
- Stage delay: $d = gh + p = 5$

The FO4 delay is about:

- 200 ps in a 0.6 µm process
- 60 ps in a 180 nm process
- $f/3$ ns in an $f$ µm process

## Delay Estimation



- If $C_{\text{in}} = x$ and $C_{\text{out}} = 10x$, then $h = 10$
- $g = 9/3 = 3$ (the 4-input NOR entry in the logical effort table)
- $d = gh + p = 3 \cdot 10 + 4 \cdot 1 = 34$ (1360 ps at $\tau = 40$ ps)
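The single-gate estimate above can be written as a small sketch. The names `stage_delay` and `TAU_PS` are illustrative assumptions. The value of 40 ps is the 0.6 µm figure quoted earlier, and $g = 3$, $p = 4$ are read here as the 4-input NOR values.

```python
TAU_PS = 40.0  # assumed: tau for the 0.6 um process quoted earlier, in ps


def stage_delay(g, h, p):
    """Normalized single-stage delay d = g*h + p, in units of tau."""
    return g * h + p


# g = 9/3 = 3, electrical effort h = 10, parasitic delay p = 4 * p_inv = 4.
d = stage_delay(g=3.0, h=10.0, p=4.0)
d_abs_ps = d * TAU_PS  # scale the dimensionless delay to picoseconds

print(d, d_abs_ps)  # 34.0 1360.0
```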


## MultiStage Delay

- Recall rule of thumb that said to balance the delay at each stage along a critical path
- Concepts of logical effort and electrical effort can be generalized to multistage paths


In general:

- Path logical effort: $G = \prod g_i$
- Path electrical effort: $H = C_{\text{out}} / C_{\text{in, first gate}}$

Remember that path electrical effort is concerned only with the effect of the logic network on the input drivers and the output load.


Off-path load diverts electrical effort from the main path, and this must be accounted for. Define a branching effort $b$ as:

$$
b = \frac{C_{\text{on path}} + C_{\text{off path}}}{C_{\text{on path}}} = \frac{C_{\text{total}}}{C_{\text{useful}}}
$$

The branching effort modifies the electrical effort needed at that stage. The branching effort $B$ of the path is:

$$
B = \prod b_i
$$

## Summary - multistage networks

- Logical effort generalizes to multistage networks
- Path Logical Effort $\quad G=\prod g_{i}$
- Path Electrical Effort $\quad H=\frac{C_{\text {out-path }}}{C_{\text {in-path }}}$
- Path Effort $\quad F=\prod f_{i}=\prod g_{i} h_{i}$
- Can we write $\mathrm{F}=\mathrm{GH}$ ?


## Branching Effort

- Remember branching effort
- Accounts for branching between stages in path

$$
b=\frac{C_{\text{on path}}+C_{\text{off path}}}{C_{\text{on path}}}
$$

- Note: $\prod h_{i}=B H$

- Now we compute the path effort
- F = GBH
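As a minimal sketch of how $F = GBH$ is assembled, the snippet below uses exact fractions. The helper names `branching_effort` and `path_effort`, and the two-stage path with its capacitance values, are hypothetical and chosen only to give round numbers.

```python
from fractions import Fraction
from math import prod


def branching_effort(c_on_path, c_off_path):
    """b = (C_on_path + C_off_path) / C_on_path for one stage output."""
    return Fraction(c_on_path + c_off_path, c_on_path)


def path_effort(g_list, b_list, c_out, c_in):
    """F = G * B * H, with G and B taken as products over the stages."""
    G = prod(g_list)
    B = prod(b_list)
    H = Fraction(c_out, c_in)
    return G * B * H


# Hypothetical path: two NAND2 stages (g = 4/3 each). The first stage drives
# two identical gate inputs, only one of which is on the path, so b = 2.
g_list = [Fraction(4, 3), Fraction(4, 3)]
b_list = [branching_effort(1, 1), Fraction(1)]
F = path_effort(g_list, b_list, c_out=9, c_in=2)
print(F)  # 16
```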


## Multistage Delays

- Path Effort Delay: $D_{F}=\sum f_{i}$
- Path Parasitic Delay: $P=\sum p_{i}$
- Path Delay:

$$
D=\sum d_{i}=D_{F}+P
$$

## Designing Fast Circuits

$$
D=\sum d_{i}=D_{F}+P
$$

- Delay is smallest when each stage bears same effort

$$
\hat{f}=g_{i} h_{i}=F^{\frac{1}{N}}
$$

- Thus minimum delay of N stage path is

$$
D=N F^{\frac{1}{N}}+P
$$

- This is a key result of logical effort
- Find fastest possible delay
- Doesn't require calculating gate sizes
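The key result can be evaluated directly, without any gate sizes. This is a minimal sketch: the function name `min_path_delay` and the example path (path effort 64 through three stages with parasitic delay 2 each) are illustrative assumptions.

```python
def min_path_delay(path_effort, parasitics):
    """Return (best stage effort, minimum delay) for a fixed stage count.

    The number of stages N is taken from the length of the parasitic list.
    """
    n = len(parasitics)
    f_hat = path_effort ** (1.0 / n)       # equal effort per stage
    return f_hat, n * f_hat + sum(parasitics)


f_hat, d_min = min_path_delay(64.0, [2.0, 2.0, 2.0])
print(round(f_hat, 3), round(d_min, 3))  # 4.0 18.0
```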


## Minimizing Path Delay

The absolute delay will have the parasitic delays of each stage summed together.

However, can focus on just Path effort $F$ for minimization purposes since parasitic delays are constant.

For an N -stage network, the path delay is least when each stage in the path bears the same stage effort.

$$
\mathrm{f}(\mathrm{~min})=\mathrm{g}(\mathrm{i}) * \mathrm{~h}(\mathrm{i})=\mathrm{F}^{1 / \mathrm{N}}
$$

Minimum achievable path delay

$$
\mathrm{D}(\min )=\mathrm{N} * \mathrm{~F}^{1 / \mathrm{N}}+\mathrm{P}
$$

Note that if $\mathrm{N}=1$, then $\mathrm{d}=\mathrm{f}+\mathrm{p}$, the original single gate equation.
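A brief sketch of why equal stage efforts give the minimum, worked for the two-stage case $N = 2$ under the assumption that $F$ and $P$ are fixed by the path (here $f_1$ and $f_2$ denote the two stage efforts):

$$
D = f_1 + f_2 + P, \qquad f_1 f_2 = F \;\Rightarrow\; D = f_1 + \frac{F}{f_1} + P
$$

$$
\frac{dD}{df_1} = 1 - \frac{F}{f_1^{2}} = 0 \;\Rightarrow\; f_1 = \sqrt{F} = f_2, \qquad D_{\min} = 2\sqrt{F} + P
$$

The same argument applied to $N$ stages gives a stage effort of $F^{1/N}$ for every stage.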

## Choosing Transistor Sizes

Remember that the electrical effort $h(i)$ is related to transistor sizes, and that for minimum delay:
$\mathrm{f}(\min )=\mathrm{g}(\mathrm{i}) * \mathrm{~h}(\mathrm{i})=\mathrm{F}^{1 / \mathbf{N}}$

So

$$
\mathrm{h}(\mathrm{i}) \min =\mathrm{F}^{1 / \mathbf{N}} / \mathrm{g}(\mathrm{i})
$$

To size transistors, start at the end of the path and compute:

$$
C_{in}(i) = g(i) * C_{out}(i) / f(\min)
$$

Once $C_{in}(i)$ is known, distribute it among the transistors of that stage.
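The backward pass described above can be sketched as follows. The function name `size_backward` is an illustrative assumption, the sketch assumes no branching (each stage's load is the next stage's input capacitance), and the numbers are illustrative: three NAND2 gates, a load of 8 units, and a target stage effort of 8/3.

```python
from fractions import Fraction


def size_backward(g_list, c_load, f_hat):
    """Walk from the output toward the input, returning C_in per stage.

    Assumes no branching, so each stage's load is the next stage's C_in.
    """
    c_ins = []
    c_out = c_load
    for g in reversed(g_list):
        c_in = g * c_out / f_hat   # C_in(i) = g(i) * C_out(i) / f(min)
        c_ins.append(c_in)
        c_out = c_in               # this stage loads the stage before it
    c_ins.reverse()                # report the first stage first
    return c_ins


sizes = size_backward([Fraction(4, 3)] * 3, Fraction(8), Fraction(8, 3))
print([str(c) for c in sizes])  # ['1', '2', '4']
```

Starting from the load rather than the input means every step uses a capacitance that is already known.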


## Example, continued

The effort of each stage will be:

$$
\mathrm{f} \min =\left(\mathrm{G}^{*} \mathrm{~B}^{*} \mathrm{H}\right)^{1 / 3}=(2.37 * 1.0 * 1.0)^{1 / 3}=1.33=4 / 3
$$

$C_{in}$ of the last gate follows from $f(\min) = g_i * b_i * h_i$:

$$
C_{in}(\text{last gate}) = g_i * C_{out}(i) / f(\min) = 4/3 * C / (4/3) = C
$$

$C_{in}$ of the middle gate should equal:

$$
C_{in}(\text{middle gate}) = g_i * C_{in}(\text{last gate}) / f(\min) = 4/3 * C / (4/3) = C
$$

All gates have same input capacitance, distribute it among transistors.

## Transistor Sizes for Example



Where gate capacitance of $2 * \mathrm{~W} * \mathrm{~L}$ Mosfet $=\mathrm{C} / 2$

Choose W accordingly.

## Another Example, Larger Load



Size the transistors of the nand2 gates for the three stages shown.

- Path logical effort: $G = g_0 g_1 g_2 = (4/3)(4/3)(4/3) = 2.37$
- Branching effort: $B = 1.0$ (no off-path load)
- Electrical effort: $H = C_{out}/C_{in} = 8C/C = 8.0$
- Minimum achievable delay:

$$
D_{\min} = 3 (GBH)^{1/3} + 3 (2 p_{inv}) = 3 (2.37 \cdot 1 \cdot 8)^{1/3} + 3 (2 \cdot 1.0) = 14.0
$$

## 8C Load Example Cont.

The effort of each stage will be:

$$
\mathrm{f} \min =(\mathrm{G} * \mathrm{~B} * \mathrm{H})^{1 / 3}=(2.37 * 1.0 * 8)^{1 / 3}=2.67=8 / 3
$$

Cin of last gate should equal:

$$
\begin{aligned}
\text { Cin last gate }(\mathrm{min}) & =\mathrm{gi} * \operatorname{Cout}(\mathrm{i}) / \mathrm{f}(\min ) \\
& =4 / 3 * 8 \mathrm{C} /(8 / 3)=4 \mathrm{C}
\end{aligned}
$$

Cin of middle gate should equal:
Cin middle gate $=\mathrm{gi} *$ Cin last gate $/ \mathrm{f}(\mathrm{min})$

$$
=4 / 3 * 4 \mathrm{C} /(8 / 3)=2 \mathrm{C}
$$

Note that each stage gets progressively larger, as is typical with a multi-stage path driving a large load.


## Example 1.6 Continued

Stage effort of each stage should be:

$$
f(\min) = F^{1/N} = (GBH)^{1/N} = 64^{1/3} = 4
$$

Determine $C_{in}$ of the last stage, using $f(\min) = g_i * b_i * h_i$:

$$
C_{in}(z) = g * C_{out} / f(\min) = 4/3 * 4.5C / 4 = 1.5C
$$

Determine $C_{in}$ of the middle stage:

$$
C_{in}(y) = g * (3 * C_{in}(z)) / f(\min) = 4/3 * (3 * 1.5C) / 4 = 1.5C
$$

Is the first stage correct?

$$
C_{in}(A) = g * (2 * C_{in}(y)) / f(\min) = 4/3 * (2 * 1.5C) / 4 = C
$$

Yes, self-consistent.
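The same calculation, including branching, can be sketched in code. The function name `size_path` is an illustrative assumption. The branching list `[2, 3, 1]` is read off the factors of 2 and 3 used in the expressions above, with capacitances expressed in units of $C$.

```python
def size_path(g_list, b_list, c_load, c_in_first):
    """Return (stage effort, list of C_in per stage) for a branched path.

    b_list[i] is the branching effort at the output of stage i.
    """
    n = len(g_list)
    G = 1.0
    B = 1.0
    for g, b in zip(g_list, b_list):
        G *= g
        B *= b
    F = G * B * (c_load / c_in_first)      # path effort F = GBH
    f_hat = F ** (1.0 / n)                 # equal effort per stage

    c_ins = [0.0] * n
    c_out = c_load
    for i in range(n - 1, -1, -1):         # work backward from the load
        c_ins[i] = g_list[i] * b_list[i] * c_out / f_hat
        c_out = c_ins[i]
    return f_hat, c_ins


f_hat, c_ins = size_path([4 / 3] * 3, [2, 3, 1], c_load=4.5, c_in_first=1.0)
print(round(f_hat, 2), [round(c, 2) for c in c_ins])  # 4.0 [1.0, 1.5, 1.5]
```

The first entry of the result coming back as the assumed input capacitance is the self-consistency check.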

## Example: 3-stage path

- Select gate sizes $x$ and $y$ for least delay from A to B




## Example: 3-stage path



- Logical Effort: $G = (4/3)(5/3)(5/3) = 100/27$
- Electrical Effort: $H = 45/8$
- Branching Effort: $B = 3 \cdot 2 = 6$
- Path Effort: $F = GBH = 125$
- Best Stage Effort: $\hat{f} = \sqrt[3]{F} = 5$
- Parasitic Delay: $P = 2 + 3 + 2 = 7$
- Delay: $D = 3 \cdot 5 + 7 = 22 = 4.4$ FO4




## Example: 3-stage path

- Work backward for sizes, using $f_{\min} = g_i b_i h_i$, so $C_{\text{in}} = (g_i b_i C_{\text{out}}) / f_{\min}$

$$
y = 45 \cdot (5/3) / 5 = 15
$$

$$
x = (15 \cdot 2) \cdot (5/3) / 5 = 10
$$





- Path logical effort: $G = g_0 g_1 g_2 g_3 = 1 \cdot (5/3) \cdot (4/3) \cdot 1 = 20/9$
- Path branching effort: $B = 1$
- Path electrical effort: $H = C_{out}/C_{in} = 20/10 = 2$
- Path effort: $F = GBH = (20/9) \cdot 1 \cdot 2 = 40/9$
- For minimum delay, each stage has effort $F^{1/N} = (40/9)^{1/4} = 1.45$
- Work backward using $C_{in} = (g_i b_i C_{out}) / f_{\min}$
- $z = g \cdot C_{out} / f_{\min} = 1 \cdot 20 / 1.45 = 14$
- $y = g \cdot C_{in}(z) / f_{\min} = (4/3) \cdot 14 / 1.45 = 13$
- $x = g \cdot C_{in}(y) / f_{\min} = (5/3) \cdot 13 / 1.45 = 15$

Note: parasitics do not matter for gate sizing; they matter only if you want to know the absolute delay.
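As a sanity check on the rounded sizes, the sketch below recomputes each stage's effort going forward and compares the total effort delay with the ideal $N F^{1/N}$. The function name `stage_efforts` is an illustrative assumption, and the path is taken to be unbranched ($B = 1$), as stated above.

```python
def stage_efforts(g_list, c_ins, c_load):
    """Effort f_i = g_i * C_out_i / C_in_i for an unbranched path."""
    loads = c_ins[1:] + [c_load]   # each stage drives the next stage's input
    return [g * c_out / c_in
            for g, c_in, c_out in zip(g_list, c_ins, loads)]


g_list = [1.0, 5 / 3, 4 / 3, 1.0]   # g0..g3 from the path logical effort
c_ins = [10.0, 15.0, 13.0, 14.0]    # first-stage C_in, then rounded x, y, z
efforts = stage_efforts(g_list, c_ins, c_load=20.0)

ideal_total = 4 * (40 / 9) ** 0.25  # N * F^(1/N) with F = 40/9
print([round(f, 2) for f in efforts])
print(round(sum(efforts), 2), round(ideal_total, 2))
```

With these numbers the summed effort delay of the rounded design stays within about 0.01 of the ideal value, which illustrates how forgiving the sizing is.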

## Misc. Comments

- Note that you never size the first gate
- This gate is assumed to be fixed
- If you were allowed to size it, the algorithm would try to make it as large as possible
- This is an estimation algorithm
- Authors claim that sizing a gate by $1.5 x$ too big or small still results in a path delay within $15 \%$ of minimum


## Sensitivity Analysis

- How sensitive is delay to using exactly the best number of stages?

- $2.4<\rho<6$ gives delay within $15 \%$ of optimal
- We can be sloppy!
- I like $\rho=4$
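The 15% claim can be checked with a rough sketch for an inverter chain. It treats the number of stages as continuous, $N = \ln F / \ln \rho$, so the delay per unit of $\ln F$ is $(\rho + p_{\text{inv}})/\ln \rho$. The function name `delay_per_ln_f`, the brute-force search grid, and $p_{\text{inv}} = 1$ are assumptions of this sketch.

```python
import math

P_INV = 1.0  # assumed: inverter parasitic delay normalized to 1


def delay_per_ln_f(rho, p_inv=P_INV):
    """Chain delay divided by ln(F) when every stage has effort rho.

    With N = ln(F) / ln(rho) stages of delay (rho + p_inv) each,
    D / ln(F) = (rho + p_inv) / ln(rho).
    """
    return (rho + p_inv) / math.log(rho)


# Brute-force search for the best stage effort on a 0.001 grid from 2 to 8.
candidates = [2.0 + 0.001 * k for k in range(6001)]
best_rho = min(candidates, key=delay_per_ln_f)
best = delay_per_ln_f(best_rho)

ratios = {rho: delay_per_ln_f(rho) / best for rho in (2.4, 4.0, 6.0)}
print(round(best_rho, 2))
for rho, ratio in ratios.items():
    print(rho, round(ratio, 3))
```

Under these assumptions the best stage effort comes out near 3.6, and the delay penalty at both ends of the quoted range stays under 15%.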


## Evaluating Different Options



## Option \#1



- Path logical effort: $G = g_0 g_1 g_2 = 1 \cdot (6/3) \cdot 1 = 2$
- Path branching effort: $B = 1$
- Path electrical effort: $H = C_{out}/C_{in} = 8C/C = 8$
- Path effort: $F = GBH = 2 \cdot 1 \cdot 8 = 16$
- Minimum delay:

$$
D = N F^{1/N} + P = 3 (16)^{1/3} + (p_{inv} + 4 p_{inv} + p_{inv}) \approx 3 (2.5) + 6 = 13.5
$$


## How many stages?

- Consider three alternatives for driving a load 25 times the input capacitance
- One inverter
- Three inverters in series
- Five inverters in series
- They all do the job, but which one is fastest?


## How many stages?

- In all cases: $\mathrm{G}=1, \mathrm{~B}=1$, and $\mathrm{H}=25$
- Path delay is $N(25)^{1 / N}+N p_{\text{inv}}$
- $N=1, D=26$ units
- $N=3, D=11.8$ units
- $N=5, D=14.5$ units
- Since $N=3$ is best, each stage will bear an effort of $(25)^{1 / 3}=2.9$
- So, each stage is $\sim 3 x$ larger than the last
- In general, the best stage effort is between 3 and 4 (not $e$, as is often stated)
- The value $e$ comes from an analysis that ignores parasitic delay
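The three alternatives can be compared with a few lines of code. This is a minimal sketch: the function name `inverter_chain_delay` is an illustrative assumption, and $p_{\text{inv}} = 1$ as normalized earlier.

```python
def inverter_chain_delay(n_stages, path_effort=25.0, p_inv=1.0):
    """Delay N * F^(1/N) + N * p_inv of an N-inverter chain, in units of tau."""
    return n_stages * path_effort ** (1.0 / n_stages) + n_stages * p_inv


for n in (1, 3, 5):
    print(n, round(inverter_chain_delay(n), 1))
# 1 26.0
# 3 11.8
# 5 14.5
```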


## Choosing the Best \# of Stages

- You can solve the delay equations to determine the number of stages N that will achieve the minimum delay
- Approximate by $\log_{4} F$

| Path Effort $F$ | Best $N$ | Min Delay $D$ | Stage Effort $f$ |
| :--- | :--- | :--- | :--- |

## Example

- String of inverters driving an off-chip load
- Pad cap and load = 40 pF
- Equivalent to 20,000 microns of gate cap
- Assume the first inverter in the chain has 7.2 µm of input cap
- How many stages in the inverter chain?
- $H = 20{,}000 / 7.2 = 2777$
- From the table, 6 stages is best
- Stage effort: $f = (2777)^{1/6} = 3.75$
- Path delay: $D = 6 \cdot 3.75 + 6 \, p_{\text{inv}} = 28.5$
- $D = 1.14$ ns if $\tau = 40$ ps

Other values of $N$:

| $N$ | $f = (2777)^{1/N}$ | Delay ($\tau$) | Delay (ns) |
| :---: | :---: | :---: | :---: |
| 2 | 52.7 | $2(52.7)+2 = 107.4$ | 4.30 |
| 3 | 14 | $3(14)+3 = 45$ | 1.8 |
| 4 | 7.26 | $4(7.26)+4 = 33.04$ | 1.32 |
| 5 | 4.88 | $5(4.88)+5 = 29.4$ | 1.18 |
| 6 | 3.75 | $6(3.75)+6 = 28.5$ | 1.14 |
| 7 | 3.105 | $7(3.105)+7 = 28.7$ | 1.15 |
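The choice of $N$ for this example can also be found by brute force instead of a lookup table. The sketch below is illustrative: the names `chain_delay`, `TAU_NS`, and `P_INV` are assumptions, and the search range of 1 to 10 stages is arbitrary.

```python
import math

TAU_NS = 0.040   # tau = 40 ps, expressed in ns
P_INV = 1.0      # inverter parasitic delay, normalized
H = 20000 / 7.2  # path electrical effort of the pad driver


def chain_delay(n, path_effort=H):
    """Delay of an n-inverter chain in units of tau: n * F^(1/n) + n * p_inv."""
    return n * path_effort ** (1.0 / n) + n * P_INV


delays = {n: chain_delay(n) for n in range(1, 11)}
best_n = min(delays, key=delays.get)

print(best_n, round(delays[best_n], 1), round(delays[best_n] * TAU_NS, 2))
print(round(math.log(H, 4), 2))  # the log4(F) rule-of-thumb estimate of N
```

The search picks six stages, and the $\log_4 F$ estimate rounds to the same answer.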

## Summary

- Compute path effort F = GBH
- Use table, or estimate $\mathrm{N}=\log _{4} \mathrm{~F}$ to decide on number of stages
- Estimate minimum possible delay: $D = N F^{1/N} + \sum p_i$
- Add or remove stages in your logic to get close to N
- Compute effort at each stage $f_{\text {min }}=F^{1 / N}$
- Starting at output, work backwards to compute transistor sizes $C_{i n}=\left(g_{i}{ }^{*} b_{i}{ }^{*} C_{\text {out }}\right) / f_{\text {min }}$


## Limits of Logical Effort

- Chicken and egg problem
- Need path to compute G
- But don't know number of stages without G
- Simplistic delay model
- Neglects input rise time effects
- Interconnect
- Iteration required in designs with wire
- Maximum speed only
- Not minimum area/power for constrained delay


## Summary

- Logical effort is useful for thinking of delay in circuits
- Numeric logical effort characterizes gates
- NANDs are faster than NORs in CMOS
- Paths are fastest when effort delays are $\sim 4$
- Path delay is weakly sensitive to stages, sizes
- But using fewer stages doesn't mean faster paths
- Delay of path is about $\log _{4}$ F FO4 inverter delays
- Inverters and NAND2 best for driving large caps
- Provides language for discussing fast circuits
- But requires practice to master

