# Low Swing Signaling Using a Dynamic Diode-Connected Driver

Marcos Ferretti, Peter A. Beerel, Department of Electrical Engineering Systems, University of Southern California, Los Angeles, CA, USA. ferretti@usc.edu, pabeerel@usc.edu

#### Abstract

In this paper, we propose a novel low swing driver using a Dynamic Diode-Connected Driver (DDCD) architecture. The receiver can be a simple inverter since the line swing is around Vdd/2 (from Vtn to Vdd-|Vtp|). Simulation results show a reduction of the energy-delay product of between 27% and 54% compared with the full swing CMOS buffer in 0.5 µm and 0.18 µm processes. Unlike most alternatives, the circuit requires neither extra power supplies nor a multi-threshold process.

## 1. Introduction

As technology scales down, on-chip wires become increasingly important compared with devices in terms of power, delay and density. Relative to the scaled devices, the delay of cross-chip wires increases 71% per year [2].

Low swing driving of long wires is a common technique for reducing the energy-delay product of propagating information on these wires. Several low swing driver topologies have been proposed [6-10]. However, most fall short of satisfying all of the following desirable characteristics:

- **No extra power supplies.** Some circuits [1][6] require internal (or external) intermediate-level power supplies, which may not be readily available in many applications; this complicates the physical design and thus adds risk.
- **No extra reference voltages.** The voltage references used in [1][8] do not need to supply power but pose the same problems as above.
- **No multiple threshold voltage (*Vt*) process.** Used in some circuits [1][6], a multi-Vt process may limit the designer's foundry options and complicate process portability.
- **Voltage scalability.** The circuit should operate properly over a wide range of dynamic and static voltage scaling.
- **Low short-circuit current.** Large buffer drivers may cause significant short-circuit current during voltage transitions. Ideally, a low swing driver should avoid this.
- **Low power.** Power consumption, under any variation of process and power supply, should be smaller than that of the full swing buffer counterpart.
- **Low propagation delay.** Propagation delay should be close to or better than that of the full swing buffer with the same output driver transistor sizes.
- **Good noise margin.** The driver-receiver pair must have a reasonable noise margin. Since the signal swing is reduced, the noise margin is reduced unless a differential (or pseudo-differential) approach is used [1], but such approaches add extra wires and/or extra power supplies and voltage references.
- **Small area penalty.** Compared with the conventional full swing buffer, the required extra area should be small.
- **Single-wire interconnect.** Some two-wire architectures yield very good power and performance [1], but doubling the number of wires in a data bus may increase the area significantly.

The proposed circuit is a good compromise among all these goals. It can be used to replace a full swing buffer without major changes in the design.

Section 2 describes the test architecture, the processes used in the simulations and the basic energy and noise analysis. Section 3 describes the proposed driver/receiver pair, Section 4 shows the simulation results and comparisons, and Section 5 presents some conclusions.

## 2. Test architecture

Figure 1 shows the test architecture used in [1] and [7] that we adopt.

This paper uses two sets of process parameters and SPICE models: HP 0.5 $\mu$m AMOS 14TB and TSMC 0.18 $\mu$m, both from MOSIS. The HP 0.5 $\mu$m process allows us to compare our results with previous benchmarks [1]. We use a $\pi$3 interconnect line model [4] for simulations in this process with CL = 1 pF, Cw = 1 pF and $Rw = 300\Omega$. CL is the load capacitance distributed along the wire (for fanout), Cw is the wire capacitance and Rw is the wire resistance. The TSMC 0.18 $\mu$m process is used to check the performance of the proposed low swing architecture in a deep sub-micron process. We use a $\pi$3 interconnect model [4] for simulations in this process with CL = 1 pF, Cw = 0.7 pF and $Rw = 2800\Omega$.

In both cases, we compare our circuit with a conventional buffer implemented with two inverters in the same technology, driving equal lines and with identical receivers.



Figure 1 - (a) Test architecture, (b) $\pi$3 line model

Equation 1 below gives the dynamic switching energy required to drive the line with low swing ( $E_{low}$ ).

$$E_{low} = Ctot \cdot Vdd \cdot Vs \tag{1}$$

where, *Ctot* is the total capacitance driven (CL + Cw), *Vdd* is the driver power supply voltage and *Vs* is voltage swing applied over the line.

Since, for the conventional full swing CMOS buffer, *Vs* is equal to *Vdd*, we have:

$$E_{full} = Ctot \cdot Vdd^2 \tag{2}$$
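As a quick numerical check of Equations 1 and 2, the sketch below evaluates both energies for the HP 0.5 µm line (CL = 1 pF, Cw = 1 pF) at Vdd = 2 V, using the 0.88 V DDCD swing reported in Table 2. The function names are illustrative only. The result covers the line capacitance alone, so it is not expected to match the simulated energies, which are measured over all circuits in the test structure.

```python
def low_swing_energy(c_tot, vdd, vs):
    """Equation 1: dynamic switching energy for a swing vs on a capacitance c_tot."""
    return c_tot * vdd * vs


def full_swing_energy(c_tot, vdd):
    """Equation 2: the full swing case, where the swing equals vdd."""
    return low_swing_energy(c_tot, vdd, vdd)


# Values taken from the text: CL = 1 pF, Cw = 1 pF, Vdd = 2 V, Vs = 0.88 V.
C_TOT = 1e-12 + 1e-12  # Ctot = CL + Cw, in farads

e_low = low_swing_energy(C_TOT, 2.0, 0.88)
e_full = full_swing_energy(C_TOT, 2.0)

print(f"E_low  = {e_low * 1e12:.2f} pJ")
print(f"E_full = {e_full * 1e12:.2f} pJ")
```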

The energy and delay performances are investigated through simulations, and the reliability due to process variations, voltage supply noise and interline crosstalk is estimated using the worst case method presented in [2] and also used in [1]. Table 1 shows the formulas and parameters used in [1] for the HP 0.5  $\mu$ m process and estimated for the TSMC 0.18  $\mu$ m process.

Table 1. Noise sources analysis

| Parameter   | Parameter Definition                                                                  |
|-------------|---------------------------------------------------------------------------------------|
| $K_C$       | Crosstalk coupling coefficient for a 10 mm wire with CL = 1 pF and 2 $\mu$m spacing   |
| $Attn_C$    | Static driver crosstalk noise attenuation                                             |
| $K_{PS}$    | Power supply noise due to signal switching for single-wire signaling, 5% [1]          |
|             | Worst case: $K_N = Attn_C \cdot K_C + K_{PS}$                                         |
| Rx_O        | Inverter input offset                                                                 |
| Rx_S        | Inverter sensitivity                                                                  |
| PS          | Power supply noise (5%) [1]                                                           |
| $Attn_{PS}$ | Power supply noise attenuation                                                        |
| Tx_O        | Transmitter offset                                                                    |
|             | Worst case: $V_{IN} = Rx_O + Rx_S + Attn_{PS} \cdot PS + Tx_O$                        |

The total noise introduced in the line  $(V_N)$  is estimated as follows:

$$V_N = K_N V_S + V_{IN} \tag{3}$$

where,  $K_N V_S$  accounts for the noise that is proportional to the signal amplitude, such as crosstalk and induced power supply noise, and  $V_{IN}$  represents the noise sources that are independent of the signal magnitude like the transmitter and receiver offsets and unrelated power supply noise. The signal-to-noise ratio (*SNR*) is then:

$$SNR = \frac{0.5 \cdot V_S}{V_N} \tag{4}$$
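The worst case analysis can be sketched in a few lines. The example below is a minimal illustration that chains the Table 1 formulas with Equations 3 and 4 for the two HP 0.5 µm columns of Table 5. The function name is hypothetical, and the power supply term is taken as $Attn_{PS} \cdot PS$, which is an assumption: it is the reading that reproduces the $V_{IN}$ entries of Table 5. The computed values agree with the tabulated ones to within rounding.

```python
def worst_case_snr(vs, k_c, attn_c, k_ps, rx_o, rx_s, ps, attn_ps, tx_o):
    """Return (K_N, V_IN, V_N, SNR) for one driver/receiver scheme."""
    k_n = attn_c * k_c + k_ps                 # noise proportional to the swing
    v_in = rx_o + rx_s + attn_ps * ps + tx_o  # independent noise (assumes Attn_PS scales PS)
    v_n = k_n * vs + v_in                     # Equation 3
    snr = 0.5 * vs / v_n                      # Equation 4
    return k_n, v_in, v_n, snr


# Parameters shared by both HP 0.5 um columns of Table 5 (Vdd = 2.0 V).
HP_COMMON = dict(k_c=0.4, attn_c=0.2, k_ps=0.05,
                 rx_o=0.150, rx_s=0.150, ps=0.10, attn_ps=0.61)

results = {
    "CMOS": worst_case_snr(vs=2.0, tx_o=0.0, **HP_COMMON),
    "DDCD": worst_case_snr(vs=0.88, tx_o=0.01, **HP_COMMON),
}

for scheme, (k_n, v_in, v_n, snr) in results.items():
    print(f"{scheme}: K_N = {k_n:.2f}, V_IN = {v_in:.3f} V, "
          f"V_N = {v_n:.3f} V, SNR = {snr:.2f}")
```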

## 3. Proposed driver-receiver pair

To avoid using external power supplies or reference voltages, we choose to limit the voltage swing, *Vs*, as follows:

$$\sim Vtn \le Vs \le (Vdd - |\sim Vtp|) \tag{5}$$

where,  $\sim Vtn$  and  $\sim Vtp$  are, approximately, the NMOS and the PMOS transistor threshold voltage respectively and Vdd is the supply voltage.

The ratio of low swing to full swing energy, which sets the maximum energy savings, is then given by:

$$\frac{E_{low}}{E_{full}} = \frac{Vs}{Vdd} \cong \frac{Vdd - |Vtp| - Vtn}{Vdd} \tag{6}$$

This is not the optimal energy-saving swing [5], but enables a good compromise between energy, delay, reliability and complexity.
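To make Equation 6 concrete, the following sketch evaluates Vs/Vdd for the DDCD swings that Table 2 reports for the HP 0.5 µm process. The function and variable names are illustrative. The rounded results reproduce the "(Ideal) E_ratio" row of that table.

```python
# Supply voltages and simulated DDCD swings in volts (HP 0.5 um process, Table 2).
VDD = [1.5, 2.0, 2.6, 3.3]
VS_DDCD = [0.65, 0.88, 1.42, 2.12]


def ideal_energy_ratio(vs, vdd):
    """Equation 6: E_low / E_full reduces to Vs / Vdd."""
    return vs / vdd


ratios = [ideal_energy_ratio(vs, vdd) for vs, vdd in zip(VS_DDCD, VDD)]

for vdd, ratio in zip(VDD, ratios):
    print(f"Vdd = {vdd:.1f} V: E_low/E_full = {ratio:.0%}")
```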

### 3.1. The driver circuit

To limit the voltage swing, existing circuits use intermediate power supplies [1][6], disable the output driver transistors when some voltage level is reached [8][10], or use source follower configurations [1][7][9]. Disabling the output driver transistors may decrease the noise immunity, even when some form of feedback is used to turn the transistor back on if the voltage on the line drifts. Source followers, due to the body effect, are not very efficient drivers, as shown in Figure 2, and may require extra output transistors [7].

In our driver, shown in Figure 3, the driving output transistor switches among three different modes. First, it is fully active, providing high drive capacity to quickly charge or discharge the line. Then, the driving transistor becomes "diode-connected" [3], limiting the line's voltage swing and offering lower impedance than the source follower to better fight noise. Finally, the transistor turns off when the line is driven in the opposite direction. Figure 4 shows the typical waveforms of the Dynamic Diode-Connected Driver (DDCD).

For a deep sub-micron process, the resistance of the line is significant, and over-driving the line (actively driving the line beyond the low swing limits) helps to decrease the propagation delay [7]. In our proposed circuit, the amount of over-drive is controlled by proper transistor sizing. Moreover, unlike the circuits proposed in [7][9], our driver consists of only one transistor in series, providing higher drive for the same area. Also, if the line has long periods of inactivity, voltage level guards [10] can be used to guarantee the same performance for all transitions.



Figure 2 - Typical transistor output impedance.



Figure 3 - Dynamic Diode-Connected Driver.

Initially, assume the input is high. The transistors M3, M4 and M6 are on and M1 (the N driver), M2, M5 and M7 are off (M1 off mode). At the input transition from high to low, M4, M3 and the P driver (M8) are turned off, while the gate of the N driver (M1) is charged, through M5-M6, fully activating the output transistor (active mode). Then, as the line is driven towards ground, M7, now active, turns M6 off and enables M2 to turn on. At this moment, the gate of the N driver (M1) "holds" the charge while the line is discharging but not yet low enough to activate M2. When M2 is active, the voltage at the gate of M1 is driven to match the line ("diode-connected" mode). At an input transition from low to high, the same sequence is applied to the P driver (M8) side.
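The mode sequence described above can be summarized as a small state machine. The sketch below is a behavioral abstraction only, not a circuit model: the event names, the `HOLD` label for the interval in which the gate of M1 holds its charge, and the rule that a rising input switches the N driver off from any mode are all assumptions introduced for illustration.

```python
from enum import Enum


class Mode(Enum):
    OFF = "off"
    ACTIVE = "active"
    HOLD = "hold"              # hypothetical label: gate charge held, M2 not yet on
    DIODE = "diode-connected"


# Hypothetical event names for the N driver (M1); each labels one step in the text.
TRANSITIONS = {
    (Mode.OFF, "input_falls"): Mode.ACTIVE,    # gate of M1 charged through M5-M6
    (Mode.ACTIVE, "m6_turns_off"): Mode.HOLD,  # M7 turns M6 off, gate holds its charge
    (Mode.HOLD, "m2_turns_on"): Mode.DIODE,    # gate of M1 driven to match the line
}


def next_mode(mode, event):
    """Advance the N driver mode; a rising input is assumed to switch it off."""
    if event == "input_rises":
        return Mode.OFF
    # Events that do not apply in the current mode leave it unchanged.
    return TRANSITIONS.get((mode, event), mode)


def run(events, start=Mode.OFF):
    """Return the list of modes visited, beginning with the start mode."""
    modes = [start]
    for event in events:
        modes.append(next_mode(modes[-1], event))
    return modes


sequence = run(["input_falls", "m6_turns_off", "m2_turns_on", "input_rises"])
print(" -> ".join(mode.value for mode in sequence))
```

The same table, with the roles of the N and P devices exchanged, would describe the P driver (M8) on a low-to-high input transition.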



Figure 4 – Driver typical waveforms.

### 3.2. The receiver circuit

The receiver circuit selected was a simple inverter with an enable signal. According to [5], a CMOS inverter is probably the fastest possible amplifier in a given technology.

Also, since the line crosses Vdd/2 on every transition, a balanced inverter makes a good receiver in terms of simplicity and performance.

Other receiver structures, like the level converter [1], may be used. They may offer a better noise margin, but they are not as fast as a single inverter.

Since the transmitter and the receiver transistors are far apart, transistor mismatch is likely to occur and the final voltage level of the line may allow both of the receiver transistors to conduct. The enable signal may be used to "turn-off" the receiver to avoid any possible bias current while the line is not being used.

## 4. Analysis and simulation results

We compare a conventional CMOS buffer, implemented with two inverters, with the DDCD circuit with the same output driver transistor sizes. For the HP 0.5  $\mu$ m process, the input, the output and the receiver inverters are implemented with channel width *Wn*=3  $\mu$ m and *Wp* = 6  $\mu$ m and *Cout* = 20fF. For the TSMC 0.18  $\mu$ m, the input, the output and the receiver inverter are implemented with width *Wn*=0.54  $\mu$ m and *Wp* = 2.16  $\mu$ m and *Cout* = 3fF. The energy and delay are measured over all circuits in the test structure.

Table 2. HP 0.5 µm: CMOS versus DDCD

| HP 0.5um     | Vdd =     | 1.5  | 2    | 2.6  | 3.3  | V     |
|--------------|-----------|------|------|------|------|-------|
|              | E-CMOS    | 5.6  | 10   | 17.2 | 28.0 | pJ    |
| Energy       | E-DDCD    | 3.0  | 5.6  | 11.4 | 21.8 | pJ    |
|              | E_ratio   | 46%  | 44%  | 34%  | 22%  |       |
|              | D-CMOS    | 3.15 | 1.90 | 1.43 | 1.22 | ns    |
| Delay        | D-DDCD    | 3.90 | 1.80 | 1.33 | 1.15 | ns    |
|              | D_ratio   | -24% | 5%   | 7%   | 6%   |       |
|              | E*D-CMOS  | 17.6 | 19   | 24.5 | 34.1 | pJ*ns |
| Energy*delay | E*D-DDCD  | 11.7 | 10.0 | 15.1 | 25.0 | pJ*ns |
|              | E*D_ratio | 34%  | 47%  | 38%  | 27%  |       |
| Vs           | CMOS      | 1.5  | 2    | 2.6  | 3.3  | V     |
|              | DDCD      | 0.65 | 0.88 | 1.42 | 2.12 | V     |
| (Ideal)      | E_ratio   | 43%  | 44%  | 55%  | 64%  |       |

Table 3. TSMC 0.18 µm: CMOS versus DDCD

| TSMC 0.18um  | Vdd =     | 1.2   | 1.4   | 1.6   | 1.8  | V     |
|--------------|-----------|-------|-------|-------|------|-------|
|              | E-CMOS    | 2.4   | 3.4   | 4.4   | 5.6  | pJ    |
| Energy       | E-DDCD    | 1.6   | 2.0   | 2.8   | 4.0  | pJ    |
|              | E_ratio   | 35%   | 41%   | 36%   | 29%  |       |
|              | D-CMOS    | 3.20  | 2.93  | 2.80  | 2.7  | ns    |
| Delay        | D-DDCD    | 2.66  | 2.27  | 2.12  | 2.07 | ns    |
|              | D_ratio   | 17%   | 23%   | 24%   | 23%  |       |
|              | E*D-CMOS  | 7.7   | 10.0  | 12.3  | 15.1 | pJ*ns |
| Energy*delay | E*D-DDCD  | 4.1   | 4.5   | 5.9   | 8.3  | pJ*ns |
|              | E*D_ratio | 46%   | 54%   | 52%   | 45%  |       |
| Vs           | CMOS      | 1.2   | 1.4   | 1.6   | 1.8  | V     |
|              | DDCD      | 0.712 | 0.768 | 0.864 | 1.03 | V     |
| (Ideal)      | E_ratio   | 59%   | 55%   | 54%   | 57%  |       |

| Process                   | CL (pF) | CMOS Energy (pJ) | CMOS Delay (ns) | CMOS E\*D (pJ\*ns) | DDCD Energy (pJ) | DDCD Delay (ns) | DDCD E\*D (pJ\*ns) |
|---------------------------|---------|------------------|-----------------|--------------------|------------------|-----------------|--------------------|
| HP 0.5 µm (Vdd = 2V)      | 0       | 6.2              | 1.49            | 9.2                | 4.3              | 1.46            | 6.2                |
|                           | 1       | 10.0             | 1.91            | 19.1               | 5.6              | 1.81            | 10.1               |
|                           | 2       | 13.8             | 2.28            | 31.5               | 6.8              | 2.14            | 14.6               |
|                           | 3       | 17.6             | 2.65            | 46.6               | 8.2              | 2.46            | 20.1               |
|                           | 4       | 21.4             | 3.02            | 64.5               | 9.5              | 2.77            | 26.3               |
|                           | 5       | 25.2             | 3.36            | 84.7               | 10.8             | 3.09            | 33.4               |
| TSMC 0.18 µm (Vdd = 1.8V) | 0       | 2.6              | 1.23            | 3.1                | 2.3              | 1.16            | 2.6                |
|                           | 1       | 5.6              | 3.20            | 17.9               | 4.0              | 2.66            | 10.6               |
|                           | 2       | 8.9              | 4.03            | 35.9               | 6.5              | 3.05            | 19.8               |
|                           | 3       | 12.1             | 5.40            | 65.4               | 8.9              | 4.29            | 38.1               |
|                           | 4       | 15.3             | 6.75            | 103.5              | 11.2             | 5.88            | 65.9               |
|                           | 5       | 18.6             | 8.17            | 151.9              | 13.4             | 7.75            | 103.9              |

Table 4. Varying CL

Table 2 and Table 3 compare the performance of the DDCD and a CMOS buffer as a function of supply voltage for the HP 0.5 µm and TSMC 0.18 µm processes, respectively. The maximum energy\*delay savings of the DDCD are 47% at 2 V for the HP 0.5 µm process and 54% at 1.4 V for the TSMC 0.18 µm process. This is comparable to the best single-wire drivers proposed in [1]. In fact, our circuit has significantly higher delay savings than all the circuits proposed in [1]. The non-linear behaviour of the energy and delay ratios with respect to Vdd arises mainly because, when Vdd is low, M9 and M2 may take longer to activate (to reach sufficient Vgs), so the drivers stay active longer and the voltage swing increases despite the reduction of Vdd.

Table 4 shows the robustness of the DDCD with respect to varying the load (CL) for the same transistor sizes and Vdd. The key advantage of the DDCD-inverter pair is that, unlike the alternatives, it has significantly lower design complexity, requiring no extra reference or power supply voltages.
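The savings figures quoted above can be recomputed from the tabulated energies and delays. The sketch below assumes that each ratio in Table 2 is the relative saving (CMOS - DDCD) / CMOS, which is consistent with the negative delay ratio reported at 1.5 V. The function name is illustrative, and the input values are the Vdd = 2 V column for the HP 0.5 µm process.

```python
def saving(cmos, ddcd):
    """Relative saving of the DDCD versus the CMOS buffer (negative means a penalty)."""
    return (cmos - ddcd) / cmos


# HP 0.5 um process at Vdd = 2 V, from Table 2 (energy in pJ, delay in ns).
e_cmos, e_ddcd = 10.0, 5.6
d_cmos, d_ddcd = 1.90, 1.80

energy_saving = saving(e_cmos, e_ddcd)
delay_saving = saving(d_cmos, d_ddcd)
ed_saving = saving(e_cmos * d_cmos, e_ddcd * d_ddcd)

print(f"Energy saving:       {energy_saving:.0%}")
print(f"Delay saving:        {delay_saving:.0%}")
print(f"Energy*delay saving: {ed_saving:.0%}")
```

Under this assumption the three results round to the 44%, 5% and 47% entries of the 2 V column.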

| Process:        | TSMC 0.18 µm |       | HP 0.5 μm |       | Units |
|-----------------|--------------|-------|-----------|-------|-------|
| Schemes:        | CMOS         | DDCD  | CMOS      | DDCD  |       |
| Vdd             | 1.8          | 1.8   | 2.0       | 2.0   | V     |
| $V_S$           | 1.8          | 1.03  | 2.0       | 0.88  | V     |
| $K_C$           | 0.18         | 0.18  | 0.4       | 0.4   | -     |
| $Attn_C$        | 0.2          | 0.2   | 0.2       | 0.2   | -     |
| $K_{PS}$        | 0.05         | 0.05  | 0.05      | 0.05  | -     |
| $K_N$           | 0.09         | 0.09  | 0.13      | 0.13  | -     |
| $K_N \cdot V_S$ | 0.155        | 0.089 | 0.260     | 0.114 | V     |
| Rx_O            | 0.177        | 0.177 | 0.150     | 0.150 | V     |
| Rx_S            | 0.100        | 0.100 | 0.150     | 0.150 | V     |
| PS              | 0.09         | 0.09  | 0.10      | 0.10  | V     |
| $Attn_{PS}$     | 0.54         | 0.54  | 0.61      | 0.61  | -     |
| $Tx_O$          | 0            | 0.02  | 0         | 0.01  | V     |
| $V_{IN}$        | 0.326        | 0.326 | 0.361     | 0.372 | V     |
| $V_N$           | 0.480        | 0.434 | 0.621     | 0.486 | V     |
| SNR             | 1.87         | 1.19  | 1.61      | 0.90  | -     |

Table 5. Worst case noise analysis

The cost of this performance is a slightly lower *SNR* than most of the circuits proposed in [1]. As Table 5 shows, most of this *SNR* penalty arises because the swing (*Vs*) is small and the independent noise voltage ($V_{IN}$) dominates. However, $V_{IN}$ can be significantly reduced by careful power distribution, device matching and, if necessary, selecting another receiver [2], such as the level converter (LC) receiver [1], at the expense of some extra delay. In addition, to further improve the noise margin, crosstalk from neighboring full swing signals can be reduced by either shielding or more conservative spacing rules [2].

## 5. Conclusion

The proposed DDCD circuit, with a simple inverter as a receiver, meets the desired goals of a low-complexity single-wire low swing driver. It requires no extra power supplies, no reference voltages and no multiple-Vt process. It scales well with voltage and provides low power and low propagation delay with a manageable noise margin and a small area penalty.

## 6. Acknowledgements

We would like to thank Joong-Seok (Jay) Moon of USC for insightful feedback on this manuscript.

## 7. References

[1] H. Zhang *et al.*, "Low-Swing On-Chip Signaling Techniques: Effectiveness and Robustness", IEEE Trans. on VLSI Syst., vol. 8, no. 3, pp. 264-272, June 2000.

[2] W. J. Dally and J. W. Poulton, *Digital Systems Engineering*, U.K., Cambridge University Press, 1998.

[3] D.A. Johns and K. Martin, *Analog Integrated Circuit Design*, USA, John Wiley & Sons, Inc., 1997.

[4] J. M. Rabaey, *Digital Integrated Circuits*, New Jersey, USA, Prentice-Hall Inc., 1996.

[5] C. Svensson, "Optimum voltage swing on on-chip and off-chip interconnects", European Solid State Circuits Conference ESSCIRC, Stockholm, Sweden, September 2000.

[6] Y. Nakagome *et al.*, "Sub-1-V Swing Internal Bus Architecture for Future Low-Power ULSI's", IEEE Journal of Solid State Circuits, vol. 28, no.4, Apr. 1993.

[7] C. Kwon *et al.*, "High Speed and Low Swing Interface Circuits Using Dynamic Over-Drive and Adaptive Sensing Scheme", pp. 388 – 391, International Conference on VLSI and CAD, ICVC 1999.

[8] R. Golshan and B. Haron, "A novel reduced swing CMOS BUS interface circuit for high speed low power VLSI systems", Proc. IEEE Int. Symp. on Circuits and Systems, vol. 4, pp. 351 – 354, May 1994.

[9] A. Rjoub and O. Koufopavlou, "Efficient drivers, receivers and repeaters for low power CMOS bus architectures", The 6<sup>th</sup> IEEE International Conference on Electronic Circuits and Systems ICECS 1999.

[10] M. Karlsson *et al.*, "Novel low-swing bus drivers and charge recycle architectures", pp. 141 – 150, Workshop on Design and Implementation of Signal Processing Systems, IEEE 1997.