IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 10, NO. 3, JUNE 2002


Extended TSPC Structures With Double Input/Output Data Throughput for Gigahertz CMOS Circuit Design

João Navarro S., Jr., and Wilhelmus A. M. Van Noije

Abstract: New structures to be applied with the extended true-single-phase-clock (E-TSPC) CMOS circuit technique, an extension of the traditional true-single-phase-clock (TSPC) technique [1], [2], are presented. These structures, formed by the connection of proper data paths, allow circuits to handle data at rates that are twice the clock rate. Examples of circuits employing such structures are briefly reported and, to illustrate more complex applications, the design of a dual-modulus prescaler (divide by 128/129) in a 0.8 µm CMOS process is fully described. According to simulations, this prescaler reaches a maximum operation rate of 2.19 GHz at 5 V with a power consumption of 46 mW. The new approach is also compared with a previous design (implemented with the E-TSPC technique and attaining a 1.59 GHz operation rate) and with other recently published circuits.

Index Terms: CMOS, digital high-speed design, dual-modulus prescaler, low power, true-single-phase-clock (TSPC).

I. INTRODUCTION

From the early days of CMOS technology up to the present, several clock policies have been proposed for the implementation of CMOS circuits. The number of clock phases, a major clock feature, has undergone several changes. The pseudo two-phase logic was one of the earliest techniques proposed [3]; later on, two-phase logic structures were introduced and advanced. The domino technique [4], which successfully associated two-phase circuits and dynamic gates, and the NORA technique [5], an extensive no-race approach for two-phase and dynamic circuits, are landmarks of this advance. The first single-phase clock policy, called the true single-phase-clock (TSPC) [6], was only introduced in the late 1980s. Single-phase clock policies offer superior characteristics, since their usage simplifies the clock distribution on the chip and reduces the transistor count. Thus, higher frequencies and simpler designs can be achieved.

In the 1990s, several new TSPC features were proposed [7], among them a comprehensive extension of the TSPC [1], the extended true-single-phase-clock CMOS circuit technique (E-TSPC), which consists of composition rules for single-phase circuits using complementary static, dynamic, latch, data precharged [7], and NMOS like blocks (ratioed logic blocks) [1], [2].

The main purpose of this paper is the introduction of new structures in the E-TSPC technique to build circuits handling data with rates that are twice the clock rate. These structures are formed by the connection of certain n and p data chains, leading to circuits with lower power consumption, higher speed, or both. Further, the design of a dual-modulus prescaler (divide by 128/129) with the proposed structures in a standard 0.8 µm CMOS process (0.7 µm effective channel length) is detailed, and the simulation results are compared with a previous E-TSPC implementation and with other recently published prescalers. The prescaler implementation aims to evaluate the potential of the proposed new structures.

This paper is organized as follows. In Section II, the E-TSPC technique is concisely reviewed, and then, in Section III, the new proposed structures are presented with some configuration examples. In Section IV, some circuit examples are depicted and the prescaler design is analyzed. Results of the prescaler and comparisons are reported in Section V, and the main conclusions are drawn in Section VI.

Manuscript received August 4, 2000. This work was supported in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and in part by the Fundação de Amparo à Pesquisa do Estado de São Paulo, Brazil. The authors are with the Department of Electronic Systems, EPUSP, University of São Paulo, São Paulo, Brazil (e-mail: [email protected]; [email protected]). Publisher Item Identifier S 1063-8210(02)03194-3.

II. THE E-TSPC CIRCUIT TECHNIQUE

The allowed blocks in E-TSPC circuits have already been listed above, and most of them are well-known blocks. Owing to the nonstandard nomenclature used and the importance of the block, the latch blocks and their N-MOS like versions are shown in Fig. 1. Although these blocks do not execute a true latch function, their presence is indispensable in any data chain for the holding operation. In the latches of Fig. 1, the clocked transistors of the n- and p-latches are placed close to the power rail, as suggested by [8]. Blocks with this configuration can attain a higher speed but suffer from charge-sharing problems. Latch configurations with clocked transistors close to the block output are also admissible.

Note that a new terminology associated with data precharged blocks [1], [2], with terms like pc or nonpc inputs, PH and PL blocks, and n-Dp and p-Dp blocks, is used in both Definition 1 and Table I.
Data precharged blocks are blocks whose output precharges are controlled by some of the data signal inputs, the so-called pc-inputs, and not by the clock signal. In a PH data precharged block, the precharge is done when all pc-inputs are high; similarly, in a PL block, the precharge is done when all pc-inputs are low. If a PH (PL) block has all of its pc-inputs high (low) whenever the clock is low, thus performing the output precharge, the block is also called an n-Dp block; likewise, if a PH (PL) block has all of its pc-inputs high (low) whenever the clock is high, the block is called a p-Dp block. In E-TSPC circuits, the block connections should be made according to composition rules. Since the concept of a data chain is fundamental for understanding the rules, the definition of a data chain is presented first.
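The PH/PL and n-Dp/p-Dp terminology can be illustrated with a small behavioural sketch. It is only an illustration under stated assumptions: the function names and the trace format (a list of `(clock_high, pc_inputs)` samples) are hypothetical conveniences and are not part of the E-TSPC technique or of [1], [2].

```python
def is_precharging(block_type, pc_inputs):
    """Return True when a data precharged block is in precharge.

    PH blocks precharge when all pc-inputs are high,
    PL blocks precharge when all pc-inputs are low.
    """
    if block_type == "PH":
        return all(pc_inputs)
    if block_type == "PL":
        return not any(pc_inputs)
    raise ValueError(f"unknown block type: {block_type!r}")


def classify_dp(block_type, trace):
    """Classify a block from observed (clock_high, pc_inputs) samples.

    The trace format is a hypothetical convenience for this sketch.
    """
    low_phase = [pc for clock_high, pc in trace if not clock_high]
    high_phase = [pc for clock_high, pc in trace if clock_high]
    labels = set()
    # n-Dp: the block precharges whenever the clock is low.
    if low_phase and all(is_precharging(block_type, pc) for pc in low_phase):
        labels.add("n-Dp")
    # p-Dp: the block precharges whenever the clock is high.
    if high_phase and all(is_precharging(block_type, pc) for pc in high_phase):
        labels.add("p-Dp")
    return labels


trace = [
    (False, [True, True]),   # clock low, all pc-inputs high: precharge
    (True, [True, False]),   # clock high: the block evaluates
    (False, [True, True]),
    (True, [False, False]),
]
print(classify_dp("PH", trace))
```

With this trace, the PH block is precharged in every clock-low sample, so it is reported as an n-Dp block. The classification only reflects the samples supplied; it does not prove the property for all circuit states.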

1063-8210/02$17.00 © 2002 IEEE

Fig. 1. The latch blocks of the E-TSPC circuit technique: (a) n-latch. (b) N-MOS like n-latch. (c) p-latch. (d) N-MOS like p-latch.

TABLE I CONSTRAINTS CONCERNING THE NUMBER OF INVERSIONS BETWEEN ADJACENT BLOCKS [1], [2]. n.r.: NO RESTRICTIONS EXIST; n.a.: THE CONNECTION IS NOT ALLOWED; even: AN EVEN NUMBER OF BLOCKS IS REQUIRED; odd: AN ODD NUMBER OF BLOCKS IS REQUIRED

Definition 1: An n-data chain is any noncyclic signal propagation path:
1) containing at least one n-latch, n-dynamic, or n-Dp block;
2) starting at a circuit external input or at the output of a p-latch, p-dynamic, or p-Dp block; when this output is followed by static blocks in the normal data flow, the data chain starts at the output of the last static block;
3) going through static, n-dynamic, n-Dp, or n-latch blocks;
4) regardless of the number and order of the blocks defined above;
5) finishing at a circuit external output or at the input of the first p-latch, p-dynamic, or p-Dp block.

For the p-data chains, an equivalent definition applies, replacing n with p and vice versa. When the clock is high, n-data chains are in the evaluation phase; otherwise they are in the holding phase. P-data chains evaluate when the clock is low.

In Fig. 2, part of a circuit schematic is depicted with seven complete n-data chains. Three of them go through the blocks that are named and hatched in the figure.

Five of the six E-TSPC composition rules [1], [2] can be fused into one general rule, presented as follows.

General Rule for Data Chains: An n (p) data chain must present one of the two following configurations:
1) to hold at least two blocks, one dynamic block and one latch block, and an even number of inversions between these blocks;
2) to hold at least two latches and an even number of inversions between these blocks.
Additionally, adjacent blocks in the same data chain must keep between them an even or odd number of blocks (inversions) according to the constraints of Table I (two blocks are called adjacent if only static blocks are placed between them).

Note that the three n-data chains marked in Fig. 2 conform to the general rule, and also that this rule allows configurations that would be considered at fault if other composition rules from the literature were applied. For example, according to the composition rules presented in [7], the most comprehensive composition rules for TSPC, some of the blocks of Fig. 2 should not be interconnected, and others should not be interposed between the blocks they separate.

The rule described above is sufficient to ensure that data-precharged gates are precharged, that dynamic gates are not affected by incorrect discharges, and that the output of the last latch of the data chain is steady at the end of holding phases. Conformance to the rule is not, however, necessary for the correct operation of the circuit. In fact, typical TSPC circuits employ the D-flip-flop (D-FF) of Fig. 3, which does not conform to the general rule but operates correctly if proper delays exist. For this reason, an exception rule, covering configurations similar to the TSPC D-FF, is added as the sixth rule [1], [2].
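The two configurations of the general rule can be turned into a small checker. This is a simplified sketch, not the authors' rule set. It assumes that "inversions between two blocks" means the number of inverting blocks placed strictly between them, that the order of the dynamic block and the latch does not matter, and it ignores the additional Table I constraints. The `Block` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations, product


@dataclass(frozen=True)
class Block:
    kind: str               # "static", "dynamic", "dp" or "latch"
    inverting: bool = True   # assumption: most CMOS blocks invert


def inversions_between(chain, i, j):
    """Count inverting blocks strictly between positions i and j (assumption)."""
    lo, hi = sorted((i, j))
    return sum(1 for block in chain[lo + 1:hi] if block.inverting)


def satisfies_general_rule(chain):
    """True if the chain shows one of the two configurations of the general rule.

    Table I constraints between adjacent blocks are NOT checked here.
    """
    dynamics = [n for n, b in enumerate(chain) if b.kind == "dynamic"]
    latches = [n for n, b in enumerate(chain) if b.kind == "latch"]
    # Configuration 1: one dynamic block and one latch.
    # Configuration 2: two latches.
    pairs = list(product(dynamics, latches)) + list(combinations(latches, 2))
    return any(inversions_between(chain, i, j) % 2 == 0 for i, j in pairs)


print(satisfies_general_rule([Block("dynamic"), Block("latch")]))
print(satisfies_general_rule([Block("latch")]))
```

Under these assumptions, a dynamic block followed directly by a latch passes (zero inversions between them), while a chain holding a single latch and no dynamic block fails both configurations.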

NAVARRO AND VAN NOIJE: EXTENDED TSPC STRUCTURES WITH DOUBLE INPUT–OUTPUT DATA


Fig. 2. Example of n-data chains. The blocks mentioned in the text are named and hatched in the figure.

Fig. 3. Two TSPC D-flip-flops connected in series. A circuit example that does not conform to the general rule but usually operates correctly.

III. E-TSPC NEW STRUCTURES

To understand the new structures discussed below, two characteristics of data-chain operation should be distinguished. The first characteristic is found in data chains, n or p, where dynamic and data-precharged blocks are not present. For these data chains, here called fi-data chains (data chains with fusible input), input alterations during evaluation phases do not cause undesirable discharges, so the data chain output will yield the correct value.¹ The second characteristic is found in data chains, n or p, where there is a single latch that is also the last block of the data chain. In consequence, the data chain must comply with the first configuration of the general rule (one dynamic block and one latch). For these data chains, here called fo-data chains (data chains with fusible output), the output during the holding phases keeps the result calculated during the previous evaluation phase but is in a high-impedance state.

¹ The input of the data chain is handled like a block output.

Input and output structures that handle input data and provide output data at twice the clock rate are feasible because of these characteristics. The input structures are obtained by connecting the inputs of fi-n and fi-p data chains; as a result, while the clock signal is high, the input data go to the n-data chains, and, while the clock signal is low, the data go to the p-data chains. The output structures are obtained by connecting the outputs of fo-n and fo-p data chains (when there is more than one n (p) data chain, a single latch must be the last latch of all n (p) data chains); similarly to the input structures, while the clock is high the output data come from the n-data chains, and otherwise from the p-data chains.

The combination of those structures allows new complex designs working with two data evaluations per clock cycle. Some simple examples are presented in Fig. 4. The input data rate in Fig. 4(a) is twice the clock rate and the rate of the two outputs is equal to the clock rate. In contrast, the rate of the two inputs in Fig. 4(b) is equal to the clock rate and the output rate is doubled. Finally, in Fig. 4(c), both input and output rates are doubled.

Different state machine configurations can also be adopted to meet the input and output throughput requirements. In Fig. 5, two examples are shown. The input data rate, the output data rate, and the present state data rate are twice the clock rate in the configuration of Fig. 5(a). For input rates equal to the clock rate and a doubled output rate, a configuration like the one in Fig. 5(b) can be used.

IV. CIRCUIT EXAMPLES

The input and output structures explained above can be employed with advantage in designs where high speed is pursued; additionally, since it is possible to trade off speed against power consumption by reducing transistor dimensions or power supply values, lower power consumption can alternatively be reached [9]. We will describe some design examples to illustrate the advantages of the new structures.

Several circuits have already been implemented using the combinations presented in Fig. 4. In [2], the proposed 1:8 demultiplexer with byte aligner and the 8:1 multiplexer, both implemented in a 0.8 µm CMOS technology, are examples of these designs.
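The rate doubling of the Fig. 4 structures can be pictured with a purely behavioural model. This is a sketch under assumptions, not a circuit description: it assumes each clock cycle starts with the clock-high half, so that even-indexed bits of a double-rate stream reach the n-data chains and odd-indexed bits reach the p-data chains. The function names are hypothetical.

```python
def demux_1_to_2(bits):
    """Input structure in the style of Fig. 4(a): one double-rate input,
    two outputs at the clock rate.

    Bits arriving while the clock is high are taken by the n path,
    bits arriving while the clock is low are taken by the p path.
    """
    n_path = bits[0::2]   # clock-high halves (assumed to come first)
    p_path = bits[1::2]   # clock-low halves
    return n_path, p_path


def mux_2_to_1(n_path, p_path):
    """Output structure in the style of Fig. 4(b): two inputs at the
    clock rate, one fused output at twice the clock rate.
    """
    fused = []
    for n_bit, p_bit in zip(n_path, p_path):
        fused.append(n_bit)   # clock high: output driven by the n chain
        fused.append(p_bit)   # clock low: output driven by the p chain
    return fused


stream = [1, 0, 1, 1, 0, 0]
n_path, p_path = demux_1_to_2(stream)
print(n_path, p_path)
print(mux_2_to_1(n_path, p_path) == stream)
```

Chaining the two functions reproduces the original stream, which mirrors the Fig. 4(c) case where both input and output rates are doubled.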
Fig. 4. Structures to double the data rate; the cross-hatched blocks have to be used if the output data (a) or the input data (b) are synchronized.

Fig. 5. State machine configurations.

Fig. 6. (a) Transistor schematic of the 1:8 demultiplexer. (b) Basic 2:1 multiplexer block.

Fig. 7. Schematic of the dual-modulus prescaler (divide-by-128/129).

In the demultiplexer design, the input data is pushed onto two parallel shift paths, one dedicated to the even bits and the other to the odd bits, in order to detect the A1 framing bytes (11110110) [10]. A circuit based on the natural 1:2 demultiplexer of the structure in Fig. 4(a) is used to distribute the input data to the two shift paths, as detailed in Fig. 6(a); the full 1:8 demultiplexer reached a measured maximum operation rate of 1.38 Gbit/s at 4.7 V with 349 mW power consumption. In the multiplexer design, the input data is joined by several simple 2:1 multiplexers, the central block of the circuit. A circuit based on the 2:1 multiplexer of the structure in Fig. 4(b) is used in this task, as detailed in Fig. 6(b), and the full 8:1 multiplexer reached a measured maximum operation rate of 1.7 Gbit/s at 5 V with 87.7 mW [11]. Both designs perform very favorably when compared with other implementations. In addition, the use of D-FFs triggered by both clock edges, with a structure similar to the one in Fig. 4(c), has already been suggested in the literature [12].

To illustrate the application of state machine configurations, we describe the design of a high-speed dual-modulus prescaler (divide by 128/129), using a standard 0.8 µm CMOS bulk process (ES2/ATMEL CMOS). Prescalers are employed in frequency synthesis systems and have been frequently used to compare different high-speed circuit techniques [13]–[15]. In Fig. 7, the schematic diagram of a prescaler is depicted. Two parts can be identified in the diagram: the first part, inside the cross-hatched box, is composed of three D-FFs and two logic gates, and forms a synchronous divide-by-4/5 counter [see the timing diagram in Fig. 8(a)]; the other part, at the bottom of the figure, is composed of five D-FFs and forms an asynchronous divide-by-32 counter. The div32 signal, generated by the asynchronous counter, selects whether the divide-by-4/5 counter counts up to 4 (div32 high) or up to 5 (div32 low). The fractional division ratio of the prescaler, 128 or 129, is selected according to the value of a control signal.
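The two division ratios follow from simple counting, which the following sketch makes explicit. It is a behavioural illustration only: it assumes that, in divide-by-129 mode, the synchronous counter divides by 5 in exactly one of the 32 sub-periods of the asynchronous counter and by 4 in the other 31. Which sub-period is extended, and the name `divide_by_129`, are assumptions made for this example.

```python
def prescaler_period(divide_by_129):
    """Input clock cycles per output period of a divide-by-128/129 prescaler.

    One output period spans 32 periods of the divide-by-4/5 counter,
    because the asynchronous counter divides its output by 32.
    """
    input_cycles = 0
    for sub_period in range(32):
        # Assumption: the extra input cycle is swallowed in the last sub-period.
        extend = divide_by_129 and sub_period == 31
        input_cycles += 5 if extend else 4
    return input_cycles


print(prescaler_period(False))  # 32 * 4
print(prescaler_period(True))   # 31 * 4 + 5
```

The totals are 32 × 4 = 128 and 31 × 4 + 5 = 129, which is why a divide-by-4/5 counter followed by a divide-by-32 counter yields the 128/129 pair.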


Fig. 8. The timing diagram (a) and the transition diagram (b) of the synchronous divide-by-4/5 counter.

Fig. 9. The transition diagram (a) and the timing diagram (b) of the new state machine which executes the synchronous division.

In this prescaler, the synchronous counter is the critical part in terms of speed. It may be treated as a state machine with one input, the div32 signal, and one output, the A signal; the transition diagram of this state machine is shown in Fig. 8(b). The states of the machine are encoded by signals A, B, and C. Using the configuration in Fig. 5(b), we can build a state machine with the same input–output pair; its transition diagram is depicted in Fig. 9(a). In this case, the state machine clock rate is half the original clock rate. To generate the counter output, the signal that feeds the asynchronous counter, the output of a fo-n data chain and the output of a fo-p data chain, conveying signal A and signal B respectively, are fused. As a result, the counter output carries the A value when the state machine clock is high and the B value when the state machine clock is low. In Fig. 9(b), the timing diagram of the new divide-by-4/5 counter is shown. Note that when the machine is executing the divide-by-4 operation, two cases are expected: the machine moving back and forth between the “000” and “110” states or between the “100” and “010” states.

In Fig. 10, the transistor schematic of the new divide-by-4/5 counter is depicted with the transistor dimensions in µm. The three cross-hatched boxes mark the positive edge-triggered D-FFs; the fusion of signals A and B is done through the data chains sketched in the upper portion of the figure. Note that small transistors are used in the design.

V. RESULTS

A full prescaler circuit layout was formed with the divide-by-4/5 counter considered above. Conventional positive edge-triggered TSPC D-FFs (Fig. 3) are used for all the flip-flops of the asynchronous counter except one: the flip-flop clocked directly by the synchronous counter. For this one, the conventional positive edge-triggered D-FF was slightly modified to reach higher speed (an N-MOS like p-latch block is used as the first block of the flip-flop). The division of the clock signal to create the clk/2 signal (Fig. 10) is performed by a negative edge-triggered D-FF with a modified configuration [13] for speed optimization.

The performance of the new prescaler was evaluated through SPICE simulations (level two typical parameters, at room temperature) of the netlist extracted from the layout. The simulation results are compared with results of the prescaler described in [16], which has the following characteristics: it was designed with the E-TSPC technique; the process used is the same ES2/ATMEL CMOS process of this work; and small transistor sizes were also adopted.

Fig. 10. Transistor schematic of the new synchronous divide-by-4/5 counter. The transistor widths, in µm, are indicated in the figure; the transistor lengths are 0.8 µm. The signal clk/2 is the input clock divided by 2.

In Fig. 11, the simulated maximum input frequency results of the new prescaler and both the simulated and the measured maximum input frequency results of the prescaler in [16] are shown. For both the measurements and the simulations, the maximum input clock excursion is 3 V, since the pulse generator employed in the measurements had a 3 V maximum excursion. The graph shows the significant gain in speed, over 50%, provided by the new implementation.

The power performances of the two circuits are presented in Fig. 12 (simulation results only). The total power consumption is calculated by adding three terms: the power consumption of the clock buffer (for the new prescaler, this term also includes the power consumption of the D-FF that divides the clock signal by 2); the power consumption of the synchronous divide-by-4/5 counter; and the power consumption of the asynchronous counter. In Fig. 12(a), the total power consumption of each prescaler at maximum speed and at different power supply values is depicted. The graph shows that the new prescaler can not only reach higher speed but also consumes considerably less power if the two prescalers operate with the same input signal frequency and with the minimum required power-supply voltage; for instance, the prescaler in [16] can reach 1.4 GHz with a power supply of 4.8 V and consumes 34 mW; the new prescaler, however, may reach the same frequency with a power supply of 3.2 V and consumes less than 12 mW. In Fig. 12(b), the contribution of the power terms and the total power are drawn; in this case, the input frequency for both circuits is equal to the maximum speed reached by the prescaler described in [16]. The graph shows that, despite the higher complexity of the new prescaler, the two circuits consume nearly the same power when working with the same power supply and the same input frequency.
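The 1.4 GHz comparison can be restated as power per unit frequency, which makes the saving easier to read. The sketch below only rearranges numbers quoted in the text (34 mW, less than 12 mW, 1.4 GHz, and the 46 mW at 2.19 GHz of the abstract); the function name is hypothetical, and the 12 mW entry is an upper bound, so the derived values for the new prescaler are bounds as well.

```python
def power_per_frequency(power_mw, freq_mhz):
    """Power divided by input frequency, in microwatts per megahertz.

    1 uW/MHz corresponds to 1 pJ of energy per input clock cycle.
    """
    return power_mw * 1000.0 / freq_mhz


previous = power_per_frequency(34.0, 1400.0)   # prescaler of [16] at 4.8 V
new_bound = power_per_frequency(12.0, 1400.0)  # new prescaler at 3.2 V (upper bound)
saving = 1.0 - 12.0 / 34.0                     # lower bound on the relative saving

print(f"previous design: {previous:.1f} uW/MHz")
print(f"new design: below {new_bound:.1f} uW/MHz")
print(f"power saving at 1.4 GHz: above {saving:.0%}")
print(f"new design at 2.19 GHz, 5 V: {power_per_frequency(46.0, 2190.0):.1f} uW/MHz")
```

The last line uses the rounded 46 mW figure, so it gives about 21.0 µW/MHz, slightly above the value obtained from unrounded simulation data.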
Fig. 11. Results for the maximum input frequency of the prescalers versus the power supply. Both the simulation and the measurement results were obtained using an input pulse with a maximum excursion of 3 V.

Fig. 12. The power performance of the two prescalers (simulation results). (a) The total power consumption at maximum speed versus the prescaler input frequency for different values of power supply. (b) The power consumption terms versus the power supply.

TABLE II SOME PRESCALER RESULTS ARE SUMMARIZED. NOTE THAT, FOR DIFFERENT WORKS, THE PARTIAL (WITHOUT CLOCK BUFFER CONSUMPTION), THE TOTAL, OR, IN A FEW CASES, BOTH POWER CONSUMPTION RESULTS ARE SUPPLIED. THE CROSS-HATCHED VALUES OF THE TABLE WERE FOUND BY SIMULATIONS

*It is not clear whether the power consumption value comprises the clock buffer consumption.

The performance of different dual-modulus prescalers presented in the literature and our test outcomes are summarized in Table II. Although a comparison among the implementations is feasible, some caution should be taken in the analysis, mainly of the power-consumption data. We notice that in some of the papers used in this work, [13], [15], [16], [17], the authors do not state which power consumption terms were considered in the power results (we found, through private communication with the authors, that in [13], [15], [16] the presented power results do not include the clock buffer consumption). Table II shows that the new implementation has the best power consumption characteristics and is one of the fastest prescalers.

VI. CONCLUSION

The extension of the E-TSPC technique with new structures that may double the input and output data rates was reported. Examples of circuits, a 1:8 demultiplexer, an 8:1 multiplexer, and a prescaler, were given to illustrate the applications of these structures. In particular, the detailed design of a dual-modulus prescaler (divide by 128/129), developed in a 0.8 µm CMOS process, was studied. The complete layout was drawn and its netlist was extracted for SPICE simulations. The simulated circuit attained 2.19 GHz and a power consumption of 20.9 µW/MHz at 5 V (the power consumption of the clock buffer is included). The results, compared with other implementations, confirm the advantages of the proposed structures.

ACKNOWLEDGMENT

The authors would like to thank J. Park and H. Yan for the valuable information concerning the prescaler measurement results of [13] and [15].

REFERENCES

[1] J. Navarro and W. Van Noije, “E-TSPC: Extended True Single Phase Clock CMOS circuit technique,” in VLSI: Integrated Syst. Silicon, IFIP Int. Conf. VLSI, R. Reis and L. Claesen, Eds., London, U.K., 1997, pp. 165–176.
[2] J. Navarro, “Design techniques for high speed CMOS ASIC’s,” Ph.D. dissertation, Univ. São Paulo, Dept. Elect. Eng., São Paulo, Brazil, 1998.
[3] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design, 2nd ed. Reading, MA: Addison-Wesley, 1993.
[4] R. H. Krambeck, C. M. Lee, and H.-F. S. Law, “High-speed compact circuits with CMOS,” IEEE J. Solid-State Circuits, vol. 17, pp. 614–619, June 1982.
[5] N. F. Gonçalves, “NORA: a racefree CMOS technique for register transfer systems,” Ph.D. dissertation, Katholieke Universiteit Leuven, Leuven, Belgium, 1984.

[6] Y. Ji-ren, I. Karlsson, and C. Svensson, “A true single-phase-clock dynamic CMOS circuit technique,” IEEE J. Solid-State Circuits, vol. 22, pp. 899–901, Oct. 1987.
[7] P. Larsson, “Skew safety and logic flexibility in a true single phase clocked system,” in Proc. IEEE ISCAS, Seattle, WA, May 1995, pp. 941–944.
[8] Q. Huang, “Speed optimization of edge-triggered nine-transistor D-flip-flop for gigahertz single-phase clocks,” in Proc. IEEE ISCAS, Chicago, IL, May 1993, pp. 2118–2121.
[9] A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, 2nd ed. Norwell, MA: Kluwer, 1996.
[10] F. L. Romão, J. Navarro, R. Silveira, and W. Van Noije, “1.2 GB/S SONET/SDH demux in CMOS technology,” in Proc. SBMO/IEEE MTT-S Int. Microwave and Optoelectronics Conf., vol. 1, Rio de Janeiro, BR, July 1995, pp. 52–57.
[11] J. Navarro and W. Van Noije, “Design of an 8:1 MUX at 1.7 Gbit/s in 0.8 µm CMOS technology,” in Proc. IEEE Great Lakes Symp. VLSI, Lafayette, IL, Feb. 1998, pp. 103–107.
[12] M. Afghahi and J. Yuan, “Double edge-triggered D-flip-flops for high-speed CMOS circuits,” IEEE J. Solid-State Circuits, vol. 26, pp. 1168–1170, Aug. 1991.
[13] B. Chang, J. Park, and W. Kim, “A 1.2 GHz CMOS dual-modulus prescaler using new dynamic D-type flip-flops,” IEEE J. Solid-State Circuits, vol. 31, pp. 749–752, May 1996.
[14] C.-Y. Yang, G.-K. Dehng, J.-M. Hsu, and S.-I. Liu, “New dynamic flip-flop for high-speed dual-modulus prescaler,” IEEE J. Solid-State Circuits, vol. 33, pp. 1568–1571, Oct. 1998.
[15] H. Yan, M. Biyani, and K. K. O, “A high-speed CMOS dual-phase dynamic-pseudo NMOS ((DP)²) latch and its application in a dual-modulus prescaler,” IEEE J. Solid-State Circuits, vol. 34, pp. 1400–1404, Oct. 1999.
[16] J. Navarro and W. Van Noije, “A 1.6-GHz dual modulus prescaler using the Extended True-Single-Phase-Clock CMOS circuit technique (E-TSPC),” IEEE J. Solid-State Circuits, vol. 34, pp. 97–102, Jan. 1999.
[17] J. Craninckx and M. S. J. Steyaert, “A 1.75-GHz/3-V dual-modulus divide-by-128/129 prescaler in 0.7 µm CMOS,” IEEE J. Solid-State Circuits, vol. 31, pp. 890–897, July 1996.

João Navarro S., Jr., received the B.S., M.S., and Ph.D. degrees in electrical engineering from the Polytechnic School, University of São Paulo (EPUSP), Brazil, in 1986, 1990, and 1998, respectively. Since 1990, he has been a Research Staff Member at the EPUSP and, since 2001, he has also been a Professor of Computer Science at SENAC, Brazil. His current research interests include high-speed digital circuits, RF designs, and clock distribution.

Wilhelmus A. M. Van Noije was born in the Netherlands. He received the B.S.E.E. and the M.S.E.E. degrees from the University of São Paulo, Brazil, and the Ph.D. degree in applied science from the Katholieke Universiteit Leuven, Belgium, in 1975, 1978, and 1985, respectively. Since 1987, he has been with the Department of Electrical Systems Engineering, University of São Paulo (PSI/EPUSP), where he became a Full Professor in 1998 and has been the Department Head since 1999. Also, since 1988, he has been the Coordinator of the VLSI Systems Design Division of the Integrated Systems Laboratory (LSI/PSI/EPUSP), and is involved in IC layout synthesis on sea-of-gates (SOG) structures, analog circuits on SOG, high-speed CMOS integrated circuit techniques, and, recently, RF circuit design.