Non-intrusive data-driven model order reduction for circuits based on Hammerstein architectures

Joshua Hanson¹, Biliana Paskaleva¹, and Pavel Bochev¹

¹Affiliation not available

June 07, 2024
Non-intrusive data-driven model order reduction for circuits based on Hammerstein architectures

Joshua Hanson, Biliana Paskaleva Member, IEEE, Pavel Bochev

Abstract—We demonstrate that data-driven system identification techniques can provide a basis for effective, non-intrusive model order reduction (MOR) for common circuits that are key building blocks in microelectronics. Our approach is motivated by the practical operation of these circuits and utilizes a canonical Hammerstein architecture. To demonstrate the approach we develop a parsimonious Hammerstein model for a non-linear CMOS differential amplifier. We train this model on a combination of direct current (DC) and transient Spice (Xyce) circuit simulation data using a novel sequential strategy to identify the static nonlinear and linear dynamical parts of the model. Simulation results show that the Hammerstein model is an effective surrogate for the differential amplifier circuit that accurately and efficiently reproduces its behavior over a wide range of operating points and input frequencies.

Index Terms—Hammerstein, System Identification; MOR, model order reduction; ROM, reduced-order model, ODE, Ordinary Differential Equation; DAE, Differential Algebraic Equation

I. INTRODUCTION

Numerical circuit simulations, often referred to as Spice simulations [23], are foundational to the design, assessment, and qualification of modern circuits. Spice simulations use transistor-level circuit descriptions built from compact device models by application of modified nodal analysis (MNA) [16]. A generic compact model for a device $D$ is a system of differential algebraic equations (DAEs) defined by four nonlinear functions $f_{D,1}, \phi_{D,1}, f_{D,2}, \phi_{D,2}$:

$$
D \rightarrow \begin{cases}
    i = \frac{d}{dt}\phi_{D,1}(t, x, v) + f_{D,1}(t, x, v) \\
    0 = \frac{d}{dt}\phi_{D,2}(t, x, v) + f_{D,2}(t, x, v)
\end{cases}.
$$

In (1), $v$ and $i$ are the voltages at and the currents into the device terminals, respectively, and $x$ is a vector containing internal states (e.g., charges stored in capacitors or fluxes stored in inductors). MNA applies Kirchoff’s current law at each circuit node to combine compact device models into circuit descriptions given by DAEs with the same structure as (1), and size proportional to the number of devices in the circuit. As a result, Spice simulations based on transistor-level descriptions can become intractable for large circuits. For example, modern integrated circuits (IC) can have up to hundreds of millions of circuit elements resulting in large-scale DAEs with tremendous computational costs. Another complication is that the compact device models themselves, i.e., the functions $f_{D,1}, \phi_{D,1}, f_{D,2}, \phi_{D,2}$ need to be calibrated or modified to fit new technology and/or physics.

Computational burdens of transistor-level simulations for large circuits have spurred interest in model order reduction (MOR) for microelectronics. Broadly speaking, the aim of MOR is to replace a highly accurate but computationally expensive full-order model (FOM) of a system by a smaller reduced-order model (ROM) that has acceptable accuracy and a much lower computational cost. Besides the significant reduction in simulation time, utilization of circuit ROMs in place of the detailed transistor-level descriptions can also provide IP protection.

One of the first and still widely used MOR approaches for circuits is macro-modeling (MM) [31]. A macro-model is a simplified circuit model that typically includes a combination of ideal circuit elements (e.g., resistors, capacitors) and dependent and independent voltage and current sources. MM approaches have been applied to model the behavior of a range of circuits; see, e.g., [6], [11], [20] for MMIs of operational amplifiers. The two principal MM techniques are simplification and build-up [11], [32]. The former successively replaces parts of the original circuit FOM by equivalent descriptions in terms of ideal circuit elements until a sufficiently small model is obtained. As a result, simplification yields circuit ROMs that closely resemble the original FOM. In contrast, build-up attempts to construct ad hoc blocks from ideal circuit elements that approximate the behavior of the circuit FOM, but not its structure. Although MM is applicable to general circuits, it is a manual, heuristic effort that relies on expert intuition and understanding of the circuit operation.

More rigorous and automated MOR can be developed by restricting the class of circuit FOMs to linear time-invariant (LTI) systems. These efforts have been driven primarily by the desire to improve the efficiency of post-layout simulations by reducing the size of the circuit DAE due to parasitic capacitances and resistances in interconnect structures and transmission lines. Virtually all MOR techniques for LTI systems exploit the fact that such systems are fully characterized by their transfer functions (TF) and so, order reduction can be achieved by replacing the “full size” TF by an approximate TF. The latter may be defined by, e.g., matching the leading terms (“moments”) of the TF’s Taylor expansion at the direct current (DC) operating point, or by using the invariance of
the TF with respect to equivalence transformations. The latter is the basis of balanced truncation MOR [2, Chapter 6], whereas the former has spawned a number of explicit and implicit moment matching approaches. Explicit techniques such as the Asymptotic Waveform Evaluation (AWE) [28] match the moments of the TF for the FOM, which can lead to numerical issues in some cases. These can be avoided by using implicit techniques, which first project the FOM dynamics onto a lower-dimensional subspace to reduce its size and then match the moments of the smaller system. This step is usually performed by using the block Krylov basis matrix as the projection matrix. Two common ways to generate this matrix are the block Arnoldi and Lanczos algorithms. The first method is used in, e.g., the PRIMA (passive reduced-order macromodels for linear RLC systems) algorithm [26], while the second is the basis of the ROMs for linear RLC sub-circuits in [9]. We refer to [24] for more information about moment-matching MOR and applications to high-speed interconnects.

MOR techniques have also been developed for other specific classes of FOMs such as linear time-varying (LTV) systems and polynomial systems. The latter can be characterized by a kind of nonlinear “transfer function” defined by Volterra kernels, which is the basis for the nonlinear model order reduction method (NORM) [21], [27]. NORM works by first deriving a set of minimal Krylov subspaces followed by projection of the original system onto that set and then performing direct moment matching for the nonlinear Volterra kernels of the projected system. An example of MOR for LTV systems is the time-varying Padé (TVP) method [30], which results in ROMs comprising an LTI system followed by a memoryless mixing operation, and the related approach in [22] for linear periodic time-varying systems.

The QLMOR (model order reduction via quadratic-linear systems) approach [12] extends moment matching to the class of nonlinear systems with polynomializable dynamics. QLMOR starts by transforming a system into an equivalent quadratic-linear DAE (QLDAE) (i.e., the dynamics are quadratic in the state variables and linear in the input variables). The polynomialized dynamics are then projected onto a lower-dimensional subspace, followed by moment-matching of their nonlinear Volterra kernels. We refer to [35] for further information about nonlinear MOR for circuits.

Utilization of Artificial Neural Networks (ANN) as a data-driven MOR approach for circuits is relatively new. ANNs can represent complex nonlinear behaviors by a comparatively small number of parameters, which makes them attractive for learning ROMs from data. An example of this technique is the enhanced ANN operational amplifier (op amp) model in [36] that captures both static and dynamic behaviors of the circuit and is close to eight times faster in comparison with a transistor-level model. Another example is [38], which provides theoretical analysis and case studies demonstrating the use of recurrent neural networks (RNN) for transient modeling of nonlinear circuits, as well as capturing aging effects and process variations.

Most of the above MOR techniques are designed to exploit the mathematical structure of the underlying circuit FOM. Such an approach offers specific advantages but it also has limitations. For example, the scope of moment matching is restricted to circuits that have at most polynomial nonlinearities. QLMOR admits more general systems but at the cost of embedding them into larger, possibly higher index DAEs, which are more difficult to solve numerically. Reliance on FOM structure also makes these approaches intrusive as they require access to a transistor-level description of the circuit. The latter may be unavailable when the circuit is proprietary or when it involves new technologies not yet supported by compact models. ANNs can in principle enable non-intrusive MOR for general systems but further theoretical understanding and explainability is required for them to become a more trusted design tool.

In this paper we formulate and demonstrate an alternative MOR approach that uses the characteristic behavior of the circuit rather than its mathematical structure to guide the selection of an appropriate order reduction mechanism. Our approach is motivated by the fact that many of the key functional building blocks in microelectronics systems such as operational amplifiers, comparators, and voltage regulators are characterized by simple “scripted” behaviors even though their transistor-level descriptions may have rather complex nonlinear mathematical structures. We posit that system identification (SysID) techniques can provide a basis for effective and non-intrusive MOR for such circuits that is not limited by the complexity of their device-level descriptions. Such an approach makes the selection of a system architecture that is sufficiently expressive to represent the circuit’s behavior the principal task in the MOR process. Once this architecture is selected, a circuit-level ROM can be inferred directly from laboratory or synthetic data without requiring device-level modeling. Viability of this approach is underpinned by the fact that SysID is a powerful technique that can discover a low-complexity model of dynamical system that delivers the best match for a collection of dynamical input–output (or input-state-output) data. SysID can also provide adequate models for systems for which reliable first-principles descriptions are impractical or unavailable.

This paper is organized as follows: Section II outlines the SysID-based MOR approach and motivates the selection of the underlying model architecture. Section III describes the CMOS differential amplifier circuit used to demonstrate the approach. Section IV describes a generic Hammerstein model architecture, its specialization to the model circuit, and the training process. Section V presents simulation results and Section VI summarizes our conclusions and future work.

II. OUTLINE OF THE APPROACH

Our approach falls into the category of SysID methods that assume an internal structure such as Wiener [4], Hammerstein, [19], [10], Hammerstein-Wiener [37], and Wiener-Hammerstein [34] model forms. Specifically, we shall develop

1 In this sense macro-modeling is also an intrusive MOR approach.
2 Notably, some approaches developed by the Electronic Design Automation community for LTI model order reduction could be interpreted as SysID approaches, such as those based on transfer function fitting [3], [33] or stability-constrained time-series regression [5].
3 We note that the TVP model [30] can be interpreted as a Wiener model.
a ROM for a nonlinear CMOS differential amplifier circuit, described in Section III, by adopting the Hammerstein architecture, which consists of a static nonlinear map (whose input is the port voltages) in cascade with a linear time-invariant dynamical system (whose output is the port currents).

Our choice of a demonstration circuit is motivated by the fact that differential amplifiers play a fundamental role in modern electronics and are a key building block in numerous electronic systems and devices such as operational amplifiers, analog-to-digital converters, and audio systems. On the other hand, we select the Hammerstein architecture as a basis for our ROM for two key reasons. First, we will show that some already established compact circuit models can be expressed as Hammerstein models. Second, the type of circuit considered here operates in a predominately memoryless capacity at low to moderate frequencies, such that a static nonlinear block can capture the salient behavior. The dynamical behaviors originating from e.g., parasitic elements exhibit small amplitudes and fast timescales, and can be approximated well in a neighborhood of the operating point by a linear system, i.e., the operation of the circuit maps well onto the Hammerstein architecture. Furthermore, the “separable” nonlinear (DC) and linear (AC) components of the circuit behavior can be captured by data sets that represent the nonlinear and linear dynamics separately, and modeled highly accurately and efficiently in a sequential manner. Such sequential approach significantly simplifies training and reduces training data and training time.

In particular, here we use a combination of DC and transient circuit simulation input-output data to infer the static and dynamic blocks of the model, respectively.

The key contributions of this work are thus the demonstration of SysID as an effective MOR approach and the introduction of the sequential model inference for Hammerstein architectures as an effective training strategy that enables a simplified model development workflow.

III. CMOS DIFFERENTIAL AMPLIFIER CIRCUIT

To develop and test our model order reduction (MOR) approach we consider a small CMOS nonlinear differential amplifier circuit whose schematic is shown in Figure 1. The behavior of this circuit is simple enough to be a good candidate for our SysID MOR approach, yet complex enough that it cannot be captured completely by a set of linear differential equations, nor by completely memoryless nonlinear current-voltage characteristics. The main function of an analog differential amplifier is to produce an output voltage $V_3$ that is proportional to the difference between two input voltages $V_1$ and $V_2$. A basic memoryless linear differential amplifier can thus be modeled by the equation

$$V_3 = A_d(V_1 - V_2),$$

where $A_d$ is a large positive constant called the differential gain; see Fig. 2. An additional non-ideal term $A_{cm}(V_1 + V_2)$ could be included on the right-hand side to capture the unintended amplification of the average value of the input voltages, where the constant $A_{cm}$ is called the common-mode gain. In general we can express the output voltage as a global nonlinear function of both the input voltages.

![Fig. 1: Schematic of the CMOS nonlinear differential amplifier circuit considered in this work][1]

![Fig. 2: A basic linear differential amplifier model. The impedances $Z_{in}$ and $Z_{out}$ may in general comprise both resistive and reactive parts. The supply voltages $+V$ and $-V$ establish upper and lower bounds on the voltage produced by the dependent source.][2]

When the ports of this circuit are connected to non-ideal sources or load, current flowing through the input and output impedances will create a discrepancy between the ideal input and output voltages and those actually produced at the terminals. This discrepancy is determined by the relationship between the input and output impedances of the external elements and the internal input and output impedances of the circuit. We typically represent these non-ideal input and output impedances using some combination of resistors and capacitors, yielding a linear time-invariant sub-system. Replacing the expression controlling the dependent source with a nonlinear function of the input voltages, the basic differential amplifier model in Fig. 2 can be represented exactly in the form of a
Table I. Several of the circuit’s dynamical characteristics such as the cutoff frequency \( f_{\text{3dB}} \) and slew rate are derived for a specific load connected to the output terminal. For this specific design analysis the load capacitor \( C_L \) is set to 5 pF.

### Table 1: Design specifications (left); 800 nm CMOS technology parameters (right).

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>( A_d )</td>
<td>40 dB</td>
<td>( V_{DD} )</td>
<td>5 V</td>
</tr>
<tr>
<td>SR</td>
<td>10 V/( \mu )</td>
<td>( \lambda_N )</td>
<td>0.04 V/( \mu )</td>
</tr>
<tr>
<td>( f_{\text{3dB}} )</td>
<td>200 kHz</td>
<td>( \lambda_P )</td>
<td>0.05 V/( \mu )</td>
</tr>
<tr>
<td>IC(<em>{\text{CMR}})</em>{\text{min}} )</td>
<td>1.5 V</td>
<td>( V_{in} )</td>
<td>0.7 V</td>
</tr>
<tr>
<td>IC(<em>{\text{CMR}})</em>{\text{max}}</td>
<td>4 V</td>
<td>( V_{DP} )</td>
<td>-0.7 V</td>
</tr>
<tr>
<td>( P_{\text{loss}} )_{\text{max}}</td>
<td>2 mW</td>
<td>( K_N )</td>
<td>110 ( \mu )A/( V^2 )</td>
</tr>
<tr>
<td>( C_L )</td>
<td>5 pF</td>
<td>( K_P )</td>
<td>50 ( \mu )A/( V^2 )</td>
</tr>
</tbody>
</table>

Fig. 3: Shichman-Hodges (a.k.a. “Level 1") model of an NMOS transistor. The drain current \( I_D \) provided by the dependent source is a continuous piecewise-quadratic function of the gate, drain, and source voltages \( V_G \), \( V_D \), and \( V_S \), respectively. The body terminal is often connected to ground or the source terminal for NMOS devices.

Hammerstein model, with the port voltages assigned to the input variables of the model and the port currents assigned to the output variables.

The nonlinear relationship between the input and output voltages originates from the voltage-dependent drain (resp. source) currents in the NMOS (resp. PMOS) transistors. In the Shichman-Hodges MOSFET model (depicted in Fig. 3), this is represented by a piecewise-quadratic nonlinear function. The parasitic capacitances between transistor terminals are responsible for current flow during transient operation. The interplay between these constituent elements yields the emergent behavior of the overall differential amplifier.

When the circuit in Fig. 1 is included as a sub-circuit in the context of a larger system, a global Spice simulation only requires information about the sub-circuit behavior at the ports. Thus the aim of a reduced-order model is to subsume all of the individual transistor current-voltage characteristics into a unified expression relating only the voltages and currents at the external ports, which allows us to abstract away the detailed internal behaviors. In this way, intermediate computations for internal node voltages and branch currents are eliminated, yielding a more computationally efficient model.

The differential amplifier considered in this paper follows the analysis presented [29], and was designed to meet a specific set of operational characteristics. In particular, the widths and the lengths of the individual transistors were selected in order for the circuit to meet the specifications in Table I. Several of the circuit’s dynamical characteristics such as the cutoff frequency \( f_{\text{3dB}} \) and slew rate are derived for a specific load connected to the output terminal. For this specific design analysis the load capacitor \( C_L \) is set to 5 pF.

IV. HAMMERSTEIN MODEL

Model structure selection is one of the cornerstones of system identification. If the architecture chosen is incapable of exhibiting the behaviors of interest, then the model accuracy will suffer regardless of the training stimulus and optimization algorithms used. On the other hand if the architecture is extremely expressive, the model will likely be capable of accurately reproducing the training data, but it may incur excessive computational burden to evaluate and possibly demonstrate poor generalization to new data. In the context of non-intrusive reduced-order modeling, we seek the simplest possible model structure that remains expressive enough to accurately manifest the desired behaviors.

Many circuits and devices can be effectively modeled by block-oriented nonlinear architectures, which are constructed from series, parallel, and feedback interconnections of memoryless nonlinear functions and linear time-invariant dynamical systems. Many practical circuits and devices operate in a predominately memoryless capacity, such that a static nonlinear block can capture the salient behavior. The dynamical behaviors originating from e.g., parasitic elements exhibit small amplitudes and fast timescales, which can often be well-approximated in a neighborhood of the operating point by a linear system.

Hammerstein models — represented by a static nonlinearity in cascade with a linear time-invariant system; see Figure 4 — are good candidates for emulating these common circuits. In a certain sense, this is the simplest possible nonlinear model architecture suited for capturing mainly “DC” behavior and correcting for parasitic and loading effects with an “AC” augmentation. Thévenin and Norton equivalent circuits for linear electrical networks share a similar structure. To form a Hammerstein model from a Thévenin or Norton equivalent circuit, the lumped voltage or current source is replaced by a dependent source with a nonlinear relationship to some independent input variables (usually port voltages), and the lumped impedance serves the role of the dynamic linear block.

Fig. 4: Block diagram of the generic Hammerstein model structure, where \( \varphi \) is a memoryless nonlinear function, \( H \) is a linear time-invariant system, \( u \) is the input, and \( y \) is the output.

\[ H(y) = \varphi(u) \]
A. Multi-port circuit models

For input $u$ and output $y$ (which may in general be vector-valued), the canonical state-space representation of the generic Hammerstein model in Fig. 4 takes the form

$$
\dot{x} = Ax + B\varphi(u) \\
y = Cx + D\varphi(u),
$$

(2)

where $A, B, C, D$ are matrices and $\varphi$ is a nonlinear map. Although generic behavioral circuit models do not require explicitly distinguishing between input and output variables, for convenience of integration into standard circuit simulation software we typically identify port voltages as the input variables and port currents as the output variables. Such systems can be written as a system of differential-algebraic equations

$$
i = \frac{d}{dt}q_1(t, x, v) + f_1(t, x, v) \\
0 = \frac{d}{dt}q_2(t, x, v) + f_2(t, x, v),
$$

which has the same structure as the generic compact device model in (1). As in that model, $v$ denotes the port voltages, $i$ denotes the port currents, and internal states (e.g., charges stored in capacitors or fluxes stored in inductors) are subsumed in $x$. To implement the Hammerstein model in this standard form, set $u \leftarrow v$, $y \leftarrow i$ and define

$$
q_1(t, x, v) = 0, \quad f_1(t, x, v) = Cx + D\varphi(v) \\
q_2(t, x, v) = -x, \quad f_2(t, x, v) = Ax + B\varphi(v).
$$

For the purposes of training the model, we can work directly with (2) and convert the identified model into the proper form afterwards, avoiding the need to solve any implicit DAEs within the training loop.

We now proceed to specialize the generic Hammerstein model architecture in Fig. 4 to develop a ROM for the CMOS differential amplifier circuit. Since the linear block in this architecture is a standard LTI system, the key step in the specialization process is the design of the static nonlinear block $\varphi$. This task is discussed in detail in the next section.

B. Design of the static nonlinearity

To define the nonlinear map $\varphi$ we shall take advantage of the fact that the circuit in Figure 1 is approximately memoryless, which also holds for many other similar small amplifier topologies. This behavior suggests that the transient memoryless, which also holds for many other similar small amplifier topologies. This behavior suggests that the transient

$$
\varphi : \mathbb{R}^3 \rightarrow \mathbb{R}^3, \quad (V_1, V_2, V_3) \mapsto (I_{DC}, I_{DC}),
$$

(3)

as a composition of component-wise identity and squaring operations with the $I-V$ map $I_{DC}$ of the circuit. The squaring operation provides a prototypical nonlinearity that, when appropriately filtered by a linear system, enables the model to capture the subtle bias shift and harmonic distortion introduced at higher frequencies. Implementation of the nonlinear block (3) requires a suitable approximation of the nonlinear map $I_{DC}$.

There are several options that one can consider for this purpose such as parameterizing $I_{DC}$ as a member of some generic expressive function class (e.g., polynomials, rational functions, neural nets), or using non-parametric regression such as Moving Least Squares (MLS); see, e.g., [1] for applications of neural net and MLS regression of $I$-$V$ characteristics of the circuit, i.e., the map that computes the DC port currents resulting from a given set of DC port voltages. We then define the map

$$
\varphi : \mathbb{R}^3 \rightarrow \mathbb{R}^3, \quad (V_1, V_2, V_3) \mapsto (I_{DC}, I_{DC}),
$$

(3)

denote the $I-V$ characteristics of the circuit, i.e., the map that computes the DC port currents resulting from a given set of DC port voltages. We then define the map

$$
\varphi : \mathbb{R}^3 \rightarrow \mathbb{R}^3, \quad (V_1, V_2, V_3) \mapsto (I_{DC}, I_{DC}),
$$

(3)

as a composition of component-wise identity and squaring operations with the $I-V$ map $I_{DC}$ of the circuit. The squaring operation provides a prototypical nonlinearity that, when appropriately filtered by a linear system, enables the model to capture the subtle bias shift and harmonic distortion introduced at higher frequencies. Implementation of the nonlinear block (3) requires a suitable approximation of the nonlinear map $I_{DC}$.

There are several options that one can consider for this purpose such as parameterizing $I_{DC}$ as a member of some generic expressive function class (e.g., polynomials, rational functions, neural nets), or using non-parametric regression such as Moving Least Squares (MLS); see, e.g., [1] for applications of neural net and MLS regression of $I-V$ characteristics to develop compact device models. Here we choose to represent $I_{DC}$ by using a table-based approach [13], [14] implemented with piecewise trilinear\footnote{This class of functions is common in finite element methods for PDEs [7].} interpolants. To that end, assume that the full range of possible port voltages is given by the box

$$
D = \prod_{i=1}^{3} [V_{i,min}, V_{i,max}] \subset \mathbb{R}^3,
$$

(4)

where $V_{i,min}$ and $V_{i,max}$ denote lower, resp. upper bounds on the voltage $V_i$ at the $i$th port. Let $K_i > 1$ be an integer specifying the sampling density for the $i$th port. We then perform a DC sweep using $K_i$ voltage points $V_{i,min} \leq V_{i,k_i} \leq V_{i,max}$, $k_i = 1, \ldots, K_i$ for each of the ports, to obtain a total of $K = K_1 \times K_2 \times K_3$ voltage-current pairs \{(V_{1,k_1}, V_{2,k_2}, V_{3,k_3}), (I_{1,k_1}, I_{2,k_2}, I_{3,k_3})\}. The sample voltages $\bar{V}_{i,k_1}, V_{2,k_2}, V_{3,k_3}$ form a Cartesian grid on $D$, however, this grid is not required to be uniform.

We then approximate the map $I_{DC}$ by the piecewise trilinear interpolant $I_{DC}^P$ of the voltage-current pairs defined above. Computation of $I_{DC}$ for a given set of DC port voltages $V = (V_1, V_2, V_3)$ is a local operation that only requires data from the Cartesian grid cell $\bar{D}(V)$ containing $V$. To explain this step, denote for notational clarity the vertices of this cell as $\bar{V}_{ppr} = (\bar{V}_{1,p}, \bar{V}_{2,q}, \bar{V}_{3,r})$ with $p, q, r = 0, 1$ and let $\psi_{i,s}$, $i = 1, 2, 3$, $s = 0, 1$ denote the local linear basis functions such that $\psi_{i,s}(\bar{V}_{i,s}) = \delta_{\sigma,s}$ for $\sigma, s = 0, 1$. Then the port
current at the $j$th port corresponding to the input port voltage $V$ can be approximated by

$$I_j^p = \sum_{p,q,r=0}^{1} \tilde{I}_{j,pqr} \psi_{1,p}(V_1) \psi_{2,q}(V_2) \psi_{3,r}(V_3)$$  \hspace{1cm} (5)$$

where $\tilde{I}_{j,pqr}$ is the current at the $j$th port corresponding to vertex $V_{pqr}$. One can show [7] that the map $I_{DC}$ defined by (5) is second-order accurate in the sense that

$$|I_j^p - I_j(V)| \leq C \text{diam} (\tilde{D}(V))^2 \sup_{V \in \tilde{D}} |D^2I_j(\tilde{V})|,$$  \hspace{1cm} (6)$$

where $I_j(V)$ is the exact port current corresponding to $V$, $C$ is a positive constant, and $D^2$ is the total second derivative operator. This error bound implies that approximation error will be larger in regions where $I_j$ has steep gradients. We note that when $V$ coincides with a grid node $V_{pqr}$ formula (5) simply returns the corresponding port currents $\tilde{I}_{j,pqr}$. The memory requirements and the accuracy of this approach are directly proportional to the number of voltage-current pairs $K = K_1 \times K_2 \times K_3$. Increasing $K$ improves the error bound (6) but could lead to a sizeable memory footprint. Nonetheless, trilinear interpolation is simpler to set up, cheaper to evaluate, and requires less storage than a cubic or higher order spline-based interpolation scheme, which is the primary reason we use (5) in this paper.

An important consequence of a table-based regression of the $I-V$ characteristic is that it effectively leads to a map $\varphi$ that has no unidentified parameters — all model degrees-of-freedom are pushed entirely into the linear filter, which is responsible for shaping the raw output ($I_{DC}, I_{DC}^p$) into the transient currents ($i_1, i_2, i_3$). This fact forms the basis of our sequential parameter identification strategy, which we present in the next section.

C. Sequential parameter identification algorithm

Consider again the circuit in Figure 1. Suppose we connect terminals 1, 2, 3 to some external voltage sources and loads, which should be chosen to reproduce typical operating conditions for this circuit. The behaviors we want the model to discover should be induced by the training stimulus. Simulating or measuring this circuit yields time-series data for the port voltages $(v_1(t_j), v_2(t_j), v_3(t_j))_{j=1}^{N}$ and port currents $(i_1(t_j), i_2(t_j), i_3(t_j))_{j=1}^{N}$ where $\{t_j\}_{j=1}^{N} \subset [0, T]$. One may perform several experiments and aggregate the data, or perform a single comprehensive experiment and collect a single time-series; in this work we adopt the latter approach.

The standard parameter identification problem for (2) then is to produce a map $\varphi$ and matrices $A, B, C, D$ such that for a given input $u = (v_1, v_2, v_3)$, the error between the model output $y$ and the training output $(i_1, i_2, i_3)$ is minimized. A common choice, also used in this work, is to measure this error in the $L^2$ norm $\| \cdot \|$ on the time-interval $[0, T]$, i.e., we consider the following Mean-Squared Error (MSE) loss functional

$$J(y, i) = \frac{1}{2} \| y - (i_1, i_2, i_3) \|^2.$$  \hspace{1cm} (7)$$

However, as explained in the previous subsection, we construct the map $\varphi$ directly from DC sweep data using table-based piecewise trilinear interpolation and so, $\varphi$ does not include any free parameters. As a result, the only degrees-of-freedom that are left to be identified in our model are the matrices $A, B, C, D$ in (2). Let $n$ be a positive integer specifying the internal state dimension of the dynamic block. The number of inputs to this linear block is equal to six (three for $I_{DC}$ and three for $I_{DC}^p$ provided by $\varphi$), which yield matrix dimensions $A \in \mathbb{R}^{n \times n}, B \in \mathbb{R}^{n \times 6}, C \in \mathbb{R}^{3 \times n}, D \in \mathbb{R}^{3 \times 6}$.

To find these matrices we solve the following constrained optimization problem:

$$\text{Find } A, B, C, D \text{ to minimize } (7) \text{ subject to } (2).$$  \hspace{1cm} (8)$$

We call this model inference procedure sequential parameter identification because in contrast to conventional identification of Hammerstein models, the nonlinear block $\varphi$ is defined directly and independently from the linear block by using DC sweep data. As a result, the inference of (2) is effectively reduced to identification of an LTI system, which significantly simplifies the training of the model and reduces the amount of training data required. We reiterate that the appropriateness of this sequential parameter identification procedure stems from the fact that our model circuit is approximately memoryless.

Let us now discuss the second stage of this procedure, i.e., the identification of the matrices $A, B, C, D$ in (2) with particular emphasis on the generation of time-series training data for the optimization problem (8). We recall that this data should correspond to common operating conditions for our circuit, such as those described in Section III. To that end we consider a typical configuration of the CMOS amplifier obtained by connecting a 5 pF capacitive load to the output port at terminal 3, and two voltage sources to the gates at terminals 1 and 2. We then simulate the circuit in Xyce [17] using voltage sources producing time-varying waveforms given by the following frequency-modulated sinusoids (also known as exponential chirp signals [25]):

$$v_1(t) = V_{\text{bias}} + A \sin(\phi(t))$$
$$v_2(t) = V_{\text{bias}} - A \sin(\phi(t)),$$  \hspace{1cm} (9)$$

where the phase is given by

$$\phi(t) = 2\pi \int_{0}^{t} f_0^{1-\frac{\tau}{T}} f_1^{\frac{\tau}{T}} \, d\tau = 2\pi \frac{f_0^{1-\frac{f_1}{f_0}} - f_0^{-\frac{f_1}{f_0}}}{\ln(f_1) - \ln(f_0)} T.$$$$

The instantaneous frequency $f(t) = f_0^{1-\frac{\tau}{T}} f_1^{\frac{\tau}{T}}$ sweeps from $f_0$ at time 0 to $f_1$ at time $T$. The parameters are chosen according to the following:

- The high frequency $f_1 = 5$ GHz is chosen based on the intended operating regime of the circuit.
- The low frequency $f_0 = 100$ kHz is chosen sufficiently small to be dominated by characteristic time scale of the circuit; on the other hand, choosing a value closer to $f_1$ will necessitate fewer periods, reducing simulation time.
- The bias $V_{\text{bias}} = 2.5$ V is the nominal DC operating point given by the midpoint of $V_{DD}$ and ground.
- The amplitude $A = 50$ mV is large enough to draw out the frequency dependent bias shift and harmonic distortion without introducing exaggerated hard clipping.
In this section we compare simulation results for the CMOS differential amplifier in Figure 1 and a Hammerstein model constructed and identified according to Section IV. Similar to the generation of the training data, the simulation results for the circuit, which we consider as ground truth, are computed using Sandia’s in-house analog circuit simulator Xyce [17]. We start with a brief description of the model implementation.

A. Implementation and setup

To streamline and simplify the parameter identification process we have chosen to implement the reduced-order Hammerstein model in Python, with the transient simulations utilizing the Jax-based differential equations library Diffrax [18]. To identify the parameters \(A, B, C, D\) of the Hammerstein model (2) we solve the optimization problem (8) using a quasi-Newton reduced space approach [15]. We employ the limited-memory Broyden–Fletcher–Goldfarb–Shanno (BFGS) [8] algorithm. The gradients of the loss function (7), required for the non-linear solver are computed using the automatic differentiation capability from Diffrax [18].

When presenting results we will discard the gate currents \(i_1\) and \(i_2\) because they are exactly zero (in the DC results) or nearly identically zero (in the transient results). Instead we only showcase the output voltage \(v_3\) and output current \(i_3\) resulting from (constant or time-varying) input voltages \(v_1\) and \(v_2\). The input terminals 1 and 2 will always be connected to independent source (in the DC results) or a 5 pF capacitive load (in the transient results).

B. DC results

In this subsection we assess the accuracy of the nonlinear input block \(\varphi\) of the Hammerstein model (2). For circuits that are approximately memoryless, this block provides a simplified version of (2) that is appropriate for relatively low-frequency signals. In such cases one can choose to run the Hammerstein model without the dynamic block.

The following results are derived from an iterated DC operating point analysis, which solves for the DC port currents \((I_1, I_2, I_3)\) when a given tuple of port voltages \((V_1, V_2, V_3)\) are applied. To that end, we define the DC sweep box \(D\) in (4) by setting \(V_{i}^{\min} = 0\) and \(V_{i}^{\max} = 5\) for \(i = 1, 2, 3\) and sweep each voltage with a step size of \(h = 100\) mV. This step corresponds to a sampling density \(K_i = 5001\) for all input voltages. Performing this DC analysis for the circuit in Figure 1 yields an implicit system of nonlinear algebraic equations, which is solved numerically in Xyce using Newton-Raphson iteration [17].

We use this data to construct the nonlinear map \(\varphi\) by piecewise trilinear interpolation as described in Section IV. We then report the DC output current \(I_3^{\phi}\) of the nonlinear block as a function of the two input voltages \(V_1\) and \(V_2\) applied to the gate terminals 1 and 2, respectively, while \(V_3\) is being held fixed. The accuracy of this current can be estimated using the error bound (6). Since we sample uniformly along each
direction of the DC sweep box (4), the diameter of each grid cell is \(d = \sqrt{3}h\), where \(h = 0.001\text{V}\). As a result, we have that
\[
|I^h_3 - I_3(V)| \leq \sup_{\bar{V} \in \bar{D}} |D^2I_3(\bar{V})| \times 10^{-6}.
\]
(10)
where \(\bar{D}\) is the mesh cell containing \(V\). A similar estimate holds for \(V_3\).

The plots in Figure 6 show the surfaces of \(I^h_3(V_1, V_2)\) for three different values of \(V_3\). Figure 7 uses a two-dimensional format to compare the related output voltage \(V^h_3(V_1, V_2)\) with the “ground truth” represented by the Xyce DC analysis simulation. The data from Xyce are in the solid pastel colors and the data from the Hammerstein model are in the bold dashed colors. The error plots at the bottom of Figure 7 confirm that the Hammerstein model is in excellent agreement with the transistor-level Xyce simulation across different circuit operational points. The error spikes in the plots correspond to the regions where the term \(|D^2I_3|\) in estimate (10) is large, i.e., the regions where \(V_3\) and \(I_3\) have large gradients. These spikes can be reduced by using non-uniform sampling that allocates more sampling points to these regions, while regions where \(I_3\) and \(V_3\) are “flat” are sampled more sparsely. Such an adaptive sampling strategy can also significantly reduce the memory cost of the trilinear interpolant. However, exploration of non-uniform sampling is beyond the scope of this paper.

Lastly, Figure 8 compares the output of the Hammerstein model with Xyce simulations when the amplifier is configured as a unity gain buffer by connecting the output terminal directly to the non-inverting input terminal, i.e., \(V_2 = V_3\). This plot illustrates the range of voltages over which the amplifier is approximately linear and does not introduce hard clipping. As before, the solid lines in these plots show the results from the Xyce simulations and the dashed lines show the results from our Hammerstein model.

C. Transient results

In this subsection we use a number of transient and alternating current (AC) simulations to compare predictions of the Hammerstein model with those of a transistor-level model of the circuit. We consider three instances of the model that all share the same input nonlinearity \(\varphi\) but have dynamic blocks with increasing internal state dimensions \(n = 1, 2, 3\). Our goal is to demonstrate that the accuracy of the model increases with the state dimension, i.e., that \(n\) can serve as a convenient “knob” to tune the quality of the transient simulations. To generate the “ground truth” data for these results we use Xyce to simulate the circuit with independent voltage sources connected to terminals 1 and 2 providing voltage waveforms \((v_1(t), v_2(t))\), and a 5 pF capacitive load connected to the output terminal. The voltage waveforms \((v_1(t), v_2(t))\) consist of various sinusoids at different frequencies, a square pulse, and the frequency-modulated training waveforms (9) defined in Section IV. The time-varying port currents \((i_1(t), i_2(t), i_3(t))\) are solved for within the simulation.

To generate the output of the Hammerstein model, we solve the closed-loop system of differential equations resulting from connecting the independent sources and capacitive load to the

\[\text{Fig. 6: Piecewise trilinear interpolant surfaces for DC output current } I^h_3 \text{ as a function of input voltages } (V_1, V_2) \text{ for a fixed output voltage } V_3.\]
The output voltage $V_3$ error is shown as a function of the input voltage $V_1$ for a fixed input voltage $V_2 \in \{1.5, 2.0, 3.5, 3.0, 3.5\}$ V.

The low, middle, and high frequency of the sinusoids are given for square wave and sinusoidal input voltages, respectively.

For comparison, the $I_3$ errors, $V_2 = 3.5$ V are shown. The plots in the Figure 9 compare and contrast the Hammerstein model with Xyce simulations of the circuit for inputs that have not been seen during the training process. Figure 10 showcases the results from an AC voltage analysis with a capacitive load attached to the output terminal. This analysis performs a linearization around the nominal operating point $V_1 = V_2 = 2.5$ V and $V_3$ is computed as a root of the equation

$$0 = I_{DC}(V_1, V_2, V_3).$$

The input voltage $v_2$ is held constant and the AC input voltage $v_1$ is a unit amplitude sine wave with variable frequency. The magnitude and phase of the resulting output voltage $v_3$ are shown. Notice that magnitude and phase responses between the circuit and the model begin to diverge for high frequencies after around 4 GHz; these data are absent in the training waveforms described in equation (9).

Figures 11 and 12 show the transient voltage and currents for square wave and sinusoidal input voltages, respectively. The low, middle, and high frequency of the sinusoids are
chosen to show the behavior before, near, and after the cutoff frequency of the amplifier when connected to the 5 pF capacitive load.

Recall that the training waveforms in (9) sweep a frequency range up to 5 GHz. Figure 13 examines what happens when the Hammerstein model is pushed outside this range. Specifically, we first evaluate the model on a sinusoid with input frequency \( f = 5 \) GHz, then on a sinusoid with input frequency \( f = 10 \) GHz which is double the maximum frequency seen by the model during the training process. We then compare predictions of the “full” Hammerstein models and its “truncated” version comprising just the static nonlinear block with Xyce simulations of the circuit. The plots in Figure 13a show that the accuracy of the “full” Hammerstein models remains reasonable for \( f = 5 \) GHz, while the “truncated” DC model clearly misses the phase shift and the amplitude degradation at this frequency. As before, the accuracy of the models is directly related to their state dimension and the model with \( n = 3 \) is the most accurate.

The plots in Figure 13b show that the distinctions between the models become even more pronounced at \( f = 10 \) GHz with the error gap between the smallest (\( n = 1 \)) and the largest (\( n = 3 \)) models opening up significantly, while the “truncated” model is lagging even more behind the Xyce simulation. While for \( n = 1 \) and \( n = 2 \) the Hammerstein models are clearly not accurate enough for circuit simulations, we note that for \( n = 3 \) the model is surprisingly acceptable considering that this high-frequency data was not present in the training stimulus.

VI. DISCUSSION AND CONCLUSIONS

In this paper we presented an alternative, non-intrusive system identification (SysID) MOR approach for common nonlinear analog circuit blocks such as operational and differential amplifiers, comparators, and voltage regulators that uses their characteristic behavior rather than their mathematical structure to select a model architecture. To that end, we developed and demonstrated an accurate and efficient reduced-order model for CMOS-based differential amplifier circuits that can be identified using only input-output data collected at the external ports. The model utilizes a parsimonious Hammerstein architecture formed by the series interconnection of a static nonlinearity and a linear time-invariant dynamical system. Our key contributions thus are the demonstration of SysID as an effective non-intrusive MOR approach for common nonlinear circuit blocks and the introduction of the sequential model inference approach for Hammerstein architectures as an effective training strategy that enables a streamlined model development workflow by simplifying training and reducing training data and training time. In particular, we use a combination of DC and transient circuit simulation input-output data to infer the static and dynamic blocks of the model, respectively.

Reproductive and predictive tests of Hammerstein models with state dimensions \( n = 1, 2, 3 \) reveal excellent fit with transistor-level simulations performed in Xyce. We note that these models comprise systems of just \( n = 1, 2, 3 \) ODEs whereas the original circuit requires solving for 24 independent solution variables constrained by a system of DAEs. We expect that comparable or better model compression factors can be achieved for other complex circuits with similar “scripted” behaviors but large device counts, and highly nonlinear mathematical descriptions such as op amps and comparators, that are difficult to handle by conventional, structure-dependent, intrusive MOR approaches.

For classes of circuits that are predominantly memoryless, the static nonlinear block captures the majority of the circuit behavior and the dynamic linear block provides a transient correction to account for reactive parasitic and loading effects. Therefore, it is important to reproduce the DC voltage–current characteristics of the circuit as accurately as possible. To accomplish this task in this paper we utilized trilinear interpolation constructed directly from simulation data sampled on a uniform Cartesian grid. We used this approach for simplicity, however, in practical implementation of the model a more efficient approach employing, e.g., adaptive and/or sparse sampling, should be considered to reduce the memory footprint of the static block. Alternatively, \( I - V \) characteristic surrogates employing compact parametric regression by, e.g., neural networks \([1]\), can be used to further reduce the memory requirements of the model at the cost of some additional offline training.

Since the DC characteristics alone are incapable of capturing frequency-dependent effects such as gain degradation and phase shift, the linear block of the Hammerstein model provides a transient correction to capture the circuit’s dynamic behavior at higher frequencies. The term \( I_{DC}^2 \) appended to \( I_{DC} \) within the nonlinearity \( \varphi \) was introduced to enable the model to reproduce additional nonlinear transient phenomena such as frequency-dependent bias shift and harmonic distortion that cannot be captured by a purely linear block relating \( I_{DC} \) and \( I_{tran} \). The choice of \( \varphi = (I_{DC}, I_{DC}^2) \) is generic in the sense that the only requirement is to produce sufficiently expressive “features” that can be subsequently filtered by the
linear block to fit the transient data. Extending this concept, we can augment the nonlinearity $\varphi$ with additional higher order monomials of $I_{DC}$, or use another dictionary of functions (e.g., rational functions, trigonometric polynomials, etc.) to add even more expressive features. For circuits that cannot be modeled accurately using a single Hammerstein model, developing training techniques for identifying composite Hammerstein models with series, parallel, and feedback interconnections is an interesting direction for follow-up study.

ACKNOWLEDGMENTS

This work was supported by the Sandia National Laboratories (SNL) Laboratory-directed Research and Development (LDRD) program, and the U.S. Department of Energy, Office of Science, and Office of Advanced Scientific Computing Research under Award Number DE-SC-0000230927.

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energys National Nuclear Security Administration contract number DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

REFERENCES


[16] Chung-Wen Ho, A. Ruchli, and P. Brennan. The modified nodal

(a) Transient output voltage $v_3$ for square wave input.

(b) Transient output current $i_3$ for square wave input.

Fig. 11: Comparison of transient output voltages $v_3$ and transient output currents $i_3$ between the circuit and Hammerstein models for square wave input voltage. The input voltage $v_1$ jumps between 2.45V and 2.55V with a ramp time of 10 ns (slew rate of 10V/µs); the input voltage $v_2$ is held at a constant 2.45V.
(a) Transient output voltage $v_3$ for input frequency $f = 10$ kHz.

(b) Transient output voltage $v_3$ for input frequency $f = 1$ MHz.

(c) Transient output voltage $v_3$ for input frequency $f = 100$ MHz.

(d) Transient output current $i_3$ for input frequency $f = 10$ kHz.

(e) Transient output current $i_3$ for input frequency $f = 1$ MHz.

(f) Transient output current $i_3$ for input frequency $f = 100$ MHz.

Fig. 12: Comparison of transient output voltages $v_3$ and transient output currents $i_3$ between the Xyce simulations of the circuit and the Hammerstein models for sinusoidal input voltages $v_1$ and $v_2$ with amplitude $A = 50$ mV, bias $V_{bias} = 2.5$ V, and frequencies (a,d) 10 kHz; (b,e) 1 MHz; and (c,f) 100 MHz.
Fig. 13: Comparison of the transient output currents $I_3$ between the Xyce simulations of the circuit and the Hammerstein models, including the DC current prediction $I_{DC}$, for high-frequency sinusoidal input voltages $v_1$ and $v_2$ with amplitude $A = 50$ mV and bias $V_{bias} = 2.5$ V. The model with state-dimension $n = 3$ is accurate up to 5 GHz, but the accuracies of all models suffer at 10 GHz which is outside the range of frequencies present in the training stimulus.

(a) Transient output current $I_3$ for input frequency $f = 5$ GHz.

(b) Transient output current $I_3$ for input frequency $f = 10$ GHz.


