question:Maximum-likelihood parameter estimation in terahertz time-domain spectroscopy

Laleh Mohtashemi, Paul Westlund, Derek G. Sahota, Graham B. Lea, Ian Bushfield, Payam Mousavi, and J. Steven Dodge

Introduction

At the heart of most applications of terahertz time-domain spectroscopy (THz-TDS) is a mathematical procedure that relates raw THz-TDS waveform measurements to parameters of scientific and technological interest. Typically this analysis is formulated in the frequency domain, since it provides the most natural description of any linear, time-invariant system of interest. But since a THz-TDS waveform describes the electric field as a function of time, it must be Fourier-transformed for use in any frequency-domain analysis. The Fourier transform scrambles the THz-TDS measurement noise, which is more naturally represented in the time domain, and produces artifacts that can degrade the overall measurement quality and yield misleading results. Furthermore, the standard approaches to parameter estimation in THz-TDS involve the ratio of two noisy waveforms, which further distorts the noise spectrum and can yield noise distributions that are far from Gaussian. Other approaches have emerged that improve on the standard procedures, but so far all existing approaches to THz-TDS analysis lack clear, model-independent statistical criteria for establishing parameter confidence intervals or for deciding whether a given physical model describes the data well. Here, we introduce a general framework to remedy this. We describe a maximum-likelihood approach to THz-TDS analysis in which both the signal and the noise are treated explicitly in the time domain, together with a frequency-domain constraint between the input and the output signal. We show that this approach produces superior results to existing analysis methods.

Signal and noise in THz-TDS

A Monte Carlo simulation of an elementary THz-TDS analysis illustrates the basic problems that our method solves; it also allows us to develop notational conventions that we will use to describe our main results. Figure 1(a) shows an ideal, noiseless THz-TDS waveform $\mu(t)$ normalized to its peak value $\mu_\text{p}$, together with a time-domain Gaussian noise amplitude $\sigma_\mu(t)$ given by
$$\sigma_\mu^2(t) = \sigma_\alpha^2 + [\sigma_\beta\mu(t)]^2 + \left[\sigma_\tau\dot{\mu}(t)\right]^2, \label{eq:noise-amplitude}$$
with amplitudes $\sigma_\alpha = 10^{-4}\mu_\text{p}$, $\sigma_\beta = 10^{-2}$, and $\sigma_\tau = 1~\text{fs}$. Physically, the additive noise term $\sigma_\alpha$ is produced by the detection electronics (in units of $\mu_\text{p}$); the multiplicative noise term $\sigma_\beta|\mu(t)|$ is produced by laser power fluctuations, which modulate the signal strength; and the jitter term $\sigma_\tau|\dot{\mu}(t)| \approx |\mu(t + \sigma_\tau) - \mu(t)|$ is produced by fluctuations in the optical path lengths that control the delay between the terahertz pulse arrival time and the receiver sampling time. In this example the overall noise is dominated by $\sigma_\beta|\mu(t)|$, which produces the two peaks near $t = 2.75~\text{ps}$ and $t = 3.50~\text{ps}$. The contribution from $\sigma_\tau|\dot{\mu}(t)|$ is less evident, except at $t = 3.10~\text{ps}$, where $\mu(t)$ crosses zero with a nonzero slope. The contribution from $\sigma_\alpha$ dominates at times when both $\mu(t)$ and $\dot{\mu}(t)$ are small, although in the figure it is barely discernible from zero. Such structured, signal-dependent, time-domain noise is common in THz-TDS waveform measurements, and leads to well-known ambiguities in defining the signal-to-noise ratio and dynamic range for them.
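The paper's reference implementation is in MATLAB (see the Conclusion); the short sketches below use Python with NumPy/SciPy instead, purely for illustration. Here is a minimal sketch of Eq. (\ref{eq:noise-amplitude}); the helper name and the spectral-derivative implementation are our own choices, and the sign conventions follow NumPy's FFT rather than the paper's $\exp(-i\omega t)$ convention.

```python
import numpy as np

def noise_amplitude(mu, T, sigma_alpha, sigma_beta, sigma_tau):
    """Time-domain noise amplitude, Eq. (eq:noise-amplitude).

    mu          -- ideal waveform samples, shape (N,)
    T           -- sampling period, in the same units as sigma_tau
    sigma_alpha -- additive noise amplitude (signal units)
    sigma_beta  -- multiplicative noise amplitude (dimensionless)
    sigma_tau   -- delay-jitter amplitude (time units)
    """
    N = len(mu)
    omega = 2 * np.pi * np.fft.fftfreq(N, d=T)
    # Spectral derivative: +1j*omega in NumPy's FFT convention corresponds
    # to the paper's H(omega) = -i*omega in the exp(-i*omega*t) convention.
    mu_dot = np.real(np.fft.ifft(1j * omega * np.fft.fft(mu)))
    return np.sqrt(sigma_alpha**2 + (sigma_beta * mu)**2
                   + (sigma_tau * mu_dot)**2)
```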
[Figure 1: (a) Simulated time-domain signal (black solid line) and noise amplitude (gray dashed line). (b) Power spectral density (relative to peak) obtained from ten simulated time-domain measurements, with one shown in red and the rest shown in gray.]

We simulate a single THz-TDS waveform measurement by computing
$$x(t_k) = \mu(t_k) + \sigma_\mu(t_k)\epsilon(t_k) \label{eq:measured-waveform}$$
at equally spaced times $t_k = kT$, $k \in \{0, 1, \ldots, N-1\}$, with $N = 256$ and $T = 50~\text{fs}$, where each $\epsilon(t_k)$ is an independent random variable with a standard normal distribution. This sequence has the discrete Fourier transform (DFT)
$$X(\omega_l) = \tilde{x}(\omega_l) = \tilde{\mu}(\omega_l) + [\tilde{\sigma}_\mu \circledast \tilde{\epsilon}](\omega_l) \label{eq:measured-spectrum}$$
at the discrete frequencies $\omega_l = 2\pi l/(NT)$, $l \in \{0, 1, \ldots, N-1\}$, where $\tilde{x}(\omega_l)$, $\tilde{\sigma}_\mu(\omega_l)$, and $\tilde{\epsilon}(\omega_l)$ denote the DFTs of $x(t_k)$, $\sigma_\mu(t_k)$, and $\epsilon(t_k)$, respectively, and the $\circledast$ symbol denotes circular discrete convolution. Figure 1(b) shows the power spectral density $S_{xx}(\omega_l) = (T/N)|X(\omega_l)|^2$ at the unique frequencies $l \le \lfloor N/2 \rfloor$ for ten such simulations, where each spectrum is normalized to its peak value. The signal decreases exponentially with frequency, falling by a bit more than 40 dB from its peak power before reaching the noise floor near 5 THz. The red-highlighted spectrum in Fig. 1(b) shows that while the noise is relatively flat between 5 THz and 10 THz, it exhibits oscillatory fluctuations with a period of about 0.75 THz. These also appear in all of the other spectra in Fig. 1(b), and arise because the convolution $[\tilde{\sigma}_\mu \circledast \tilde{\epsilon}](\omega_l)$ smooths the noise over a frequency scale given by the inverse width of $\sigma_\mu(t)$. The resulting correlations blur the distinction between the signal and the noise, and their presence implies that the true uncertainty in $X(\omega_l)$ is significantly smaller than the noise floor suggested by $S_{xx}(\omega_l)$, which neglects the information that noise at one frequency provides about the noise at neighbouring frequencies.
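A sketch of the Monte Carlo draw in Eq. (\ref{eq:measured-waveform}) and the PSD of Fig. 1(b), assuming the noise_amplitude helper above:

```python
import numpy as np

def simulate_measurement(mu, T, sigma_alpha, sigma_beta, sigma_tau, rng):
    """One noisy waveform: x(t_k) = mu(t_k) + sigma_mu(t_k) * eps_k."""
    sigma_mu = noise_amplitude(mu, T, sigma_alpha, sigma_beta, sigma_tau)
    return mu + sigma_mu * rng.standard_normal(len(mu))

def power_spectral_density(x, T):
    """S_xx(omega_l) = (T/N)|X(omega_l)|^2 at the unique frequencies
    l <= floor(N/2)."""
    N = len(x)
    X = np.fft.fft(x)
    return (T / N) * np.abs(X[:N // 2 + 1])**2

# Example with the paper's simulation parameters (mu is any sampled pulse):
# rng = np.random.default_rng(0)
# x = simulate_measurement(mu, 50e-15, 1e-4, 1e-2, 1e-15, rng)
# S = power_spectral_density(x, 50e-15)
```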
Transfer function estimation in THz-TDS

To measure the response of a system requires measurements of two THz-TDS waveforms: an input, $\mu(t)$, and an output, $\psi(t) = [h * \mu](t)$, where $h(t)$ is the impulse response of the system and the $*$ symbol denotes continuous-time convolution. Fourier transforming the ideal relationship $\psi(t) = [h * \mu](t)$ converts the time-domain convolution to a frequency-domain multiplication, $\tilde{\psi}_\text{c}(\omega) = H(\omega)\tilde{\mu}_\text{c}(\omega)$, where $\tilde{\mu}_\text{c}(\omega)$ and $\tilde{\psi}_\text{c}(\omega)$ denote the continuous-time Fourier transforms of $\mu(t)$ and $\psi(t)$, respectively, and $H(\omega) = \tilde{h}_\text{c}(\omega)$ is the continuous-time transfer function of the system.

Proceeding as we did for a single waveform, we simulate input and output waveform measurements $x(t_k)$ and $y(t_k)$, respectively, by computing $\mu(t)$ and $\psi(t)$ at the discrete times $t_k$ and adding time-domain noise $\sigma_\mu(t_k)\epsilon(t_k)$ and $\sigma_\psi(t_k)\delta(t_k)$, where each $\epsilon(t_k)$ and $\delta(t_k)$ is an independent random variable with a standard normal distribution. After applying the discrete Fourier transform to obtain $X(\omega_l) = \tilde{x}(\omega_l)$ and $Y(\omega_l) = \tilde{y}(\omega_l)$, we can determine the empirical transfer function estimate (ETFE),
$$\hat{H}_\text{ETFE}(\omega_l) = \frac{Y(\omega_l)}{X(\omega_l)} = \frac{\tilde{\psi}(\omega_l) + [\tilde{\sigma}_\psi \circledast \tilde{\delta}](\omega_l)}{\tilde{\mu}(\omega_l) + [\tilde{\sigma}_\mu \circledast \tilde{\epsilon}](\omega_l)}, \label{eq:ETFE}$$
where $\tilde{\mu}(\omega_l)$ and $\tilde{\psi}(\omega_l)$ are the DFTs of the ideal input and output signals, respectively, that we would obtain in the absence of noise, and $[\tilde{\sigma}_\mu \circledast \tilde{\epsilon}](\omega_l)$ and $[\tilde{\sigma}_\psi \circledast \tilde{\delta}](\omega_l)$ are the corresponding frequency-domain noise terms.

The ETFE is a common starting point for THz-TDS analysis—it is easy to compute from the raw data, and it provides an estimate of $H(\omega)$ for any linear, time-invariant system. Frequently, though, the focus of interest is not $H(\omega)$ itself but a relatively small number of parameters that specify $H(\omega)$ through a physical model, such as the thickness and refractive index of a material. In this case, fitting the model directly to $\hat{H}_\text{ETFE}(\omega_l)$ can yield ambiguous results, because $\hat{H}_\text{ETFE}(\omega_l)$ is biased and has noise that is both correlated and non-Gaussian. To illustrate these deficiencies, Fig. 2 shows 250 simulations of $\hat{H}_\text{ETFE}(\omega_l)$ with nominally identical input and output pulses, corresponding to $H(\omega) = 1$. The red-highlighted curves show correlations similar to those shown in Fig. 1, but that extend to frequencies where the signal is much stronger. The average over all simulations reveals the bias in $\hat{H}_\text{ETFE}(\omega_l)$: in Fig. 2(a), $\text{Re}\{\hat{H}_\text{ETFE}\}$ falls from the correct value of 1 to the biased value of 0 as the frequency increases above 5 THz, where the signal reaches the noise floor in Fig. 1(b). Also, the noise distribution for $\hat{H}_\text{ETFE}(\omega_l)$ develops wide, non-Gaussian tails at frequencies where the signal is small, because noise fluctuations that cause $|X(\omega_l)| \rightarrow 0$ become more likely, and these in turn cause $\hat{H}_\text{ETFE}(\omega_l)$ to diverge. If the noise is uncorrelated, such large fluctuations are only expected when the signal is small. But in the more typical case that the noise is correlated, they can influence frequencies well within the primary signal bandwidth, as indicated by the red-highlighted curves in Figs. 2(c) and 2(d).

[Figure 2: Gray dots show the real (a,c) and imaginary parts (b,d) of the empirical transfer function estimate $\hat{H}_\text{ETFE}$, Eq. (\ref{eq:ETFE}), for 250 pairs of simulated time-domain measurements of the waveform shown in Fig. 1(a). One estimate is highlighted in red, with the dots connected by a thin red line. The solid black line shows the average over all simulations. Panels (a,b) show the full bandwidth and (c,d) show the same data over the primary signal bandwidth.]
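The ETFE of Eq. (\ref{eq:ETFE}) is a one-line ratio of DFTs; a minimal sketch:

```python
import numpy as np

def etfe(x, y):
    """Empirical transfer function estimate, Eq. (eq:ETFE): the ratio of
    the output and input DFTs at each discrete frequency."""
    return np.fft.fft(y) / np.fft.fft(x)
```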
Figure 3 shows how these problems distort standard measures of fit quality. It displays the results of weighted least-squares fits to the ETFE simulations in Fig. 2 with the model
$$H_1(\omega; \bm{\theta}) = \theta_1\exp(i\omega\theta_2), \label{eq:model1}$$
which rescales the input by an amplitude $\theta_1$ and shifts it by a delay $\theta_2$ when we adopt the $\exp(-i\omega t)$ convention for harmonic time dependence. For each fit, the estimated parameter vector $\bm{\hat{\theta}}$ is chosen to minimize
$$Q_\text{ETFE}(\bm{\theta}) = \sum_{l=0}^{N-1}\frac{\left|\hat{H}_\text{ETFE}(\omega_l) - H_1(\omega_l;\bm{\theta})\right|^2}{\sigma_{\omega_l}^2}, \label{eq:ETFE-cost}$$
where each $\sigma_{\omega_l}^2 = \text{Var}\{\text{Re}[\hat{H}_\text{ETFE}(\omega_l)]\} + \text{Var}\{\text{Im}[\hat{H}_\text{ETFE}(\omega_l)]\}$ is determined from the Monte Carlo samples. These fits return accurate estimates for the model parameters—over all simulated samples, we obtain $\hat{\theta}_1 = 1.000 \pm 0.005$ and $\hat{\theta}_2 = (0.0 \pm 0.8)~\text{fs}$, which are consistent with the true values, $\theta_1 = 1$ and $\theta_2 = 0$, used in the simulation. But the normalized fit residuals, given by
$$r_\text{ETFE}(\omega_l; \bm{\hat{\theta}}) = \frac{\hat{H}_\text{ETFE}(\omega_l) - H_1(\omega_l;\bm{\hat{\theta}})}{\sigma_{\omega_l}}, \label{eq:ETFE-residuals}$$
show strong deviations from the standard normal distribution, exhibit structure that could easily be mistaken for real physical features, and yield a goodness-of-fit (GOF) statistic $S_\text{ETFE} = Q_\text{ETFE}(\bm{\hat{\theta}})$ that looks nothing like the $\chi^2$ distribution that we would normally use to evaluate the fit quality. Also, the uncertainty estimates, $\hat{\sigma}_{\theta_1} = 0.001$ and $\hat{\sigma}_{\theta_2} = 0.2~\text{fs}$, obtained by the usual method of inverting the curvature matrix of $Q_\text{ETFE}(\bm{\theta})$ around the minimum of each fit, significantly underestimate the values that are actually observed over the set of Monte Carlo samples, $\sigma_{\theta_1} = 0.005$ and $\sigma_{\theta_2} = 0.8~\text{fs}$. In short, while an ETFE fit with Eq. (\ref{eq:ETFE-cost}) may provide accurate parameter estimates when the underlying model is known in advance, it offers poor discrimination between good models and bad ones, and it yields parameter uncertainty estimates that are unrealistically precise.

[Figure 3: Measures of fit quality for ETFE fits with Eq. (\ref{eq:model1}), obtained by minimizing Eq. (\ref{eq:ETFE-cost}). (a) Cumulative distribution of the GOF statistic, $S_\text{ETFE}$, for the Monte Carlo simulations shown in Fig. 2. The $\chi^2$ distribution for the same number of degrees of freedom ($\nu = 254$) is shown for comparison. The red $\times$ marks a fit with $S_\text{ETFE} \approx 180$, which is just above the median value. The normalized residuals for this fit, $r_\text{ETFE}(\omega_l; \bm{\hat{\theta}})$, are shown in (b) as a function of frequency, and in the inset to (a) with a normal probability plot (black dots, which represent both the real and the imaginary parts of $r_\text{ETFE}$). The gray dash-dotted line in the inset represents the standard normal distribution.]
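A sketch of the weighted fit in Eq. (\ref{eq:ETFE-cost}), assuming the etfe helper above and SciPy's least_squares; stacking the real and imaginary parts of the weighted residuals reproduces $Q_\text{ETFE}$:

```python
import numpy as np
from scipy.optimize import least_squares

def H1(omega, theta):
    """Eq. (eq:model1): amplitude theta[0], delay theta[1]."""
    return theta[0] * np.exp(1j * omega * theta[1])

def fit_etfe(H_hat, omega, sigma_w, theta0=(1.0, 0.0)):
    """Weighted LS fit of H1 to the ETFE; sigma_w holds the per-frequency
    noise amplitudes estimated from the Monte Carlo samples."""
    def residuals(theta):
        r = (H_hat - H1(omega, theta)) / sigma_w
        return np.concatenate([r.real, r.imag])
    sol = least_squares(residuals, x0=theta0)
    S_etfe = 2 * sol.cost        # least_squares reports 0.5 * sum(r**2)
    return sol.x, S_etfe
```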
An alternative approach is to use a least-squares (LS) procedure that minimizes the sum of the squared differences between the output signal and the transformed input signal,
$$Q_\text{LS}(\bm{\theta}) = \frac{1}{N}\sum_{l=0}^{N-1}\left|Y(\omega_l) - H(\omega_l;\bm{\theta})X(\omega_l)\right|^2 = \sum_{k=0}^{N-1}\left\{y(t_k) - [h(\bm{\theta}) \circledast x](t_k)\right\}^2. \label{eq:LS-cost}$$
Here, $h(t_k; \bm{\theta})$ is the model impulse response, equal to the inverse DFT of $H(\omega_l;\bm{\theta})$, and $[h(\bm{\theta}) \circledast x](t_k)$ represents the convolution of the impulse response with the input vector at the time $t_k$. The equivalence between the frequency-domain sum and the time-domain sum is assured by Parseval's theorem. The LS procedure avoids the division by $X(\omega_l)$ that distorts the noise distribution of the ETFE in the small-signal limit, and the time-domain residuals $r_\text{LS}(t_k; \bm{\hat{\theta}}) = y(t_k) - [h(\bm{\hat{\theta}}) \circledast x](t_k)$ are a sensitive indicator of fit quality. But the uniform weighting in Eq. (\ref{eq:LS-cost}) implicitly assumes that this noise is constant in time, which as we have noted is frequently not the case.

In principle, we should be able to account for time-dependent noise by assigning appropriate weights to the sum, but this is not as simple as it might seem. The problem is that Eq. (\ref{eq:LS-cost}) treats the input and output noise asymmetrically, which we can see more clearly if we express it explicitly in terms of the time-domain signal and noise sequences:
$$Q_\text{LS}(\bm{\theta}) = \sum_{k=0}^{N-1}\left[\psi(t_k) - [h(\bm{\theta}) \circledast \mu](t_k) + \sigma_\psi(t_k)\delta(t_k) - [h(\bm{\theta}) \circledast (\sigma_\mu\epsilon)](t_k)\right]^2. \label{eq:LS-cost-noise}$$
This asymmetry results from the implicit assumption that all noise is restricted to the output variable in an LS fit, so that the term $[h(\bm{\theta}) \circledast (\sigma_\mu\epsilon)](t_k)$ can be ignored. When input noise is present—as it always is in THz-TDS—the optimal fit weights will depend not just on the measurement noise sequences but also on $h(t_k; \bm{\theta})$. Any modification of Eq. (\ref{eq:LS-cost}) with constant weights will generically bias the parameter estimates toward values that minimize the input noise contribution, and will cause the GOF statistic, $S_\text{LS} = Q_\text{LS}(\bm{\hat{\theta}})$, to have a distribution that also depends on $h(t_k; \bm{\theta})$. As we discuss below, these problems can all be solved with a fit procedure derived from the maximum-likelihood principle.
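A sketch of the uniformly weighted LS cost in Eq. (\ref{eq:LS-cost}), with the model applied by FFT (again in NumPy's sign convention):

```python
import numpy as np

def apply_transfer(H_model, theta, x, T):
    """Circular convolution [h(theta) (*) x](t_k), computed by applying
    the frequency-domain model to the DFT of the input."""
    N = len(x)
    omega = 2 * np.pi * np.fft.fftfreq(N, d=T)
    return np.real(np.fft.ifft(H_model(omega, theta) * np.fft.fft(x)))

def Q_LS(theta, x, y, T, H_model):
    """Uniformly weighted LS cost, Eq. (eq:LS-cost), in the time domain."""
    r = y - apply_transfer(H_model, theta, x, T)
    return np.sum(r**2)
```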
Maximum-likelihood estimation of a parameterized transfer function model

The likelihood function is equivalent to the probability density for obtaining the measured data under the assumptions of a given model, and forms the theoretical basis for the standard least-squares fit procedure. To define it in the THz-TDS context, we express the measured input and output signals in vector notation as $\bm{x} = [x(t_0), x(t_1), \ldots, x(t_{N-1})]^\mathsf{T}$ and $\bm{y} = [y(t_0), y(t_1), \ldots, y(t_{N-1})]^\mathsf{T}$, respectively, and assume that they represent noisy measurements of bandlimited but otherwise unknown ideal signal vectors $\bm{\mu} = [\mu(t_0), \mu(t_1), \ldots, \mu(t_{N-1})]^\mathsf{T}$ and $\bm{\psi} = [\psi(t_0), \psi(t_1), \ldots, \psi(t_{N-1})]^\mathsf{T}$ with noise covariance matrices $\bm{\Sigma_x}$ and $\bm{\Sigma_y}$, respectively. We further assume that $\bm{\mu}$ and $\bm{\psi}$ satisfy the linear constraint $\bm{\psi} = \bm{h}(\bm{\theta})\bm{\mu}$, where $\bm{h}$ is a known matrix function of an unknown $p$-dimensional parameter vector $\bm{\theta}$. The likelihood function is then a product of two multivariate Gaussian distributions,
$$L(\bm{\theta}, \bm{\mu}, \bm{\psi}; \bm{x}, \bm{y}) = \frac{\exp[-\frac{1}{2}(\bm{x} - \bm{\mu})^\mathsf{T}\,\bm{\Sigma_x}^{-1}(\bm{x} - \bm{\mu})]}{\sqrt{(2\pi)^N\det(\bm{\Sigma_x})}}\frac{\exp[-\frac{1}{2}(\bm{y} - \bm{\psi})^\mathsf{T}\,\bm{\Sigma_y}^{-1}(\bm{y} - \bm{\psi})]}{\sqrt{(2\pi)^N\det(\bm{\Sigma_y})}}, \label{eq:likelihood-tf}$$
where the dependence on $\bm{\theta}$ on the right side of the expression is implicit in the constraint between $\bm{\mu}$ and $\bm{\psi}$. The signal vectors $\bm{\mu}$ and $\bm{\psi}$ appear as arguments in the likelihood function but are otherwise unimportant—the statistical literature aptly describes these as nuisance parameters, which we must eliminate to estimate $\bm{\theta}$, the parameters of interest.

We can use discrete-time Fourier analysis to obtain explicit expressions for $\bm{h}(\bm{\theta})$, $\bm{\Sigma_x}$, and $\bm{\Sigma_y}$. The time-domain transfer matrix $\bm{h}(\bm{\theta})$ is
$$h_{jk}(\bm{\theta}) = \frac{1}{N}\sum_{l=-N/2+1}^{N/2-1} H(\omega_l; \bm{\theta})\exp[-2\pi i(j-k)l/N] + \frac{1}{N}\text{Re}[H(\omega_{N/2};\bm{\theta})]\cos[\pi(j-k)], \label{eq:transfer-matrix}$$
where $\omega_l$ now extends to negative values of $l$, and $N$ is assumed even; for odd $N$, the sum runs from $l = -(N-1)/2$ to $(N-1)/2$ and the anomalous term at the Nyquist frequency is absent. Similarly, we can define the discrete-time matrix representation of the time-derivative operator, $\bm{D}$, as
$$D_{jk} = \frac{1}{N}\sum_{l=-\lfloor(N-1)/2\rfloor}^{\lfloor(N-1)/2\rfloor} -i\omega_l\exp[-2\pi i(j-k)l/N], \label{eq:D-matrix}$$
by recognizing that the transfer function for the continuous time-derivative operator is $H(\omega) = -i\omega$. Note that Eq. (\ref{eq:D-matrix}) is valid for all values of $N$ because $\text{Re}[-i\omega_{N/2}] = 0$, so the anomalous term in Eq. (\ref{eq:transfer-matrix}) is zero. Equation (\ref{eq:D-matrix}) allows us to express the noise variance function $\sigma_\mu^2(t)$ in Eq. (\ref{eq:noise-amplitude}) as a matrix function in discrete time,
$$\left[\bm{V}(\bm{\mu})\right]_{jk} = \delta_{jk}\left[\sigma_\alpha^2 + \sigma_\beta^2\mu_k^2 + \sigma_\tau^2(\bm{D}\bm{\mu})_k^2\right]. \label{eq:V-matrix}$$
The covariance matrices for $\bm{x}$ and $\bm{y}$ are then
$$\bm{\Sigma_x} = \bm{V}(\bm{\mu}), \quad \bm{\Sigma_y} = \bm{V}(\bm{\psi}). \label{eq:Sigma-xy}$$
The maximum-likelihood (ML) estimate for the model parameters is given by the vectors $\bm{\hat{\mu}}$, $\bm{\hat{\psi}}$, and $\bm{\hat{\theta}}$ that maximize $L$ subject to the constraint $\bm{\psi} = \bm{h}(\bm{\theta})\bm{\mu}$.
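Because Eq. (\ref{eq:transfer-matrix}) describes a circulant matrix, it can be built from a single inverse FFT. A sketch, including the derivative operator of Eq. (\ref{eq:D-matrix}) and the covariance of Eq. (\ref{eq:V-matrix}); names and signatures are our own choices:

```python
import numpy as np
from scipy.linalg import circulant

def transfer_matrix(H_model, theta, N, T):
    """Time-domain transfer matrix h(theta), Eq. (eq:transfer-matrix).

    The matrix is circulant, so it is fully determined by its first
    column, the response to a unit impulse at k = 0."""
    omega = 2 * np.pi * np.fft.fftfreq(N, d=T)
    Hw = H_model(omega, theta).astype(complex)
    if N % 2 == 0:
        # Keep only the real part at the Nyquist bin (the anomalous term),
        # so that the impulse response is exactly real.
        Hw[N // 2] = Hw[N // 2].real
    return circulant(np.real(np.fft.ifft(Hw)))

def derivative_matrix(N, T):
    """Discrete-time derivative operator D, Eq. (eq:D-matrix); +1j*omega
    is the paper's H(omega) = -i*omega translated to NumPy's FFT sign
    convention."""
    return transfer_matrix(lambda w, th: 1j * w, None, N, T)

def V_matrix(w, T, sigma_alpha, sigma_beta, sigma_tau):
    """Diagonal covariance V(w), Eq. (eq:V-matrix)."""
    D = derivative_matrix(len(w), T)
    return np.diag(sigma_alpha**2 + (sigma_beta * w)**2
                   + (sigma_tau * (D @ w))**2)
```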
Alternatively, we can minimize the negative-log likelihood,
$$\begin{split} -\ln L(\bm{\theta}, \bm{\mu}, \bm{\psi}; \bm{x}, \bm{y}) &= N\ln(2\pi) + \frac{1}{2}\ln\{\det[\bm{V}(\bm{\mu})]\} + \frac{1}{2}\ln\{\det[\bm{V}(\bm{\psi})]\} \\ &\quad + \frac{1}{2}\left\{(\bm{x} - \bm{\mu})^\mathsf{T}\,[\bm{V}(\bm{\mu})]^{-1}(\bm{x} - \bm{\mu}) + (\bm{y} - \bm{\psi})^\mathsf{T}\,[\bm{V}(\bm{\psi})]^{-1}(\bm{y} - \bm{\psi})\right\}, \end{split} \label{eq:NLL}$$
also subject to the constraint, where the last term now has the familiar least-squares form. The dependence of the covariance matrices on the signal vectors prevents us from using a standard least-squares optimization algorithm to minimize Eq. (\ref{eq:NLL}), but to a very good approximation we can substitute $\bm{V}(\bm{\mu}) \approx \bm{V}(\bm{x})$ and $\bm{V}(\bm{\psi}) \approx \bm{V}(\bm{y})$ to obtain
$$\begin{split} -\ln L(\bm{\theta}, \bm{\mu}, \bm{\psi}; \bm{x}, \bm{y}) &\approx N\ln(2\pi) + \frac{1}{2}\ln\{\det[\bm{V}(\bm{x})]\} + \frac{1}{2}\ln\{\det[\bm{V}(\bm{y})]\} \\ &\quad + \frac{1}{2}\left\{(\bm{x} - \bm{\mu})^\mathsf{T}\,[\bm{V}(\bm{x})]^{-1}(\bm{x} - \bm{\mu}) + (\bm{y} - \bm{\psi})^\mathsf{T}\,[\bm{V}(\bm{y})]^{-1}(\bm{y} - \bm{\psi})\right\}. \end{split} \label{eq:NLL-approximate}$$
The first three terms are now all constant, so we can focus on minimizing the last term, which we multiply by two to obtain a cost function in what is known as a total least-squares form,
$$\tilde{Q}_\text{TLS}(\bm{\theta}, \bm{\mu}, \bm{\psi}) = (\bm{x} - \bm{\mu})^\mathsf{T}\,[\bm{V}(\bm{x})]^{-1}(\bm{x} - \bm{\mu}) + (\bm{y} - \bm{\psi})^\mathsf{T}\,[\bm{V}(\bm{y})]^{-1}(\bm{y} - \bm{\psi}), \label{eq:TLS-cost-tilde}$$
which is also subject to the constraint $\bm{\psi} = \bm{h}(\bm{\theta})\bm{\mu}$. Note that the total least-squares cost function treats the input and output variables symmetrically, unlike the conventional least-squares cost function in Eq. (\ref{eq:LS-cost}). Geometrically, it can be interpreted as the sum of the squared distances between each measurement point $(x_k, y_k)$ and its associated model point $(\mu_k, \psi_k)$, using the metric tensors $\bm{V}(\bm{x})$ and $\bm{V}(\bm{y})$, respectively, for the input and output variables. As discussed in Sec. 3, the LS cost function in Eq. (\ref{eq:LS-cost}) focuses entirely on the deviations in the output variable.
Introducing an $N$-dimensional vector of Lagrange parameters $\bm{\lambda}$ to implement the constraint, we can define the modified cost function,
$$\tilde{\tilde{Q}}_\text{TLS}(\bm{\theta}, \bm{\mu}, \bm{\psi}, \bm{\lambda}) = \bm{\lambda}^\mathsf{T}\left[\bm{\psi} - \bm{h}(\bm{\theta})\bm{\mu}\right] + \tilde{Q}_\text{TLS}(\bm{\theta}, \bm{\mu}, \bm{\psi}), \label{eq:TLS-cost-double-tilde}$$
which we may now minimize with respect to $\bm{\mu}$, $\bm{\psi}$, $\bm{\theta}$, and $\bm{\lambda}$. The parameters of interest are $\bm{\theta}$; minimizing with respect to the remaining parameters gives the equations
$$\begin{aligned} \left.\frac{\partial\tilde{\tilde{Q}}_\text{TLS}}{\partial\lambda_m}\right|_{\bm{\hat{\mu}},\bm{\hat{\psi}},\bm{\hat{\lambda}},\bm{\theta}} &= \hat{\psi}_m - \sum_{l=1}^{N} h_{ml}(\bm{\theta})\hat{\mu}_l = 0, \\ \left.\frac{\partial\tilde{\tilde{Q}}_\text{TLS}}{\partial\psi_m}\right|_{\bm{\hat{\mu}},\bm{\hat{\psi}},\bm{\hat{\lambda}},\bm{\theta}} &= \hat{\lambda}_m - 2\sum_{l=1}^{N}\left\{[\bm{V}(\bm{y})]^{-1}\right\}_{ml}(y_l - \hat{\psi}_l) = 0, \\ \left.\frac{\partial\tilde{\tilde{Q}}_\text{TLS}}{\partial\mu_m}\right|_{\bm{\hat{\mu}},\bm{\hat{\psi}},\bm{\hat{\lambda}},\bm{\theta}} &= -\sum_{l=1}^{N}\hat{\lambda}_l h_{lm}(\bm{\theta}) - 2\sum_{l=1}^{N}\left\{[\bm{V}(\bm{x})]^{-1}\right\}_{ml}(x_l - \hat{\mu}_l) = 0, \end{aligned}$$
which have solutions
$$\begin{aligned} \bm{\hat{\psi}}_{\bm{\theta}} &= \bm{h}(\bm{\theta})\bm{\hat{\mu}}_{\bm{\theta}}, \label{eq:psi-hat} \\ \bm{\hat{\lambda}}_{\bm{\theta}} &= 2[\bm{V}(\bm{y})]^{-1}(\bm{y} - \bm{\hat{\psi}}_{\bm{\theta}}), \label{eq:lambda-hat} \\ \bm{\hat{\mu}}_{\bm{\theta}} &= \left\{\bm{1} + \bm{V}(\bm{x})\left[\bm{h}(\bm{\theta})\right]^\mathsf{T}[\bm{V}(\bm{y})]^{-1}\bm{h}(\bm{\theta})\right\}^{-1}\left\{\bm{x} + \bm{V}(\bm{x})\left[\bm{h}(\bm{\theta})\right]^\mathsf{T}[\bm{V}(\bm{y})]^{-1}\bm{y}\right\}. \label{eq:mu-hat} \end{aligned}$$
Evaluating $\tilde{\tilde{Q}}_\text{TLS}$ at $\bm{\mu} = \bm{\hat{\mu}}_{\bm{\theta}}$, $\bm{\psi} = \bm{\hat{\psi}}_{\bm{\theta}}$, and $\bm{\lambda} = \bm{\hat{\lambda}}_{\bm{\theta}}$ yields a simplified cost function that involves only the parameter vector $\bm{\theta}$ and the data vectors $\bm{x}$ and $\bm{y}$,
$$Q_\text{TLS}(\bm{\theta}) = (\bm{x} - \bm{\hat{\mu}}_{\bm{\theta}})^\mathsf{T}\,[\bm{V}(\bm{x})]^{-1}(\bm{x} - \bm{\hat{\mu}}_{\bm{\theta}}) + (\bm{y} - \bm{\hat{\psi}}_{\bm{\theta}})^\mathsf{T}\,[\bm{V}(\bm{y})]^{-1}(\bm{y} - \bm{\hat{\psi}}_{\bm{\theta}}). \label{eq:TLS-cost-xy}$$
We can simplify these expressions further by defining
$$\bm{U}(\bm{x}, \bm{\theta}) = \bm{h}(\bm{\theta})\bm{V}(\bm{x})\left[\bm{h}(\bm{\theta})\right]^\mathsf{T}. \label{eq:Uy}$$
Substituting Eq. (\ref{eq:Uy}) in Eqs. (\ref{eq:mu-hat}) and (\ref{eq:psi-hat}) reveals that $\bm{\hat{\psi}}_{\bm{\theta}}$ is just the weighted average of $\bm{y}$ and $\bm{h}(\bm{\theta})\bm{x}$,
$$\bm{\hat{\psi}}_{\bm{\theta}} = \left\{[\bm{V}(\bm{y})]^{-1} + [\bm{U}(\bm{x},\bm{\theta})]^{-1}\right\}^{-1}\left\{[\bm{V}(\bm{y})]^{-1}\bm{y} + [\bm{U}(\bm{x},\bm{\theta})]^{-1}[\bm{h}(\bm{\theta})\bm{x}]\right\}, \label{eq:psi-hat-weighted-average}$$
where the matrices $\bm{U}(\bm{x},\bm{\theta})$ and $\bm{V}(\bm{y})$ are approximately equal to the true (but unknown) covariance matrices $\text{Cov}[\bm{h}(\bm{\theta})\bm{x}] = \bm{U}(\bm{\mu},\bm{\theta})$ and $\text{Cov}(\bm{y}) = \bm{V}(\bm{\psi})$, respectively. We emphasize here that although $\bm{h}(\bm{\theta})\bm{\mu} = \bm{\psi}$, the covariance matrices $\text{Cov}[\bm{h}(\bm{\theta})\bm{x}]$ and $\text{Cov}(\bm{y})$ are generally different, since $\bm{h}(\bm{\theta})$ transforms the noise in $\bm{x}$ as well as the signal. For example, if $\bm{h}(A) = A\bm{I}$, $A \neq 1$, and the noise is purely additive, we have $\text{Cov}(\bm{y}) = \sigma_\alpha^2\bm{I} \neq \text{Cov}[\bm{h}(A)\bm{x}] = A^2\sigma_\alpha^2\bm{I}$.

Substituting Eq. (\ref{eq:Uy}) in Eq. (\ref{eq:TLS-cost-xy}) yields
$$Q_\text{TLS}(\bm{\theta}) = \left[\bm{y} - \bm{h}(\bm{\theta})\bm{x}\right]^\mathsf{T}\left[\bm{V}(\bm{y}) + \bm{U}(\bm{x}, \bm{\theta})\right]^{-1}\left[\bm{y} - \bm{h}(\bm{\theta})\bm{x}\right]. \label{eq:TLS-cost-deviations}$$
Like the simpler LS cost function in Eq. (\ref{eq:LS-cost}), the TLS cost function in Eq. (\ref{eq:TLS-cost-deviations}) is expressed in terms of the deviations $\bm{y} - \bm{h}(\bm{\theta})\bm{x}$, but is now properly normalized with respect to the associated covariance matrix, $\bm{V}(\bm{y}) + \bm{U}(\bm{x}, \bm{\theta})$. Introducing the matrix square root, $\bm{A}^{1/2} = \bm{X} \leftrightarrow \bm{A} = \bm{X}^2$, we may define the normalized residual vector as
$$\bm{r}_\text{TLS}(\bm{\theta}) = \left[\bm{V}(\bm{y}) + \bm{U}(\bm{x}, \bm{\theta})\right]^{-1/2}\left[\bm{y} - \bm{h}(\bm{\theta})\bm{x}\right], \label{eq:TLS-residuals}$$
which allows us to express the TLS cost function in the compact form,
$$Q_\text{TLS}(\bm{\theta}) = [\bm{r}_\text{TLS}(\bm{\theta})]^\mathsf{T}\bm{r}_\text{TLS}(\bm{\theta}). \label{eq:TLS-cost-residuals}$$
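A sketch of Eqs. (\ref{eq:TLS-residuals}) and (\ref{eq:TLS-cost-residuals}), assuming the transfer_matrix and V_matrix helpers above. A Cholesky factor stands in for the symmetric matrix square root; it whitens the deviations and leaves the quadratic form unchanged.

```python
import numpy as np

def tls_residuals(theta, x, y, T, H_model, noise_params):
    """Normalized residual vector r_TLS(theta), Eq. (eq:TLS-residuals).

    noise_params = (sigma_alpha, sigma_beta, sigma_tau)."""
    N = len(x)
    h = transfer_matrix(H_model, theta, N, T)
    Vy = V_matrix(y, T, *noise_params)
    U = h @ V_matrix(x, T, *noise_params) @ h.T    # Eq. (eq:Uy)
    d = y - h @ x
    # Solve L r = d with (Vy + U) = L L^T, so r.T @ r = d.T @ (Vy+U)^-1 @ d.
    L = np.linalg.cholesky(Vy + U)
    return np.linalg.solve(L, d)

def Q_TLS(theta, x, y, T, H_model, noise_params):
    """TLS cost, Eq. (eq:TLS-cost-residuals)."""
    r = tls_residuals(theta, x, y, T, H_model, noise_params)
    return r @ r
```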
The TLS estimate for the parameter vector, $\bm{\hat{\theta}}$, may now be determined by minimizing $Q_\text{TLS}(\bm{\theta})$ using a standard least-squares optimization algorithm.

Figure 4 shows fit results for the same model and data shown in Fig. 3, but obtained by minimizing $Q_\text{TLS}(\bm{\theta})$ in Eq. (\ref{eq:TLS-cost-residuals}) instead of $Q_\text{ETFE}(\bm{\theta})$ in Eq. (\ref{eq:ETFE-cost}). The statistical properties obtained with $Q_\text{TLS}(\bm{\theta})$ are clearly superior to those with $Q_\text{ETFE}(\bm{\theta})$. The GOF statistic, $S_\text{TLS} = Q_\text{TLS}(\bm{\hat{\theta}})$, approximates a $\chi^2$ distribution with $\nu = N - p$ degrees of freedom, as expected. The normalized residual vector $\bm{r}_\text{TLS}(\bm{\hat{\theta}})$, shown for one of the fits, exhibits a standard normal distribution with no discernible correlations among neighboring time points. The distribution for $S_\text{TLS}$ is more than 7 times narrower than the distribution for $S_\text{ETFE}$, making it that much more sensitive to a poor fit. Similarly, the lack of structure in $\bm{r}_\text{TLS}(\bm{\hat{\theta}})$ makes it more useful for residual analysis than $r_\text{ETFE}(\omega_l; \bm{\hat{\theta}})$, since structure associated with poor fits may be discerned more readily. Finally, unlike the ETFE fits, the TLS fits yield estimates for the parameter uncertainties that agree with the values observed over the set of Monte Carlo samples, $\sigma_{\theta_1} = 0.002$ and $\sigma_{\theta_2} = 0.4~\text{fs}$, which are both about a factor of 2 smaller than the values observed in the ETFE parameter estimates. In summary, when compared with the standard ETFE fit procedure, the TLS fit procedure offers better discrimination between good models and bad ones, more precise values for the parameters, and more accurate estimates of the parameter uncertainties.

[Figure 4: Measures of fit quality for TLS fits with Eq. (\ref{eq:model1}), obtained by minimizing Eq. (\ref{eq:TLS-cost-residuals}); compare with Fig. 3. (a) Cumulative distribution of the GOF statistic, $S_\text{TLS}$, for the Monte Carlo simulations shown in Fig. 2. The $\chi^2$ distribution for the same number of degrees of freedom ($\nu = 254$) is shown for comparison. The red $\times$ marks a fit with $S_\text{TLS} \approx 252$, which is just above the median value. The normalized residuals for this fit, $\bm{r}_\text{TLS}(\bm{\hat{\theta}})$, are shown in (b) as a function of time, and in the inset to (a) with a normal probability plot (black dots). The gray dash-dotted line in the inset represents the standard normal distribution.]
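A usage sketch: because $Q_\text{TLS}$ is a sum of squared residuals, any standard least-squares routine applies. The wrapper below is hypothetical and assumes the tls_residuals helper above; theta0 must match the chosen model.

```python
from scipy.optimize import least_squares

def fit_tls(x, y, T, H_model, noise_params, theta0=(1.0, 0.0)):
    """Minimize Q_TLS via its residual vector; returns (theta_hat, S_TLS)."""
    sol = least_squares(tls_residuals, x0=theta0,
                        args=(x, y, T, H_model, noise_params))
    return sol.x, 2 * sol.cost   # S_TLS; compare to chi^2 with N - p dof
```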
Maximum-likelihood estimation of the noise model

We have assumed so far that the noise amplitudes $\sigma_\alpha$, $\sigma_\beta$, and $\sigma_\tau$ in Eq. (\ref{eq:V-matrix}) are known, but in general they must also be determined experimentally. One way to do this is to measure the noise at three different points on the THz waveform: as we saw in Fig. 1, the $\sigma_\beta$ term dominates near the peak, the $\sigma_\tau$ term dominates where the signal crosses zero with a large slope, and the $\sigma_\alpha$ term dominates where both the signal and its slope are small. Another approach is to determine the variance as a function of time from a set of repeated waveform measurements, then obtain the noise parameters from a fit with Eq. (\ref{eq:noise-amplitude}). But both of these methods are susceptible to systematic errors from drift, which produces excess signal variability over the timescale of multiple measurements. In this section, we describe a likelihood method for estimating the noise parameters that accounts for this drift.

We consider repeated measurements of an ideal primary waveform, $\mu(t)$, which has an amplitude and a delay that drift on a timescale longer than the waveform acquisition time. We can then associate each measurement with an ideal secondary waveform,
$$\zeta(t; A_l, \eta_l) = A_l\mu(t - \eta_l), \label{eq:zeta}$$
where $A_l$ is the relative amplitude, $\eta_l$ is the delay, and $l \in \{0, 1, \ldots, M-1\}$. We also set $A_0 = 1$ and $\eta_0 = 0$, so that $\zeta(t; A_0, \eta_0) = \mu(t)$. As before, we sample these waveforms at the nominal times $t_k = kT$, $k \in \{0, 1, \ldots, N-1\}$, to obtain the ideal primary waveform vector $\bm{\mu} = [\mu_0, \mu_1, \ldots, \mu_{N-1}]^\mathsf{T}$ and $M$ measurement vectors $\bm{x}_l = [x_l(t_0), x_l(t_1), \ldots, x_l(t_{N-1})]^\mathsf{T}$, which we can arrange in an $N \times M$ matrix $\bm{X} = [\bm{x}_0, \bm{x}_1, \ldots, \bm{x}_{M-1}]$. We also write the amplitudes and delays in vector form, $\bm{A} = [A_0, A_1, \ldots, A_{M-1}]^\mathsf{T}$ and $\bm{\eta} = [\eta_0, \eta_1, \ldots, \eta_{M-1}]^\mathsf{T}$, respectively. With these definitions, we can express the sampled secondary waveforms in vector form,
$$\bm{\zeta}_l = \bm{\zeta}(\bm{\mu}, A_l, \eta_l) = A_l\bm{S}(\eta_l)\bm{\mu}, \label{eq:zetavec}$$
where the matrix $\bm{S}(\eta_l)$ is defined to impart a delay by $\eta_l$. Using Eq. (\ref{eq:transfer-matrix}) with the transfer function $H(\omega; \eta) = \exp(i\omega\eta)$, we can write this matrix explicitly as
$$S_{jk}(\eta) = \frac{1}{N}\sum_{l=-N/2+1}^{N/2-1}\exp(i\omega_l\eta)\exp[-2\pi i(j-k)l/N] + \frac{1}{N}\cos(\omega_{N/2}\eta)\cos[\pi(j-k)] \label{eq:shift-matrix}$$
for even $N$, with changes for odd $N$ as described for Eq. (\ref{eq:transfer-matrix}). Following Eqs. (\ref{eq:V-matrix}) and (\ref{eq:Sigma-xy}), and arranging the secondary waveform vectors in an $N \times M$ matrix $\bm{Z} = [\bm{\zeta}_0, \bm{\zeta}_1, \ldots, \bm{\zeta}_{M-1}]$, we also have
$$[\bm{V}(\bm{\zeta}_l; \sigma_\alpha, \sigma_\beta, \sigma_\tau)]_{jk} = \delta_{jk}\left[\sigma_\alpha^2 + \sigma_\beta^2 Z_{kl}^2 + \sigma_\tau^2(\bm{D}\bm{Z})_{kl}^2\right], \label{eq:covariance-zeta}$$
where $\bm{D}$ is defined in Eq. (\ref{eq:D-matrix}), $\bm{\zeta}_l$ and $\bm{Z}$ depend implicitly on $\bm{A}$ and $\bm{\eta}$, and the dependence of $\bm{V}$ on the noise amplitudes is now expressed explicitly.
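A sketch of the delay matrix of Eq. (\ref{eq:shift-matrix}), reusing the transfer_matrix helper above; note that a delay by $\eta$ multiplies the spectrum by $\exp(-i\omega\eta)$ in NumPy's convention, the counterpart of the paper's $\exp(i\omega\eta)$:

```python
import numpy as np

def shift_matrix(eta, N, T):
    """Delay-by-eta matrix S(eta), Eq. (eq:shift-matrix), built as the
    transfer matrix of a pure delay."""
    return transfer_matrix(lambda w, th: np.exp(-1j * w * eta), None, N, T)

# Sampled secondary waveform, Eq. (eq:zetavec):
# zeta_l = A[l] * (shift_matrix(eta[l], N, T) @ mu)
```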
The likelihood function for observing $\bm{X}$ under these assumptions is
$$L(\sigma_\alpha, \sigma_\beta, \sigma_\tau, \bm{\mu}, \bm{A}, \bm{\eta}; \bm{X}) = \prod_{l=0}^{M-1}\frac{\exp\left\{-\frac{1}{2}(\bm{x}_l - \bm{\zeta}_l)^\mathsf{T}[\bm{V}(\bm{\zeta}_l; \sigma_\alpha, \sigma_\beta, \sigma_\tau)]^{-1}(\bm{x}_l - \bm{\zeta}_l)\right\}}{\sqrt{(2\pi)^N\det[\bm{V}(\bm{\zeta}_l; \sigma_\alpha, \sigma_\beta, \sigma_\tau)]}}, \label{eq:likelihood-noise}$$
in which the noise amplitudes $\sigma_\alpha$, $\sigma_\beta$, and $\sigma_\tau$ are the parameters of interest and the signal vectors $\bm{\mu}$, $\bm{A}$, and $\bm{\eta}$ are nuisance parameters. As with Eq. (\ref{eq:likelihood-tf}), it is more convenient computationally to work with the negative-log likelihood,
$$\begin{split} -\ln[L(\sigma_\alpha, \sigma_\beta, \sigma_\tau, \bm{\mu}, \bm{A}, \bm{\eta}; \bm{X})] &= \frac{MN}{2}\ln(2\pi) + \frac{1}{2}\sum_{l=0}^{M-1}\ln\left\{\det[\bm{V}(\bm{\zeta}_l; \sigma_\alpha, \sigma_\beta, \sigma_\tau)]\right\} \\ &\quad + \frac{1}{2}\sum_{l=0}^{M-1}\left\{(\bm{x}_l - \bm{\zeta}_l)^\mathsf{T}[\bm{V}(\bm{\zeta}_l; \sigma_\alpha, \sigma_\beta, \sigma_\tau)]^{-1}(\bm{x}_l - \bm{\zeta}_l)\right\}. \end{split} \label{eq:NLL-noise}$$
Ignoring the constant first term, multiplying the remaining terms by 2, and expressing the matrix elements explicitly, we can define the ML cost function,
$$\begin{split} Q_\text{ML}(\sigma_\alpha, \sigma_\beta, \sigma_\tau, \bm{\mu}, \bm{A}, \bm{\eta}; \bm{X}) = \sum_{k=0}^{N-1}\sum_{l=0}^{M-1} &\left\{\ln\left[\sigma_\alpha^2 + \sigma_\beta^2 Z_{kl}^2 + \sigma_\tau^2(\bm{D}\bm{Z})_{kl}^2\right]\right. \\ &\quad + \left.\frac{(X_{kl} - Z_{kl})^2}{\sigma_\alpha^2 + \sigma_\beta^2 Z_{kl}^2 + \sigma_\tau^2(\bm{D}\bm{Z})_{kl}^2}\right\}, \end{split} \label{eq:ML-cost}$$
which we minimize to obtain ML estimates for all of the free parameters in the model. The resulting estimates for the noise parameters ($\sigma_\alpha$, $\sigma_\beta$, and $\sigma_\tau$) are biased below their true values, which we can attribute to the presence of the nuisance parameters ($\bm{\mu}$, $\bm{A}$, and $\bm{\eta}$). For example, in 1000 Monte Carlo simulations of $M = 10$ repeated measurements using the waveforms described in Sec. 2, the ratios of the ML estimates to their true values are $\hat{\sigma}_\alpha/\sigma_\alpha = 0.95 \pm 0.02$, $\hat{\sigma}_\beta/\sigma_\beta = 0.94 \pm 0.03$, and $\hat{\sigma}_\tau/\sigma_\tau = 0.89 \pm 0.09$, all significantly below unity. Increasing the number of observations to $M = 50$ reduces the bias to $\hat{\sigma}_\alpha/\sigma_\alpha = 0.990 \pm 0.007$, $\hat{\sigma}_\beta/\sigma_\beta = 0.98 \pm 0.01$, and $\hat{\sigma}_\tau/\sigma_\tau = 0.93 \pm 0.04$, but some bias remains in $\hat{\sigma}_\beta$ and $\hat{\sigma}_\tau$ even in the limit $M \rightarrow \infty$.
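A sketch of Eq. (\ref{eq:ML-cost}) with all free parameters packed into one vector; the packing layout is an illustrative choice, not the authors' implementation, and it assumes the shift_matrix and derivative_matrix helpers above:

```python
import numpy as np

def Q_ML(params, X, T):
    """ML cost for the noise model, Eq. (eq:ML-cost).

    params packs (sigma_alpha, sigma_beta, sigma_tau, mu[0:N],
    A[1:M], eta[1:M]), with A_0 = 1 and eta_0 = 0 held fixed."""
    N, M = X.shape
    sa, sb, st = params[:3]
    mu = params[3:3 + N]
    A = np.concatenate([[1.0], params[3 + N:3 + N + M - 1]])
    eta = np.concatenate([[0.0], params[3 + N + M - 1:]])
    D = derivative_matrix(N, T)
    q = 0.0
    for l in range(M):
        zeta = A[l] * (shift_matrix(eta[l], N, T) @ mu)   # Eq. (eq:zetavec)
        v = sa**2 + (sb * zeta)**2 + (st * (D @ zeta))**2
        q += np.sum(np.log(v) + (X[:, l] - zeta)**2 / v)
    return q

# Minimized with, e.g., scipy.optimize.minimize(Q_ML, p0, args=(X, T)).
```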
This residual bias arises because the number of elements in $\bm{A}$ and $\bm{\eta}$ both grow with the number of observations, which allows us to account for drift but dilutes some of the information that the data provide about the noise. In principle, we can resolve this problem by integrating out all of the nuisance parameters in Eq. (\ref{eq:likelihood-noise}) to obtain a marginal likelihood that depends only on $\sigma_\alpha$, $\sigma_\beta$, and $\sigma_\tau$, but this involves computationally expensive integrations for a relatively small benefit. As we discuss below, we can remove much of the bias by simply rescaling the noise parameters by a suitable correction factor.

To determine the bias correction factor it is helpful to consider a simplified example in which there is no drift and only additive noise, so that $A_l = 1$ and $\eta_l = 0$ for all $l$ and $\sigma_\beta = \sigma_\tau = 0$. The ML cost function then reduces to
$$\tilde{Q}_\text{ML}(\sigma_\alpha, \bm{\mu}; \bm{X}) = \sum_{k=0}^{N-1}\sum_{l=0}^{M-1}\left[\ln\sigma_\alpha^2 + \frac{(X_{kl} - \mu_k)^2}{\sigma_\alpha^2}\right], \label{eq:ML-cost-simple}$$
which is minimized by
$$\hat{\mu}_k = \frac{1}{M}\sum_{l=0}^{M-1} X_{kl} = \bar{x}_k, \qquad \hat{\sigma}_\alpha^2 = \frac{1}{N}\sum_{k=0}^{N-1} s_M^2(t_k) = \overline{s_M^2}, \label{eq:ML-est-simple}$$
where
$$s_M^2(t_k) = \frac{1}{M}\sum_{l=0}^{M-1}(X_{kl} - \bar{x}_k)^2. \label{eq:samplevar}$$
This result for $\hat{\sigma}_\alpha^2$ is biased because it is the waveform average of $s_M^2(t_k)$, which in turn is just the biased sample variance of the measurements at $t_k$. To remove this bias we can apply Bessel's correction, which replaces the factor of $1/M$ with $1/(M-1)$ in Eq. (\ref{eq:samplevar}). Alternatively, we can multiply $\hat{\sigma}_\alpha^2$ by $M/(M-1)$ in Eq. (\ref{eq:ML-est-simple}). Returning to Eq. (\ref{eq:ML-cost}) and following similar reasoning, we can justify rescaling all of the ML noise parameter estimates by the same factor,
$$\hat{\sigma}_\alpha^* = \hat{\sigma}_\alpha\sqrt{\frac{M}{M-1}}, \qquad \hat{\sigma}_\beta^* = \hat{\sigma}_\beta\sqrt{\frac{M}{M-1}}, \qquad \hat{\sigma}_\tau^* = \hat{\sigma}_\tau\sqrt{\frac{M}{M-1}}. \label{eq:unbiasedsigma}$$
With these corrections, the Monte Carlo simulations with $M = 10$ yield $\hat{\sigma}_\alpha^*/\sigma_\alpha = 1.00 \pm 0.02$, $\hat{\sigma}_\beta^*/\sigma_\beta = 0.99 \pm 0.03$, and $\hat{\sigma}_\tau^*/\sigma_\tau = 0.94 \pm 0.09$, which are all within the statistical uncertainty of the true values. The simulations with $M = 50$ yield lower statistical uncertainty, but with the same average values: $\hat{\sigma}_\alpha^*/\sigma_\alpha = 1.000 \pm 0.007$, $\hat{\sigma}_\beta^*/\sigma_\beta = 0.99 \pm 0.01$, and $\hat{\sigma}_\tau^*/\sigma_\tau = 0.94 \pm 0.04$. For all practical purposes, the remaining bias in $\hat{\sigma}_\beta^*$ and $\hat{\sigma}_\tau^*$ is negligible.
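The rescaling of Eq. (\ref{eq:unbiasedsigma}) is a one-liner:

```python
import numpy as np

def bessel_rescale(sigma_hat, M):
    """Rescale an ML noise-amplitude estimate by sqrt(M/(M-1)),
    Eq. (eq:unbiasedsigma)."""
    return sigma_hat * np.sqrt(M / (M - 1))

# e.g. sigma_alpha_star = bessel_rescale(sigma_alpha_hat, M=50)
```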
Experimental verification

In this section we present experimental results that verify our analysis. Figure 5(a) shows the raw data that we use to specify the noise model, $\bm{X}$, which comprises $M = 50$ waveforms, each with $N = 256$ points sampled every $T = 50~\text{fs}$. With this data as input, we minimize Eq. (\ref{eq:ML-cost}) to obtain the ML estimates $\hat{\sigma}_\alpha$, $\hat{\sigma}_\beta$, $\hat{\sigma}_\tau$, $\hat{\bm{\mu}}$, $\hat{\bm{A}}$, and $\hat{\bm{\eta}}$ for all of the free parameters in the model. The resulting time-dependent noise amplitude, corrected for bias using Eq. (\ref{eq:unbiasedsigma}), is
$$\hat{\sigma}_\mu^*(t_k) = \sqrt{V_{kk}(\bm{\hat{\mu}}; \hat{\sigma}_\alpha^*, \hat{\sigma}_\beta^*, \hat{\sigma}_\tau^*)}. \label{eq:ML-noise-amplitude}$$
In Fig. 5(b) we compare this model to the observed time-dependent noise, which we estimate by using $\hat{\bm{A}}$ and $\hat{\bm{\eta}}$ to correct for the drift,
$$\tilde{\bm{x}}_l = \frac{1}{\hat{A}_l}\bm{S}(-\hat{\eta}_l)\bm{x}_l, \label{eq:X-corrected}$$
then compute the unbiased standard deviation at each time point over the set $\{\tilde{\bm{x}}_l\}$. The model quantitatively describes most features of the measurements, with small deviations near some of the peaks. As a further consistency check, Fig. 5(c) shows the normalized residuals for one of the waveforms,
$$\bm{r}_{\text{ML},l} = \left\{\bm{V}\left[\bm{\zeta}(\hat{\bm{\mu}}, \hat{A}_l, \hat{\eta}_l); \hat{\sigma}_\alpha^*, \hat{\sigma}_\beta^*, \hat{\sigma}_\tau^*\right]\right\}^{-1/2}\left[\bm{x}_l - \bm{\zeta}(\hat{\bm{\mu}}, \hat{A}_l, \hat{\eta}_l)\right],$$
which demonstrates that they approximately follow a standard normal distribution, with small but statistically significant correlations among neighboring points.

[Figure 5: Noise-model fits with laboratory measurements. (a) Set of 50 nominally identical waveform measurements that compose the data matrix $\bm{X}$. (b) Time-dependent noise amplitude obtained from a fit with Eq. (\ref{eq:ML-cost}) to $\bm{X}$. The solid line shows the ideal noise model, with the bias-corrected ML estimates $\hat{\sigma}_\alpha^* = (0.623 \pm 0.005)~\text{pA}$, $\hat{\sigma}_\beta^* = (1.55 \pm 0.03)\%$, and $\hat{\sigma}_\tau^* = (1.8 \pm 0.1)~\text{fs}$. Points show the standard deviation of the measurements after correcting for drift, as described in the text. (c) Normalized residuals for one of the waveform measurements shown in (a).]

Figure 6 shows fits to two sequential waveforms from this set. A fit with Eq. (\ref{eq:model1}) in the time domain, obtained by minimizing $Q_\text{TLS}$ in Eq. (\ref{eq:TLS-cost-deviations}), yields $\hat{\theta}_1^\text{TLS} = 1.019 \pm 0.003$ for the amplitude and $\hat{\theta}_2^\text{TLS} = (-2.8 \pm 0.5)~\text{fs}$ for the delay, with the normalized residuals shown in Fig. 6(b). A fit with the same model in the frequency domain, obtained by minimizing $Q_\text{ETFE}$ in Eq. (\ref{eq:ETFE-cost}), yields $\hat{\theta}_1^\text{ETFE} = 1.020 \pm 0.004$ and $\hat{\theta}_2^\text{ETFE} = (-4.4 \pm 0.6)~\text{fs}$, with the normalized residuals shown in Fig. 6(d). Both fit methods reveal significant differences from the ideal values of $\theta_1 = 1$ and $\theta_2 = 0$, reflecting the measurement drift discussed in Sec. 5. As we found for the residuals of the noise-model fit in Fig. 5(c), the residuals of the transfer-function fit in Fig. 6(b) show small but statistically significant correlations. But as we also found with the idealized Monte Carlo simulations, the residuals of the ETFE fit in the frequency domain are much more structured than the residuals of the TLS fit to the same data in the time domain.
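A sketch of the drift correction of Eq. (\ref{eq:X-corrected}) used for Fig. 5(b), assuming the shift_matrix helper above:

```python
import numpy as np

def drift_correct(X, A_hat, eta_hat, T):
    """Undo the fitted amplitude and delay drift, Eq. (eq:X-corrected)."""
    N, M = X.shape
    Xc = np.empty_like(X)
    for l in range(M):
        Xc[:, l] = (shift_matrix(-eta_hat[l], N, T) @ X[:, l]) / A_hat[l]
    return Xc

# Observed noise amplitude: the unbiased standard deviation at each time
# point over the corrected measurements.
# sigma_obs = drift_correct(X, A_hat, eta_hat, T).std(axis=1, ddof=1)
```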
[Figure 6: Transfer-function fits with laboratory measurements. (a) Two sequential waveform measurements, taken from the set shown in Fig. 5. (b) Normalized time-domain residuals, $\bm{r}_\text{TLS}(\bm{\hat{\theta}}^\text{TLS})$, for the TLS fit with Eq. (\ref{eq:model1}). (c) Real and imaginary parts of $\hat{H}_\text{ETFE}(\omega_l) - 1$ (dots), fitted with Eq. (\ref{eq:model1}) (solid lines) and Eq. (\ref{eq:model2}) (dotted lines) by minimizing $Q_\text{ETFE}$ in Eq. (\ref{eq:ETFE-cost}). (d) Normalized frequency-domain residuals, $r_\text{ETFE}(\omega_l; \bm{\hat{\theta}}^\text{ETFE})$, for the fit with Eq. (\ref{eq:model1}).]

To illustrate the analysis problem that this raises, in Fig. 6(c) we compare an ETFE fit with Eq. (\ref{eq:model1}) to an ETFE fit with a more flexible transfer-function model,
$$H_2(\omega; \bm{\theta}) = (\theta_1 + \theta_3\omega^2)\exp[i\omega(\theta_2 + \theta_4\omega^2)]. \label{eq:model2}$$
Since the two input waveforms are nominally identical, we know that the downturns in $\text{Re}[\hat{H}_\text{ETFE}(\omega)]$ and $\text{Im}[\hat{H}_\text{ETFE}(\omega)]$ with increasing frequency near 2 THz are spurious. But if we did not know this in advance, we might conclude from Fig. 6(c) that Eq. (\ref{eq:model2}) describes the measurements better than Eq. (\ref{eq:model1}). We would also be able to support this conclusion by comparing the GOF statistic for the fit with Eq. (\ref{eq:model1}), $S_\text{ETFE}^{(1)} \approx 223$, with $\nu = 254$ degrees of freedom, to that obtained with Eq. (\ref{eq:model2}), $S_\text{ETFE}^{(2)} \approx 191$, with $\nu = 252$ degrees of freedom. By adding only two additional fit parameters, we reduce $S_\text{ETFE}$ by 33, which erroneously suggests that the added complexity of Eq. (\ref{eq:model2}) captures a real physical effect. The TLS method is more robust against such overfitting: the GOF statistic with Eq. (\ref{eq:model1}) is $S_\text{TLS}^{(1)} \approx 198$, while with Eq. (\ref{eq:model2}), $S_\text{TLS}^{(2)} \approx 196$. In this case, adding two free parameters reduces $S_\text{TLS}$ by only two, so Occam's razor favors the simpler model, Eq. (\ref{eq:model1}). More formally, to select from a set of transfer-function models $H_k(\omega; \bm{\theta}_k)$, $k = 1, 2, \ldots, N_\text{model}$, each with $p_k$ free parameters, we can minimize a modified cost function based on the Akaike information criterion,
$$Q_\text{AIC}(k) = S_\text{TLS}^{(k)} + 2p_k, \label{eq:AIC}$$
where $S_\text{TLS}^{(k)}$ is the TLS GOF statistic for the model $H_k(\omega; \bm{\theta}_k)$. As we discussed in Sec. 4, this ability to discriminate among competing models is one of the major advantages of the TLS method.
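A sketch of the selection rule in Eq. (\ref{eq:AIC}), checked against the GOF values quoted above:

```python
import numpy as np

def aic_select(S_tls, p):
    """Return the index minimizing Q_AIC(k) = S_TLS^(k) + 2*p_k,
    Eq. (eq:AIC), along with the scores themselves."""
    scores = np.asarray(S_tls) + 2 * np.asarray(p)
    return int(np.argmin(scores)), scores

# With the paper's experimental values S = (198, 196) and p = (2, 4),
# the scores are (202, 204), so the simpler model H_1 is selected.
```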
If we divide the 50 experimental waveforms into 25 sequential pairs and fit each pair with Eq. (\ref{eq:model1}), the results are qualitatively consistent with the Monte Carlo simulations and support the conclusion that $S_\text{TLS}$ offers the better absolute measure of fit quality. The distribution of $S_\text{ETFE}$ over the experiments has a mean $\bar{S}_\text{ETFE} \approx 235$ and standard deviation $\sigma(S_\text{ETFE}) \approx 113$, while the distribution for $S_\text{TLS}$ has a mean $\bar{S}_\text{TLS} \approx 246$ and standard deviation $\sigma(S_\text{TLS}) \approx 39$. For the simulated distributions shown in Fig. 3(a) and Fig. 4(a), we obtain $\bar{S}_\text{ETFE} \approx 235$, $\sigma(S_\text{ETFE}) \approx 160$, $\bar{S}_\text{TLS} \approx 250$, and $\sigma(S_\text{TLS}) \approx 22$. As we discussed at the end of Sec. 4, a narrower GOF distribution provides better sensitivity to fit quality. And while the experimental distribution for $S_\text{TLS}$ is nearly twice as broad as it is in the simulations, it is still nearly a factor of 3 narrower than the experimental distribution for $S_\text{ETFE}$. Despite the quantitative differences between the simulations and the experiment, the TLS method consistently shows better performance than the ETFE.

Conclusion

In summary, we have developed a maximum-likelihood approach to THz-TDS analysis and demonstrated that it has numerous advantages over existing methods. Starting from a few simple assumptions, we derived a method to determine the transfer function relationship between two THz-TDS measurements, together with a method to specify a parameterized noise model from a set of repeated measurements. In each case, we also derived expressions for fit residuals in the time domain that are properly normalized by the expected noise. Through a combination of Monte Carlo simulations and experimental measurements, we verified that these tools yield results that are accurate, precise, responsive to fit quality, and generally superior to the results of fits to the ETFE. We focused on simple examples to emphasize the logical structure of the method, but we can readily apply the same approach to more complex problems. For example, we have successfully used the framework presented here to measure a weak frequency dependence in the Drude scattering rate of a metal, which is predicted by Fermi liquid theory; we have also used it to measure small variations in the THz-frequency refractive index of silicon with temperature. In both of these applications we found that maximum-likelihood analysis in the time domain provided better performance than a least-squares analysis based on the ETFE in the frequency domain. We expect similar improvements in other applications, and provide MATLAB functions in Code Repository 1 (Ref. ) for others to try.

JSD acknowledges support from NSERC and CIFAR, and DGS from an NSERC Alexander Graham Bell Canada Graduate Scholarship. JSD thanks John Bechhoefer for encouragement and stimulating discussions as this work developed. The authors declare no conflicts of interest.

Why does the distribution of the goodness-of-fit statistic tend to be narrower for TLS fits compared to ETFE fits, both in simulations and experiments?
answer:Based on the document, there are a few reasons why the distribution of the goodness-of-fit statistic $S$ tends to be narrower for TLS fits compared to ETFE fits, both in simulations and experiments:

1) The TLS method treats the input and output variables symmetrically and accounts for noise in both the input and output data (see Eq. (\ref{eq:TLS-cost-tilde})). In contrast, the ETFE method focuses only on deviations in the output variable and ignores input noise (see the discussion below Eq. (\ref{eq:LS-cost})). This asymmetry distorts the noise distribution in the ETFE method.

2) The TLS residuals $\bm{r}_\text{TLS}(\bm{\theta})$ are properly normalized by the expected noise (see Eq. (\ref{eq:TLS-residuals})), whereas the ETFE residuals $r_\text{ETFE}(\omega_l; \bm{\theta})$ do not account for correlations and non-Gaussian aspects of the noise. This causes the ETFE residuals to exhibit more structure (Figs. 3, 6).

3) Due to the above factors, the TLS method offers better discrimination between good and bad models. Adding complexity to a model may reduce $S_\text{ETFE}$ even when it is not justified, whereas $S_\text{TLS}$ is more robust against overfitting (see the discussion of Fig. 6).

4) Both simulations (Fig. 3 vs. Fig. 4) and experiments show a narrower distribution of $S_\text{TLS}$ compared to $S_\text{ETFE}$. This supports the conclusion that $S_\text{TLS}$ provides better sensitivity to fit quality and is a more reliable measure of absolute goodness of fit.
question:Please answer the following question: Is there a negative or positive tone to this product review? === Title: A Great Read! Review: A smart, refreshing comedy! If you love 'House,' you'll love the humor in this book. Totally entertaining from cover to cover! Answer:
answer:To determine the tone of this product review, I will analyze the text and identify the key words and phrases that indicate the reviewer's feelings toward the book.

1. The title of the review is "A Great Read!", which is a positive statement, suggesting a positive tone.
2. The review says, "A smart, refreshing comedy!", another positive statement indicating that the reviewer finds the book witty and enjoyable.
3. The reviewer compares the book to the TV show 'House,' suggesting that fans of the show would appreciate the book's humor. This is a positive association, implying that the book is similar in a good way.
4. The review concludes with, "Totally entertaining from cover to cover!", another positive statement emphasizing that the reviewer was entertained throughout the reading experience.

Considering all these points, I can conclude that the tone of this product review is positive.
question:I'm thrilled to share a story that highlights the power of patient education and dedicated healthcare. A few months ago, I had the privilege of working with Sarah Johnson, a patient diagnosed with a rare form of macular degeneration. Despite the initial challenges, Sarah's condition improved significantly after a customized treatment plan and consistent patient education. Her journey is a testament to the importance of clear communication and a supportive healthcare team. 🌟 Inspired by Sarah's success, I am excited to announce the launch of Eyes on Health, a new educational initiative in collaboration with the San Francisco Eye Institute. This program aims to provide comprehensive resources for patients and caregivers, helping them better understand and manage eye conditions. Whether it's through webinars, articles, or interactive Q&A sessions, we are committed to making eye health accessible and understandable for everyone. I would love to hear your thoughts and experiences. How do you ensure your patients stay informed and engaged in their healthcare journey? #EyeHealth #PatientEducation #Ophthalmology
answer:I am pleased to share an inspiring story that underscores the significance of patient education and committed healthcare. Several months ago, I had the honor of collaborating with Sarah Johnson, a patient diagnosed with a rare form of macular degeneration. Despite the initial obstacles, Sarah's condition markedly improved following the implementation of a tailored treatment plan and ongoing patient education. Her journey serves as a powerful testament to the critical role of clear communication and a supportive healthcare team. Motivated by Sarah's success, I am delighted to announce the launch of Eyes on Health, a new educational initiative in partnership with the San Francisco Eye Institute. This program is designed to offer comprehensive resources for patients and caregivers, enhancing their understanding and management of eye conditions. Through a variety of formats including webinars, articles, and interactive Q&A sessions, we are dedicated to making eye health accessible and comprehensible to all. I welcome your insights and experiences. How do you ensure that your patients remain informed and engaged in their healthcare journey? #EyeHealth #PatientEducation #Ophthalmology
question:What is the Great Attractor discovered in 1978 and does it impact the motion of the Milky Way Galaxy?
answer:The Great Attractor is a region of space that is pulling our galaxy, the Milky Way, and many others towards it. Discovered in 1978 by a team of astronomers led by Brent Tully and Richard Fisher, it is located about 250 million light-years away from us. The Great Attractor is a large, diffuse region of space that is spread out over a vast area of the sky and is exerting a gravitational pull on a large number of galaxies, including our own. The Great Attractor is thought to be a large, unseen mass concentration that is not visible to us, but its presence can be inferred from the way it affects the motion of galaxies in the surrounding region. The attractor is believed to be a large galaxy cluster or a supercluster that is embedded in a vast network of galaxy filaments. The Great Attractor does impact the motion of the Milky Way galaxy, as well as many other galaxies in the region. Our galaxy is moving towards the Great Attractor at a speed of roughly 600 kilometers per second, or about 2.2 million kilometers per hour (1.3 million miles per hour). However, the effect of the Great Attractor on the motion of the Milky Way is relatively small compared to other influences that shape the galaxy's motion, such as the gravitational pull of nearby galaxies and the expansion of the universe itself. It's worth noting that in the years since the discovery of the Great Attractor, astronomers have found that it is part of a much larger structure, known as the Laniakea Supercluster, which stretches over 500 million light-years across space. This supercluster is thought to be one of the largest known structures in the universe.