Conditional versus Marginal Covariance Representation for Linear and Nonlinear Models

Grouped data, such as repeated measures and longitudinal data, are increasingly collected in areas of application as varied as clinical trials, epidemiological studies, and educational testing. With such data it is often of interest to explore possible relationships between one or more response variables and the available covariates. Because of the within-group correlation typically present in this type of data, special regression models that allow the joint estimation of mean and covariance parameters need to be used. Two main approaches have been proposed to represent the covariance structure of the data with these models: (i) the use of random effects, the so-called conditional model, and (ii) direct representation of the covariance structure of the responses, known as the marginal approach. Here we discuss and compare these two approaches in the context of linear and nonlinear regression models with additive Gaussian errors, using a real data example to motivate and illustrate the discussion.


Introduction
Mixed-effects models are useful in describing relationships between a response variable and covariates in data that are grouped according to one or more classification factors. Examples of such grouped data include longitudinal, repeated measures, and multilevel data, which frequently arise in many areas of application, such as clinical trials, epidemiological studies, and educational testing. Mixed-effects models assume that the form of the intra-group model relating the response variable to the covariates is common to all groups, but some of the parameters that define the model are allowed to vary with group through the incorporation of random effects. By associating common random effects to observations in the same group, mixed-effects models flexibly represent the covariance structure induced by the grouping of the data.
Alternatively, the covariance matrix of the responses may be directly modelled via the covariance model adopted for the within-group error term (assumed to enter the model linearly). Of course, both approaches can be used in combination, giving mixed-effects models considerable flexibility in representing the covariance structure present in grouped data.
In practice, one is usually interested in jointly estimating the regression model parameters (representing the relationship between the response and covariates of interest) and the covariance parameters, which determine the correlation and variance structure of the response. The approach based on the use of random effects is called the conditional model, while the one relying on direct modelling of the within-group error covariance structure is known as the marginal model.
This paper describes and contrasts the two approaches in the context of linear mixed-effects (LME) models and nonlinear mixed-effects (NLME) models. A motivating example, the Dialyzer data, which will be used for illustration of both the LME and NLME models, is described in Section 2. In Section 3 it is shown that the two approaches are equivalent in the context of LME models, in the sense that both can be used to directly model the covariance matrix of the responses. Section 4 investigates extensions to the NLME model, in which the effect of the random effects model on the covariance structure of the response can only be determined approximately. Our conclusions are included in Section 5.

An Example: High-Flux Hemodialyzer
Vonesh and Carter (1992) describe and analyze data measured on high-flux hemodialyzers to assess their in vivo ultrafiltration characteristics. The ultrafiltration rates (in ml/hr) of 20 high-flux dialyzers were measured at 7 ascending transmembrane pressures (in dmHg). The in vitro evaluation of the dialyzers used bovine blood at flow rates of either 200 dl/min or 300 dl/min. These data are available in the NLME library (Pinheiro and Bates, 2000) in S-PLUS and R.
The plots of the ultrafiltration rates versus transmembrane pressure by bovine blood flow rate, displayed in Figure 1, reveal that, as expected, the ultrafiltration rate increases with transmembrane pressure up to a maximum, and that higher ultrafiltration rates are attained with the 300 dl/min blood flow dialyzers. These plots show a clear correlation in the measurements made on the same dialyzer and also indicate that the variability in the ultrafiltration rates increases with transmembrane pressure.
In their original paper, Vonesh and Carter (1992) use a nonlinear model to represent the relationship between ultrafiltration rate and transmembrane pressure. An alternative analysis, based on a linear polynomial model, is presented in Littell et al. (1996). Both analyses use random effects to account for the within-dialyzer correlation, with the latter also investigating the use of marginal covariance models based on extended linear regression models. We use this example to illustrate and compare the conditional and marginal model approaches for both the LME model and the NLME model.

Linear Mixed-Effects Model
The linear mixed-effects model for a normally distributed response grouped according to a single factor with $M$ levels, proposed by Laird and Ware (1982), is expressed as

$$y_i = X_i \beta + Z_i b_i + \epsilon_i, \quad i = 1, \ldots, M, \tag{1}$$

where $i$ is the group index, $y_i$ is an $n_i$-dimensional vector of observed responses, $X_i$ and $Z_i$ are known $n_i \times p$ and $n_i \times q$ regression matrices corresponding to the $p$-dimensional fixed effects vector $\beta$ and the $q$-dimensional random effects vector $b_i$, respectively, and $\epsilon_i$ is an $n_i$-dimensional vector of within-group errors. The $b_i$ are assumed to be independently distributed as $N(0, \Psi)$ and the $\epsilon_i$ are assumed to be independently distributed as $N(0, \Lambda_i)$, independent of the $b_i$. The covariance matrix $\Psi$ may be unstructured or structured (e.g., diagonal; Jennrich and Schluchter, 1986) and is parameterized by a set of parameters $\theta$. The $\Lambda_i$ matrices are typically assumed to depend on $i$ only through their dimensions, being parameterized by a fixed, generally small, set of parameters $\lambda$ (e.g., an AR(1) structure; Box et al., 1994).
Several methods of parameter estimation have been proposed for LME models; we consider here the two that are most widely used and available in statistical software: maximum likelihood (ML) and restricted maximum likelihood (REML). Descriptions and comparisons of the various estimation methods used for LME models can be found, for example, in Searle et al. (1992) and Vonesh and Chinchilli (1997). For a Bayesian perspective see Wakefield et al. (1994).
Austrian Journal of Statistics, Vol. 35 (2006), No. 1, 31-44
Even though the random effects are useful and intuitive quantities to represent between-group differences in the coefficients, they are not observable in practice. Therefore, likelihood estimation and inference generally rely on the marginal distribution of the observed response vectors $y_i$. Because of the linearity of the random effects in the LME model (1), the assumptions on the random effects and the within-group errors, and the properties of the multivariate normal distribution, it can be shown that the $y_i$ are marginally distributed as independent $N(X_i \beta, \Sigma_i)$ random vectors, where the marginal covariance matrix is given by

$$\Sigma_i = \mathrm{var}(y_i) = Z_i \Psi Z_i^T + \Lambda_i. \tag{2}$$

There are two ways in which the LME model (1) can account for within-group correlation and heteroscedasticity (non-constant variance): through the random effects $b_i$ and through the within-group errors $\epsilon_i$. Because the random effects $b_i$ are fixed by group, not varying with observation, the within-group observations share the same random effects and are, therefore, correlated. This is represented by the $Z_i \Psi Z_i^T$ component of $\Sigma_i$. Note, also, that the diagonal elements of $Z_i \Psi Z_i^T$ need not be constant, so that this component can also accommodate heteroscedasticity. This component of the marginal covariance matrix $\Sigma_i$ is the one favored in the conditional model approach, with $\Lambda_i$ assuming a simple form (typically $\Lambda_i = \sigma^2 I_i$, with $I_i$ denoting the identity matrix of order $n_i$). The within-group error contribution to the marginal covariance matrix is given directly by $\Lambda_i$, which can be non-diagonal (correlation) and have different diagonal elements (heteroscedasticity). In the marginal model approach one sets $\Sigma_i = \Lambda_i$, so that the entire covariance structure is determined by the within-group error.
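As a hedged numerical illustration of how the two components combine, the Python sketch below evaluates $Z_i \Psi Z_i^T + \Lambda_i$ for a single group with a random intercept and a random slope. The design matrix, $\Psi$, and the choice $\Lambda_i = \sigma^2 I_i$ are hypothetical values picked for illustration; they are not estimates from the Dialyzer data.

```python
def marginal_covariance(Z, Psi, Lam):
    """Return Z Psi Z' + Lam for one group, using plain nested lists."""
    n, q = len(Z), len(Psi)
    return [[sum(Z[j][r] * Psi[r][s] * Z[k][s]
                 for r in range(q) for s in range(q)) + Lam[j][k]
             for k in range(n)]
            for j in range(n)]

# Hypothetical group with three observations: random intercept and random slope.
Z = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Psi = [[1.0, 0.0], [0.0, 0.25]]          # assumed random effects covariance
Lam = [[1.0 if j == k else 0.0 for k in range(3)]
       for j in range(3)]                # sigma^2 * I with sigma^2 = 1

Sigma = marginal_covariance(Z, Psi, Lam)
for row in Sigma:
    print(row)
```

In this sketch the within-group errors are homoscedastic and independent, yet the diagonal of the resulting matrix grows with the slope covariate and the off-diagonal entries are nonzero, which is the sense in which the random effects component alone can induce both heteroscedasticity and correlation.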
Of course, in practice, one may use both components in (2) when modelling $\Sigma_i$. For example, one may use the random effects component to account mostly for the correlation and the within-group component to account for heteroscedasticity via the use of a variance function (Pinheiro and Bates, 2000). Some care should be exercised when using both components in a model, as they may very well compete with each other in explaining the marginal covariance and lead to nearly or fully overparameterized models.
As a simple example, consider the case of an LME model with a single random intercept, that is, with random effects model given by $1_i b_i$, where $b_i$ is a scalar and $1_i$ denotes an $n_i$-dimensional vector of ones. The corresponding random effects component of $\Sigma_i$ is then equal to $\psi 1_i 1_i^T = \psi J_i$, with $\psi = \mathrm{var}(b_i)$ and $J_i$ representing an $n_i \times n_i$ matrix of ones. If we assume a compound symmetry structure for the within-group covariance, that is, $\Lambda_i = \sigma^2 [(1 - \rho) I_i + \rho J_i]$, the resulting marginal covariance would have diagonal terms equal to $\sigma^2 + \psi$ and off-diagonal terms $\sigma^2 \rho + \psi$, that is, an overparameterized compound symmetry structure in which three parameters ($\psi$, $\sigma^2$, $\rho$) determine only two distinct values. Another example of overparameterization would result from the use of an unstructured (general) covariance matrix $\Lambda_i$ together with any random effects model.
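The overparameterization in the random intercept plus compound symmetry case can be checked numerically. The following Python sketch builds the marginal covariance with diagonal $\sigma^2 + \psi$ and off-diagonal $\sigma^2 \rho + \psi$ for two different, hypothetical parameter triples and compares the results.

```python
def marginal_cov(n, psi, sigma2, rho):
    """psi * J + sigma2 * ((1 - rho) * I + rho * J) for a group of size n."""
    return [[psi + (sigma2 if j == k else sigma2 * rho) for k in range(n)]
            for j in range(n)]

# Two distinct (hypothetical) parameter sets for a group of three observations.
cov_a = marginal_cov(3, psi=1.0, sigma2=2.0, rho=0.5)
cov_b = marginal_cov(3, psi=2.0, sigma2=1.0, rho=0.0)

print(cov_a)
print(cov_b)
print(cov_a == cov_b)
```

Both parameter sets give a diagonal of 3 and off-diagonal entries of 2, so the data cannot distinguish between them: the likelihood depends on the parameters only through the marginal covariance.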
Conditional on the parameters that determine $\Sigma_i$, the (RE)ML estimate of the fixed effects $\beta$ and its covariance matrix are given by

$$\hat\beta = \left( \sum_{i=1}^{M} X_i^T \Sigma_i^{-1} X_i \right)^{-1} \sum_{i=1}^{M} X_i^T \Sigma_i^{-1} y_i, \qquad \mathrm{var}(\hat\beta) = \left( \sum_{i=1}^{M} X_i^T \Sigma_i^{-1} X_i \right)^{-1}. \tag{3}$$

The (RE)ML estimates of the parameters determining $\Sigma_i$ cannot be expressed in closed form, except in trivial cases, and numerical optimization of the (restricted) likelihood function must be employed. The (RE)ML estimate of $\beta$ is then obtained by replacing the $\Sigma_i$ in (3) with their corresponding estimates. Note that both $\hat\beta$ and its estimated variance depend on the covariance model only through the marginal covariance matrices $\Sigma_i$. Therefore, methods that lead to similar estimates of the $\Sigma_i$ will also lead to similar inferences on the fixed effects, including confidence intervals and tests of hypotheses. Consequently, if the main questions of interest associated with the LME model are related to inferences about the fixed effects (as would be the case in clinical trials, for example), then the conditional and marginal approaches may lead to equivalent conclusions, provided similar estimates of $\Sigma_i$ can be obtained with both methods. However, if one is interested in issues related to the covariance model, such as inter-subject variability or spatial correlation, it may be that only one of the approaches can be used to address the specific question of interest. This will, of course, be application dependent, so no general recommendations can be made about the greater adequacy of either approach.
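To illustrate that the fixed effects estimate depends on the covariance model only through the marginal covariance matrices, the following Python sketch computes the generalized least squares form $(\sum_i X_i^T \Sigma_i^{-1} X_i)^{-1} \sum_i X_i^T \Sigma_i^{-1} y_i$ with a small hand-written matrix inverse. The data, the design matrix, and the compound symmetric $\Sigma_i$ are hypothetical, and the helper names are not taken from any library.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def gls(groups):
    """groups: list of (X_i, y_i, Sigma_i). Returns (beta_hat, cov_beta_hat)."""
    p = len(groups[0][0][0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [[0.0] for _ in range(p)]
    for X, y, Sigma in groups:
        xt_sinv = matmul(transpose(X), inverse(Sigma))  # X_i' Sigma_i^{-1}
        a = matmul(xt_sinv, X)
        b = matmul(xt_sinv, [[v] for v in y])
        for r in range(p):
            xty[r][0] += b[r][0]
            for c in range(p):
                xtx[r][c] += a[r][c]
    cov = inverse(xtx)
    beta = [row[0] for row in matmul(cov, xty)]
    return beta, cov

# Hypothetical data: two groups, intercept and slope, responses exactly 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Sigma = [[3.0, 2.0, 2.0], [2.0, 3.0, 2.0], [2.0, 2.0, 3.0]]
groups = [(X, [1.0, 3.0, 5.0], Sigma), (X, [1.0, 3.0, 5.0], Sigma)]
beta_hat, cov_beta = gls(groups)
print(beta_hat)
print(cov_beta)
```

Because `gls` receives only the matrices $\Sigma_i$, any conditional or marginal specification that produces the same $\Sigma_i$ yields the same estimate and the same covariance matrix for it.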

Dialyzer Example: Linear Model Version
As an empirical model to describe the relationship between the ultrafiltration rates and the transmembrane pressure in the Dialyzer example of Section 2, Littell et al. (1996) proposed the use of a fourth-order linear mixed-effects polynomial allowing for differences in the fixed effects between the dialyzers evaluated at blood flow rates of 200 dl/min and 300 dl/min.
After several model building steps to determine which parameters should vary with blood flow rate class and which should be assigned random effects to accommodate the inter-subject variation, the following LME model was chosen to represent the ultrafiltration rate $y_{ij}$ at the $j$th transmembrane pressure $x_{ij}$ for the $i$th subject:

$$y_{ij} = (\beta_0 + \gamma_0 Q_i) + (\beta_1 + \gamma_1 Q_i) x_{ij} + \beta_2 x_{ij}^2 + \beta_3 x_{ij}^3 + \beta_4 x_{ij}^4 + z_{ij}^T b_i + \epsilon_{ij}, \tag{4}$$

where $Q_i$ is a binary variable taking values 0 for 200 dl/min hemodialyzers and 1 for 300 dl/min hemodialyzers; $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are, respectively, the intercept, linear, quadratic, cubic, and quartic fixed effects corresponding to 200 dl/min dialyzers; $\gamma_k$ is the blood flow effect associated with the fixed effect $\beta_k$, $k = 0, 1$; $b_i$ is the vector of random effects, assumed independent for different $i$, with $z_{ij}^T$ denoting the $j$th row of the random effects design matrix $Z_i$ in (1); and $\epsilon_{ij}$ is the within-group error, assumed independent for different $i$, $j$ and independent of the random effects. Different structures can be used to represent the random effects covariance matrix $\Psi$, as illustrated further below. The variance of the $\epsilon_{ij}$ is allowed to change with the transmembrane pressure according to a power model with parameter $\delta$ (estimated together with the other parameters in the model). This was needed because of the heteroscedastic behavior observed in the residuals from the LME fit with homoscedastic within-subject errors.
The LME model in (4) uses random effects to account for the correlation among within-subject measurements and some of the heteroscedasticity in the response as well. An additional variance function model is used for the within-subject error to properly accommodate the observed heteroscedasticity, but the approach can be considered primarily a conditional model, as discussed previously.
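An alternative marginal model to represent the data using the empirical linear polynomial model is

$$y_{ij} = (\beta_0 + \gamma_0 Q_i) + (\beta_1 + \gamma_1 Q_i) x_{ij} + \beta_2 x_{ij}^2 + \beta_3 x_{ij}^3 + \beta_4 x_{ij}^4 + \epsilon_{ij}, \qquad \mathrm{cor}(\epsilon_{ij}, \epsilon_{ik}) = \rho^{|j - k|}, \tag{5}$$

that is, the same fixed effects and variance function as in (4) are used, but the within-subject correlation is now modeled by an AR(1) structure in the within-subject errors.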
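A within-subject covariance matrix that combines a power variance function with AR(1) correlation can be sketched as follows. The sketch assumes the convention that the error standard deviation is $\sigma |x_{ij}|^{\delta}$ (so the variance is $\sigma^2 |x_{ij}|^{2\delta}$, the parameterization used by `varPower` in the nlme library) and that the correlation between the $j$th and $k$th measurements is $\rho^{|j-k|}$. The pressures and parameter values are hypothetical.

```python
def ar1_power_cov(x, sigma, delta, rho):
    """Lambda_i with sd_j = sigma * |x_j|**delta and cor_jk = rho**|j - k|."""
    sd = [sigma * abs(v) ** delta for v in x]
    n = len(x)
    return [[sd[j] * sd[k] * rho ** abs(j - k) for k in range(n)]
            for j in range(n)]

# Hypothetical pressures and parameters, chosen so the arithmetic is easy to follow.
pressures = [1.0, 2.0, 4.0]
Lam = ar1_power_cov(pressures, sigma=1.0, delta=1.0, rho=0.5)
for row in Lam:
    print(row)
```

In the marginal approach this matrix is the entire $\Sigma_i$: the variance function handles the increase in variability with pressure and the AR(1) parameter handles the within-subject correlation, with no random effects involved.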
We also considered marginal models similar to (5), but with within-subject correlation structure given by a Gaussian spatial correlation model with a nugget effect (Pinheiro and Bates, 2000), an ARMA(1,3) model, and an unstructured model (that is, with 21 different correlations allowed for the within-subject measurement pairs).
The conditional and marginal models described above for the Dialyzer data were fitted using, respectively, the functions lme and gls in the NLME library in S-PLUS, with REML estimation in all cases. Figure 2 shows the scatter plots of the conditional model estimates for variance and covariance parameters in $\Sigma_i$ versus the corresponding estimates using the different marginal models. The fixed effects estimates and respective standard errors obtained in the LME fit are similar to the estimates obtained in the marginal model fits. The conclusions about the significance of the model parameters are the same in all five models. The greatest discrepancies are observed for the unstructured correlation model with respect to the other models, but those differences are not pronounced and are within the estimated precisions (overlapping confidence intervals). The main conclusion from the exercise above is that the conditional and marginal approaches lead to similar inferences about the model parameters and similar estimated covariance matrices $\Sigma_i$.

Nonlinear Mixed-Effects Model
Nonlinear mixed-effects (NLME) models are mixed-effects models in which the response function is nonlinear in at least some of the underlying parameters. Several different nonlinear mixed-effects models have been proposed in the literature (Sheiner and Beal, 1980; Mallet et al., 1988; Lindstrom and Bates, 1990; Vonesh and Carter, 1992; Davidian and Gallant, 1992; Wakefield et al., 1994). We adopt here the NLME model proposed by Lindstrom and Bates (1990), which can be viewed as a hierarchical model that generalizes both the linear mixed-effects model of Section 3 and the usual nonlinear regression model for independent data (Bates and Watts, 1988). In the first stage, the $j$th observation on the $i$th group is described as

$$y_{ij} = f(\phi_{ij}, x_{ij}) + \epsilon_{ij}, \quad i = 1, \ldots, M, \; j = 1, \ldots, n_i, \tag{6}$$

where $f$ is a nonlinear function of a group-specific vector of parameters $\phi_{ij}$ and the vector of covariates $x_{ij}$, $M$ is the total number of groups, and $n_i$ is the number of observations in the $i$th group. As in the LME model (1), the within-group error vectors $\epsilon_i = (\epsilon_{i1}, \ldots, \epsilon_{i n_i})^T$ are assumed to be independently distributed as $N(0, \Lambda_i)$, with the $\Lambda_i$ parameterized by a fixed parameter vector $\lambda$. In the second stage the group-specific parameters are modelled as

$$\phi_{ij} = A_{ij} \beta + B_{ij} b_i. \tag{7}$$

As in the LME model, $\beta$ represents the fixed effects and $b_i$ the random effects (varying with $i$ but not with $j$), which are assumed to be independently distributed as $N(0, \Psi)$. $A_{ij}$ and $B_{ij}$ are design matrices for the fixed and random effects, respectively, which may depend on the values of some covariates at the $j$th observation. It is further assumed that the $b_i$ are independent of the $\epsilon_{ij}$.
Different methods have been proposed to estimate the parameters in the NLME model (Ramos and Pantula, 1995; Davidian and Giltinan, 1995; Vonesh and Chinchilli, 1997); we concentrate here on methods based on the likelihood function. As in the LME case, because the random effects $b_i$ are unobserved quantities, maximum likelihood estimation in NLME models is based on the marginal density of the responses $y$, which is calculated as

$$p(y \mid \beta, \lambda, \Psi) = \int p(y \mid b, \beta, \lambda) \, p(b \mid \Psi) \, db. \tag{8}$$

Because the model function $f$ in (6) can be nonlinear in the random effects, the integral in (8) generally does not have a closed-form expression. To make the numerical optimization of the likelihood function a tractable problem, different approximations to (8) have been proposed. Some of these methods consist of taking a first-order Taylor expansion of the model function $f$ around the expected value of the random effects (i.e., 0) (Sheiner and Beal, 1980; Vonesh and Carter, 1992), or around the conditional modes of the random effects ($\hat{b}_i$) (Lindstrom and Bates, 1990). We adopt here the approximation suggested by Lindstrom and Bates (1990), which is implemented via an alternating algorithm comprising a linear mixed-effects (LME) step and a penalized nonlinear least squares (PNLS) step. This is the algorithm used in the nlme function of the NLME library. Inferences on the model parameters, including hypothesis testing, are based on asymptotic results for the linear mixed-effects log-likelihood used in the LME step of the alternating algorithm (Pinheiro and Bates, 2000).
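As for the LME model, the marginal covariance matrix of the responses in the NLME model can be expressed as the sum of two components, one associated with the random effects and another with the within-group errors. Letting $f(\beta, b_i, X_i)$ denote the $n_i$-dimensional vector with elements $f(\phi_{ij}, x_{ij})$, $j = 1, \ldots, n_i$, where $X_i$ denotes the covariate values for the $i$th group, we have

$$\Sigma_i = \mathrm{var}(y_i) = \mathrm{var}(f(\beta, b_i, X_i)) + \Lambda_i. \tag{9}$$

Note that $\mathrm{var}(f(\beta, b_i, X_i))$ depends not only on $\Psi$, but also on $\beta$ and $X_i$. One can obtain an estimate for this component of the variance by simulating random effects vectors from the $N(0, \Psi)$ distribution, evaluating $f(\beta, b_i, X_i)$ at each simulated vector, and deriving the associated sample covariance matrix. Alternatively, a first-order Taylor expansion of $f(\beta, \cdot, X_i)$ around $b_i = 0$ can be used to obtain an LME-like approximation

$$\mathrm{var}(f(\beta, b_i, X_i)) \approx \tilde{Z}_i \Psi \tilde{Z}_i^T, \qquad \tilde{Z}_i = \left. \frac{\partial f(\beta, b_i, X_i)}{\partial b_i^T} \right|_{b_i = 0}.$$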
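The two ways of obtaining the random effects component of the marginal covariance can be compared in a small Python sketch. It assumes an asymptotic regression function with an offset, $\phi_1 \{1 - \exp[-\exp(\phi_2)(x - \phi_3)]\}$, with independent random effects on $\phi_1$ and $\phi_2$ only. The parameter values, standard deviations, pressures, and number of simulations are all hypothetical.

```python
import math
import random

def f(phi1, phi2, phi3, x):
    """Assumed model: phi1 * (1 - exp(-exp(phi2) * (x - phi3)))."""
    return phi1 * (1.0 - math.exp(-math.exp(phi2) * (x - phi3)))

def simulated_cov(beta, sd_b, xs, n_sim=20000, seed=1):
    """Sample covariance of f over simulated b ~ N(0, diag(sd_b^2))."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sim):
        b1 = rng.gauss(0.0, sd_b[0])
        b2 = rng.gauss(0.0, sd_b[1])
        draws.append([f(beta[0] + b1, beta[1] + b2, beta[2], x) for x in xs])
    n = len(xs)
    means = [sum(d[j] for d in draws) / n_sim for j in range(n)]
    return [[sum((d[j] - means[j]) * (d[k] - means[k]) for d in draws)
             / (n_sim - 1) for k in range(n)] for j in range(n)]

def taylor_cov(beta, sd_b, xs):
    """First-order approximation Z Psi Z' with Z = df/db evaluated at b = 0."""
    rows = []
    for x in xs:
        u = math.exp(beta[1]) * (x - beta[2])
        d1 = 1.0 - math.exp(-u)              # derivative with respect to b1
        d2 = beta[0] * math.exp(-u) * u      # derivative with respect to b2
        rows.append((d1, d2))
    return [[rj[0] * rk[0] * sd_b[0] ** 2 + rj[1] * rk[1] * sd_b[1] ** 2
             for rk in rows] for rj in rows]

beta = (50.0, 0.5, 0.2)    # hypothetical fixed effects
sd_b = (2.0, 0.05)         # hypothetical random effects standard deviations
xs = [0.5, 1.0, 2.0, 3.0]  # hypothetical transmembrane pressures

sim = simulated_cov(beta, sd_b, xs)
approx = taylor_cov(beta, sd_b, xs)
for j, x in enumerate(xs):
    print(x, round(sim[j][j], 3), round(approx[j][j], 3))
```

With random effects variances this small the two estimates should agree closely; the linear approximation can be expected to deteriorate as the random effects variability grows, which is when the simulation approach becomes the safer choice. Adding $\Lambda_i$ to either matrix gives the corresponding estimate of $\Sigma_i$.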
The same comments and recommendations discussed in Section 3 for the LME model also apply here. The conditional approach attempts to explain the covariance structure of the response mostly via the random effects component of the marginal covariance, while the marginal approach focuses on the within-group component. Care should be exercised in not making both components too complex, to avoid overparameterization problems.
Unlike the LME case, however, the marginal covariance in the NLME model also depends on the fixed effects, which makes the problem more complex. In addition, because of the need to approximate the likelihood function of the NLME model to make the estimation problem feasible, the relationship between the marginal covariance matrix in (9) and estimation results for the fixed effects is not as clear-cut as in the LME model case. Another important difference between the conditional and marginal models in the nonlinear case lies in the interpretation of the parameters $\beta$. In the conditional model, the fixed effects are associated with the response of a typical individual (i.e., one with $b_i = 0$), while in the marginal model these parameters are associated with the average response across the individuals in the population.
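The distinction between the response of a typical individual and the population-average response can be seen in a small simulation. The Python sketch below assumes the asymptotic regression form $\phi_1 \{1 - \exp[-\exp(\phi_2)(x - \phi_3)]\}$ with a single random effect on $\phi_2$; the parameter values, the random effect standard deviation, and the pressure are hypothetical.

```python
import math
import random

def f(phi1, phi2, phi3, x):
    """Assumed model: phi1 * (1 - exp(-exp(phi2) * (x - phi3)))."""
    return phi1 * (1.0 - math.exp(-math.exp(phi2) * (x - phi3)))

def population_mean(beta, sd_b2, x, n_sim=20000, seed=1):
    """Monte Carlo average of f over b2 ~ N(0, sd_b2^2), entering through phi2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        b2 = rng.gauss(0.0, sd_b2)
        total += f(beta[0], beta[1] + b2, beta[2], x)
    return total / n_sim

beta = (50.0, 0.5, 0.2)  # hypothetical fixed effects
x = 0.5                  # hypothetical transmembrane pressure

typical = f(beta[0], beta[1], beta[2], x)         # response with b = 0
average = population_mean(beta, sd_b2=0.5, x=x)   # mean response over b
print(round(typical, 2), round(average, 2))
```

Because the random effect enters nonlinearly, the averaged curve does not coincide with the curve evaluated at $b_i = 0$, so the same symbol $\beta$ describes different quantities in a conditional and in a marginal fit. In the linear case the two coincide, since the random effects term has mean zero.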

Dialyzer Example: Nonlinear Model Version
An empirical fourth-order linear polynomial model proposed by Littell et al. (1996) was used in Section 3.1 to represent the Dialyzer data. The model originally proposed for these data by Vonesh and Carter (1992) is an asymptotic regression model with an offset, which expresses the expected ultrafiltration rate $y$ at transmembrane pressure $x$ as

$$y = \phi_1 \{ 1 - \exp[-\exp(\phi_2)(x - \phi_3)] \}. \tag{10}$$

Unlike the parameters in the empirical linear model of Section 3.1, the parameters in model (10) have a physiological interpretation: $\phi_1$ is the maximum ultrafiltration rate that can be attained, $\phi_2$ is the logarithm of the hydraulic permeability transport rate, and $\phi_3$ is the transmembrane pressure required to offset the oncotic pressure. Vonesh and Carter (1992) suggest using different parameters in (10) for each blood flow rate level. After several model building steps, the following NLME version of (10) was selected to represent the ultrafiltration rate $y_{ij}$ at the $j$th transmembrane pressure $x_{ij}$ for the $i$th subject:

$$y_{ij} = \phi_{1i} \{ 1 - \exp[-\exp(\phi_{2i})(x_{ij} - \phi_{3i})] \} + \epsilon_{ij}, \qquad \begin{pmatrix} \phi_{1i} \\ \phi_{2i} \\ \phi_{3i} \end{pmatrix} = \begin{pmatrix} \beta_1 + \gamma_1 Q_i \\ \beta_2 + \gamma_2 Q_i \\ \beta_3 \end{pmatrix} + B_i b_i, \tag{11}$$

where, as in Section 3.1, $Q_i$ is a binary indicator taking values 0 for 200 dl/min hemodialyzers and 1 for 300 dl/min hemodialyzers; $\beta_1$, $\beta_2$, and $\beta_3$ are, respectively, the asymptotic ultrafiltration rate, the log-transport rate, and the transmembrane pressure offset fixed effects corresponding to 200 dl/min dialyzers; $\gamma_k$ is the blood flow effect associated with the fixed effect $\beta_k$, $k = 1, 2$; $b_i$ is the vector of random effects, assumed independent for different $i$, with $B_i$ the corresponding design matrix as in (7); and $\epsilon_{ij}$ is the within-group error, assumed independent for different $i$, $j$ and independent of the random effects. The variance of the $\epsilon_{ij}$ is allowed to change with the transmembrane pressure according to a power model with parameter $\delta$ (estimated together with the other parameters in the model). Random effects are used in the NLME model (11) to account for within-subject correlation, as well as some of the heteroscedasticity in the response. A variance function model is used in the within-subject covariance matrix $\Lambda_i$ to accommodate the remaining heteroscedasticity in the data. Overall, this approach falls within the conditional model category.
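The physiological reading of the three parameters can be checked directly on the curve. The following Python sketch evaluates the asymptotic regression form $\phi_1 \{1 - \exp[-\exp(\phi_2)(x - \phi_3)]\}$ for the two blood flow classes, with the indicator shifting $\phi_1$ and $\phi_2$. The function names and all numerical values are hypothetical and are not the estimates obtained for the Dialyzer data.

```python
import math

def ultrafiltration(x, phi1, phi2, phi3):
    """Assumed curve: phi1 * (1 - exp(-exp(phi2) * (x - phi3)))."""
    return phi1 * (1.0 - math.exp(-math.exp(phi2) * (x - phi3)))

def expected_rate(x, q, beta, gamma):
    """Fixed effects curve; q is the 0/1 blood flow indicator shifting phi1, phi2."""
    return ultrafiltration(x,
                           beta[0] + gamma[0] * q,
                           beta[1] + gamma[1] * q,
                           beta[2])

beta = (45.0, 0.7, 0.2)   # hypothetical asymptote, log-rate, and pressure offset
gamma = (17.0, -0.3)      # hypothetical blood flow effects on phi1 and phi2

for x in (0.2, 0.5, 1.0, 2.0, 3.0):
    print(x, round(expected_rate(x, 0, beta, gamma), 2),
          round(expected_rate(x, 1, beta, gamma), 2))
```

The curve is zero at $x = \phi_3$ (the pressure offset), rises at a rate governed by $\exp(\phi_2)$, and levels off at $\phi_1$ (the maximum attainable rate), which is why the blood flow effect on $\phi_1$ translates into a higher plateau for one class of dialyzers.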
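Alternatively, a marginal nonlinear regression model, without random effects, can be used to represent the data:

$$y_{ij} = (\beta_1 + \gamma_1 Q_i) \{ 1 - \exp[-\exp(\beta_2 + \gamma_2 Q_i)(x_{ij} - \beta_3)] \} + \epsilon_{ij}, \qquad \mathrm{cor}(\epsilon_{ij}, \epsilon_{ik}) = \rho^{|j - k|}. \tag{12}$$

This model has the same fixed effects and variance function as (11), but the correlation among measurements is now modelled by an AR(1) structure in the within-subject errors.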
As in the linear model analysis of Section 3.1, three additional marginal nonlinear models similar to the AR(1) marginal model (12) were considered, with the following within-subject correlation structures: Gaussian spatial correlation with a nugget effect, ARMA(1,3), and unstructured. The conditional and marginal models described above were fitted using, respectively, the functions nlme and gnls in the NLME library in S-PLUS, with approximate ML estimation for the NLME model and exact ML estimation for the marginal models. The simulation approach outlined previously (based on sampling random effects from their estimated distribution) was used to estimate the marginal covariance matrices in the NLME model. Note that, because of the dependency of $\Sigma_i$ on $\beta$, different matrices are obtained for 200 dl/min and 300 dl/min dialyzers. For simplicity, we consider here the 200 dl/min dialyzer estimates. Even though there are noticeable differences in the estimates of the marginal covariance matrices, the fixed effects estimates obtained via the marginal and conditional approaches are fairly similar and lead to the same inferences about the differences between the two types of dialyzers. The greatest discrepancies are observed between the unstructured marginal model and the NLME conditional model, but even these are not of considerable magnitude.

Conclusions
This paper describes and contrasts the conditional and marginal modelling approaches in the context of linear and nonlinear regression models. In the linear case, the marginal covariance matrix corresponding to the responses in a linear mixed-effects model is expressed as a sum of two components: one determined solely by the random effects model (associated with the conditional approach) and another determined by the within-group error covariance (associated with the marginal approach). The two approaches are equivalent in the sense that they directly model the marginal covariance of the responses and that both can use the same estimation method (maximum likelihood or restricted maximum likelihood). In the nonlinear case, however, the association between the random effects model and the marginal covariance matrix is less direct, depending also on the fixed effects model. In addition, because the random effects typically occur nonlinearly in an NLME model, exact likelihood methods are seldom used, for computational reasons. Likelihood methods, on the other hand, can be easily implemented with the marginal approach. As a consequence, the two approaches cannot be considered equivalent, even though similar inferences on the fixed effects are usually obtained.
Perhaps the most important differences between the two approaches have to do with the questions that they are intended to answer. If the primary questions of interest refer to the fixed effects only, then, depending on the choice of random effects and within-group covariance, both methods could be equivalent in the linear case, and generally nearly equivalent in the nonlinear case. However, if one is interested in using predicted random effects to explore the impact of covariates on inter-subject variation, as is often done in pharmacokinetics/pharmacodynamics modeling (Davidian and Giltinan, 1995), then the conditional approach would be the natural choice.

Figure 1: Hemodialyzer ultrafiltration rates (in ml/hr) measured at 7 different transmembrane pressures (in dmHg) on 20 high-flux dialyzers. In vitro evaluation of dialyzers based on bovine blood flow rates of 200 dl/min and 300 dl/min.

Figure 2: Conditional versus marginal estimates for variance and covariance parameters in the Dialyzer linear models. Dotted line represents y = x reference line.

Figure 3: Conditional versus marginal estimates for variance and covariance parameters in the Dialyzer nonlinear models. Conditional model estimates correspond to 200 dl/min dialyzers. Dotted line represents y = x reference line.