GLS, MMRM, and Missing Data

GLS
MMRM
mixed-effects
lme
nlme
R
Published

March 19, 2026

The issue

The gls routine in the nlme package is sensitive to the data order when using corCompSymm, particularly when data is missing. This is alluded to in the corCompSymm documentation. Below are some examples.

No missingness

When there is no missingness, there is no problem provided the data is sorted by time:

library(nlme)
set.seed(20130506)
fm1Orth.gls <- gls( distance ~ Sex * I(age - 11), 
  Orthodont,
  correlation = corSymm(form = ~ 1 | Subject),
  weights = varIdent(form = ~ 1 | age) )
summary(fm1Orth.gls)$tTable
                           Value  Std.Error   t-value      p-value
(Intercept)           24.9371232 0.47286660 52.736063 7.204876e-77
SexFemale             -2.2717427 0.74083959 -3.066443 2.760865e-03
I(age - 11)            0.8268037 0.08221771 10.056272 5.037413e-17
SexFemale:I(age - 11) -0.3504390 0.12881040 -2.720580 7.640131e-03
fm2Orth.gls <- gls( distance ~ Sex * I(age - 11), 
  Orthodont[nrow(Orthodont):1, ], # reverse order
  correlation = corSymm(form = ~ 1 | Subject),
  weights = varIdent(form = ~ 1 | age) )
summary(fm2Orth.gls)$tTable # same as sorted
                           Value  Std.Error   t-value      p-value
(Intercept)           24.9371337 0.47286148 52.736658 7.196740e-77
SexFemale             -2.2717535 0.74083155 -3.066491 2.760460e-03
I(age - 11)            0.8268026 0.08221817 10.056203 5.039201e-17
SexFemale:I(age - 11) -0.3504379 0.12881111 -2.720557 7.640630e-03

But if the data is not sorted by time, we get inconsistent results. In fact, here the estimation does not converge:

fm3Orth.gls <- gls( distance ~ Sex * I(age - 11), 
  Orthodont[sample(1:nrow(Orthodont), replace = TRUE), ],
  correlation = corSymm(form = ~ 1 | Subject),
  weights = varIdent(form = ~ 1 | age) )

Missingness

When there is missingness, sorting does not help:

# Generate some missingness
Orthodont$missing <- sample(c(TRUE, FALSE), 
  size = nrow(Orthodont), prob = c(0.2, 0.8), 
  replace = TRUE)

fm4Orth.gls <- gls( distance ~ Sex * I(age - 11), 
  Orthodont, subset = !missing,
  correlation = corSymm(form = ~ 1 | Subject),
  weights = varIdent(form = ~ 1 | age) )
summary(fm4Orth.gls)$tTable
                           Value  Std.Error   t-value      p-value
(Intercept)           24.6507074 0.49253896 50.048239 1.936427e-66
SexFemale             -1.8766129 0.76806327 -2.443305 1.655051e-02
I(age - 11)            0.6845546 0.07834144  8.738091 1.420426e-13
SexFemale:I(age - 11) -0.1694028 0.11714233 -1.446128 1.516937e-01
fm5Orth.gls <- gls( distance ~ Sex * I(age - 11), 
  Orthodont[nrow(Orthodont):1, ], # reverse order
  subset = !missing,
  correlation = corSymm(form = ~ 1 | Subject),
  weights = varIdent(form = ~ 1 | age) )
summary(fm5Orth.gls)$tTable
                           Value Std.Error   t-value      p-value
(Intercept)           24.7529072 0.4946600 50.040246 1.962908e-66
SexFemale             -2.0630955 0.7715183 -2.674072 8.932466e-03
I(age - 11)            0.7260246 0.0785608  9.241563 1.305493e-14
SexFemale:I(age - 11) -0.2256612 0.1175943 -1.918981 5.822792e-02

The fix

By default, corSymm(form = ~ 1 | Subject) uses the order of observations to estimate the correlation components. In this case, the (i,j) component of the correlation matrix represents the correlation between the ith and jth observation. If instead we want the correlation between the ith and jth time point to be represented, we need to use corSymm(form = ~ t | Subject), where t are the consecutive integers indexing time.

Orthodont$t <- as.numeric(factor(Orthodont$age, levels = c(8, 10, 12, 14)))
with(Orthodont, table(t, age))
   age
t    8 10 12 14
  1 27  0  0  0
  2  0 27  0  0
  3  0  0 27  0
  4  0  0  0 27
fm6Orth.gls <- gls( distance ~ Sex * I(age - 11), 
  Orthodont, subset = !missing,
  correlation = corSymm(form = ~ t | Subject),
  weights = varIdent(form = ~ 1 | age) )
summary(fm6Orth.gls)$tTable
                           Value  Std.Error   t-value      p-value
(Intercept)           24.7022356 0.49498531 49.904987 2.470935e-66
SexFemale             -2.0520703 0.77463325 -2.649086 9.566568e-03
I(age - 11)            0.6883899 0.08674632  7.935667 6.270004e-12
SexFemale:I(age - 11) -0.1690610 0.12987781 -1.301693 1.964188e-01
fm7Orth.gls <- gls( distance ~ Sex * I(age - 11), 
  Orthodont[nrow(Orthodont):1, ], 
  subset = !missing,
  correlation = corSymm(form = ~ t | Subject),
  weights = varIdent(form = ~ 1 | age) )
summary(fm7Orth.gls)$tTable
                           Value  Std.Error   t-value      p-value
(Intercept)           24.7022356 0.49498315 49.905205 2.470016e-66
SexFemale             -2.0520687 0.77462980 -2.649096 9.566314e-03
I(age - 11)            0.6883909 0.08674715  7.935602 6.271927e-12
SexFemale:I(age - 11) -0.1690623 0.12987912 -1.301690 1.964198e-01

Note this is not an issue with compound symmetry structure, corCompSymm, because the correlation parameters are exchangeable with respect to time.