GLS, MMRM, and Missing Data

The issue

The gls routine in the nlme package is sensitive to the data order when using corCompSymm, particularly when data is missing. This is alluded to in the corCompSymm documentation. Below are some examples.

No missingness

When there is no missingness, there is no problem provided the data is sorted by time:

library(nlme)
set.seed(20130506)
fm1Orth.gls <- gls( distance ~ Sex * I(age - 11), 
  Orthodont,
  correlation = corSymm(form = ~ 1 | Subject),
  weights = varIdent(form = ~ 1 | age) )
summary(fm1Orth.gls)$tTable
                      Value Std.Error t-value p-value
(Intercept)           24.94     0.473    52.7 7.2e-77
SexFemale             -2.27     0.741    -3.1 2.8e-03
I(age - 11)            0.83     0.082    10.1 5.0e-17
SexFemale:I(age - 11) -0.35     0.129    -2.7 7.6e-03
fm2Orth.gls <- gls( distance ~ Sex * I(age - 11), 
  Orthodont[nrow(Orthodont):1, ], # reverse order
  correlation = corSymm(form = ~ 1 | Subject),
  weights = varIdent(form = ~ 1 | age) )
summary(fm2Orth.gls)$tTable # same as sorted
                      Value Std.Error t-value p-value
(Intercept)           24.94     0.473    52.7 7.2e-77
SexFemale             -2.27     0.741    -3.1 2.8e-03
I(age - 11)            0.83     0.082    10.1 5.0e-17
SexFemale:I(age - 11) -0.35     0.129    -2.7 7.6e-03

But if the data is not sorted by time, we get inconsistent results. In fact, here the estimation does not converge:

fm3Orth.gls <- gls( distance ~ Sex * I(age - 11), 
  Orthodont[sample(1:nrow(Orthodont), replace = TRUE), ],
  correlation = corSymm(form = ~ 1 | Subject),
  weights = varIdent(form = ~ 1 | age) )
Error in gls(distance ~ Sex * I(age - 11), Orthodont[sample(1:nrow(Orthodont), : function evaluation limit reached without convergence (9)

Missingness

When there is missingness, sorting does not help:

# Generate some missingness
Orthodont$missing <- sample(c(TRUE, FALSE), 
  size = nrow(Orthodont), prob = c(0.2, 0.8), 
  replace = TRUE)

fm4Orth.gls <- gls( distance ~ Sex * I(age - 11), 
  Orthodont, subset = !missing,
  correlation = corSymm(form = ~ 1 | Subject),
  weights = varIdent(form = ~ 1 | age) )
summary(fm4Orth.gls)$tTable
                      Value Std.Error t-value p-value
(Intercept)           25.05     0.506    49.5 1.4e-66
SexFemale             -2.43     0.792    -3.1 2.9e-03
I(age - 11)            0.83     0.079    10.6 2.4e-17
SexFemale:I(age - 11) -0.33     0.122    -2.7 8.2e-03
fm5Orth.gls <- gls( distance ~ Sex * I(age - 11), 
  Orthodont[nrow(Orthodont):1, ], # reverse order
  subset = !missing,
  correlation = corSymm(form = ~ 1 | Subject),
  weights = varIdent(form = ~ 1 | age) )
summary(fm5Orth.gls)$tTable
                      Value Std.Error t-value p-value
(Intercept)           25.00     0.500    50.0 6.0e-67
SexFemale             -2.36     0.785    -3.0 3.5e-03
I(age - 11)            0.85     0.088     9.7 1.5e-15
SexFemale:I(age - 11) -0.35     0.131    -2.7 8.8e-03

The fix

By default, corSymm(form = ~ 1 | Subject) uses the order of observations to estimate the correlation components. In this case, the (i,j) component of the correlation matrix represents the correlation between the ith and jth observation. If instead we want the correlation between the ith and jth time point to be represented, we need to use corSymm(form = ~ t | Subject), where t are the consecutive integers indexing time.

Orthodont$t <- as.numeric(factor(Orthodont$age, levels = c(8, 10, 12, 14)))
with(Orthodont, table(t, age))
   age
t    8 10 12 14
  1 27  0  0  0
  2  0 27  0  0
  3  0  0 27  0
  4  0  0  0 27
fm6Orth.gls <- gls( distance ~ Sex * I(age - 11), 
  Orthodont, subset = !missing,
  correlation = corSymm(form = ~ t | Subject),
  weights = varIdent(form = ~ 1 | age) )
summary(fm6Orth.gls)$tTable
                      Value Std.Error t-value p-value
(Intercept)           25.12     0.505    49.7 1.1e-66
SexFemale             -2.54     0.789    -3.2 1.8e-03
I(age - 11)            0.82     0.094     8.7 1.8e-13
SexFemale:I(age - 11) -0.33     0.148    -2.2 3.1e-02
fm7Orth.gls <- gls( distance ~ Sex * I(age - 11), 
  Orthodont[nrow(Orthodont):1, ], 
  subset = !missing,
  correlation = corSymm(form = ~ t | Subject),
  weights = varIdent(form = ~ 1 | age) )
summary(fm7Orth.gls)$tTable
                      Value Std.Error t-value p-value
(Intercept)           25.12     0.505    49.7 1.1e-66
SexFemale             -2.54     0.789    -3.2 1.8e-03
I(age - 11)            0.82     0.094     8.7 1.8e-13
SexFemale:I(age - 11) -0.33     0.148    -2.2 3.1e-02

Note this is not an issue with compound symmetry structure, corCompSymm, because the correlation parameters are exchangeable with respect to time.

comments powered by Disqus