GLS, MMRM, and Missing Data
The issue
The gls routine in the nlme package is sensitive to the data order when using corCompSymm, particularly when data is missing. This is alluded to in the corCompSymm documentation. Below are some examples.
No missingness
When there is no missingness, there is no problem provided the data is sorted by time:
library(nlme)
set.seed(20130506)
fm1Orth.gls <- gls( distance ~ Sex * I(age - 11),
Orthodont,
correlation = corSymm(form = ~ 1 | Subject),
weights = varIdent(form = ~ 1 | age) )
summary(fm1Orth.gls)$tTable
Value Std.Error t-value p-value
(Intercept) 24.94 0.473 52.7 7.2e-77
SexFemale -2.27 0.741 -3.1 2.8e-03
I(age - 11) 0.83 0.082 10.1 5.0e-17
SexFemale:I(age - 11) -0.35 0.129 -2.7 7.6e-03
fm2Orth.gls <- gls( distance ~ Sex * I(age - 11),
Orthodont[nrow(Orthodont):1, ], # reverse order
correlation = corSymm(form = ~ 1 | Subject),
weights = varIdent(form = ~ 1 | age) )
summary(fm2Orth.gls)$tTable # same as sorted
Value Std.Error t-value p-value
(Intercept) 24.94 0.473 52.7 7.2e-77
SexFemale -2.27 0.741 -3.1 2.8e-03
I(age - 11) 0.83 0.082 10.1 5.0e-17
SexFemale:I(age - 11) -0.35 0.129 -2.7 7.6e-03
But if the data is not sorted by time, we get inconsistent results. In fact, here the estimation does not converge:
fm3Orth.gls <- gls( distance ~ Sex * I(age - 11),
Orthodont[sample(1:nrow(Orthodont), replace = TRUE), ],
correlation = corSymm(form = ~ 1 | Subject),
weights = varIdent(form = ~ 1 | age) )
Error in gls(distance ~ Sex * I(age - 11), Orthodont[sample(1:nrow(Orthodont), : function evaluation limit reached without convergence (9)
Missingness
When there is missingness, sorting does not help:
# Generate some missingness
Orthodont$missing <- sample(c(TRUE, FALSE),
size = nrow(Orthodont), prob = c(0.2, 0.8),
replace = TRUE)
fm4Orth.gls <- gls( distance ~ Sex * I(age - 11),
Orthodont, subset = !missing,
correlation = corSymm(form = ~ 1 | Subject),
weights = varIdent(form = ~ 1 | age) )
summary(fm4Orth.gls)$tTable
Value Std.Error t-value p-value
(Intercept) 25.05 0.506 49.5 1.4e-66
SexFemale -2.43 0.792 -3.1 2.9e-03
I(age - 11) 0.83 0.079 10.6 2.4e-17
SexFemale:I(age - 11) -0.33 0.122 -2.7 8.2e-03
fm5Orth.gls <- gls( distance ~ Sex * I(age - 11),
Orthodont[nrow(Orthodont):1, ], # reverse order
subset = !missing,
correlation = corSymm(form = ~ 1 | Subject),
weights = varIdent(form = ~ 1 | age) )
summary(fm5Orth.gls)$tTable
Value Std.Error t-value p-value
(Intercept) 25.00 0.500 50.0 6.0e-67
SexFemale -2.36 0.785 -3.0 3.5e-03
I(age - 11) 0.85 0.088 9.7 1.5e-15
SexFemale:I(age - 11) -0.35 0.131 -2.7 8.8e-03
The fix
By default, corSymm(form = ~ 1 | Subject) uses the order of observations to estimate the correlation components. In this case, the (i,j) component of the correlation matrix represents the correlation between the ith and jth observation. If instead we want the correlation between the ith and jth time point to be represented, we need to use corSymm(form = ~ t | Subject), where t are the consecutive integers indexing time.
Orthodont$t <- as.numeric(factor(Orthodont$age, levels = c(8, 10, 12, 14)))
with(Orthodont, table(t, age))
age
t 8 10 12 14
1 27 0 0 0
2 0 27 0 0
3 0 0 27 0
4 0 0 0 27
fm6Orth.gls <- gls( distance ~ Sex * I(age - 11),
Orthodont, subset = !missing,
correlation = corSymm(form = ~ t | Subject),
weights = varIdent(form = ~ 1 | age) )
summary(fm6Orth.gls)$tTable
Value Std.Error t-value p-value
(Intercept) 25.12 0.505 49.7 1.1e-66
SexFemale -2.54 0.789 -3.2 1.8e-03
I(age - 11) 0.82 0.094 8.7 1.8e-13
SexFemale:I(age - 11) -0.33 0.148 -2.2 3.1e-02
fm7Orth.gls <- gls( distance ~ Sex * I(age - 11),
Orthodont[nrow(Orthodont):1, ],
subset = !missing,
correlation = corSymm(form = ~ t | Subject),
weights = varIdent(form = ~ 1 | age) )
summary(fm7Orth.gls)$tTable
Value Std.Error t-value p-value
(Intercept) 25.12 0.505 49.7 1.1e-66
SexFemale -2.54 0.789 -3.2 1.8e-03
I(age - 11) 0.82 0.094 8.7 1.8e-13
SexFemale:I(age - 11) -0.33 0.148 -2.2 3.1e-02
Note this is not an issue with compound symmetry structure, corCompSymm, because the correlation parameters are exchangeable with respect to time.
comments powered by Disqus