Missing Data in the LMS and QML Approaches
Options for Handling Missing Values
By default, missing values for both the LMS and QML approaches are removed list-wise (missing="listwise"), such that only complete observations are kept. It is also possible to set missing="impute" to impute missing values, or to use the modsem_mimpute() function to perform multiple imputation.
For the LMS approach, it is also possible to use Full Information Maximum Likelihood (FIML).
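For reference, the four options look like this in code (a sketch only; the model m1 and the dataset oneInt2 with injected missing values are defined in the example below):
fit_listwise <- modsem(m1, oneInt2, method = "lms", missing = "listwise") # default
fit_fiml     <- modsem(m1, oneInt2, method = "lms", missing = "fiml")     # LMS only
fit_impute   <- modsem(m1, oneInt2, method = "qml", missing = "impute")   # single imputation
fit_mimpute  <- modsem_mimpute(m1, oneInt2, method = "lms")               # multiple imputation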
Full Information Maximum Likelihood (FIML)
If you’re using the LMS approach, it is possible to estimate your model using Full Information Maximum Likelihood (FIML) by setting missing="fiml". Here is an example where we generate some missing values in the oneInt dataset.
set.seed(2834027) # set seed for reproducibility
n.missing <- 200L # generate 200 missing values
oneInt2 <- as.matrix(oneInt)
oneInt2[sample(length(oneInt2), n.missing)] <- NA
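# Sanity check (a base-R sketch, not something modsem requires): count the
# distinct missingness patterns produced by the injected NAs. Each unique row
# of the NA-indicator matrix is one pattern; compare with the
# "Number of missing patterns" line in the FIML summary below.
nrow(unique(is.na(oneInt2)))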
m1 <- '
# Outer Model
X =~ x1 + x2 + x3
Z =~ z1 + z2 + z3
Y =~ y1 + y2 + y3
# Inner Model
Y ~ X + Z + X:Z
'
lms_fiml <- modsem(m1, oneInt2, method = "lms", missing = "fiml")
summary(lms_fiml)
#>
#> modsem (1.0.12) ended normally after 45 iterations
#>
#> Estimator LMS
#> Optimization method EMA-NLMINB
#> Number of model parameters 31
#>
#> Number of observations 2000
#> Number of missing patterns 17
#>
#> Loglikelihood and Information Criteria:
#> Loglikelihood -17344.03
#> Akaike (AIC) 34750.06
#> Bayesian (BIC) 34923.69
#>
#> Numerical Integration:
#> Points of integration (per dim) 24
#> Dimensions 1
#> Total points of integration 24
#>
#> Fit Measures for Baseline Model (H0):
#> Loglikelihood -17679.3
#> Akaike (AIC) 35418.61
#> Bayesian (BIC) 35586.63
#> Chi-square 22.48
#> Degrees of Freedom (Chi-square) 24
#> P-value (Chi-square) 0.551
#> RMSEA 0.000
#>
#> Comparative Fit to H0 (LRT test):
#> Loglikelihood change 335.27
#> Difference test (D) 670.55
#> Degrees of freedom (D) 1
#> P-value (D) 0.000
#>
#> R-Squared Interaction Model (H1):
#> Y 0.598
#> R-Squared Baseline Model (H0):
#> Y 0.396
#> R-Squared Change (H1 - H0):
#> Y 0.202
#>
#> Parameter Estimates:
#> Coefficients unstandardized
#> Information observed
#> Standard errors standard
#>
#> Latent Variables:
#> Estimate Std.Error z.value P(>|z|)
#> X =~
#> x1 1.000
#> x2 0.802 0.013 63.743 0.000
#> x3 0.912 0.014 67.459 0.000
#> Z =~
#> z1 1.000
#> z2 0.813 0.013 64.323 0.000
#> z3 0.883 0.013 67.059 0.000
#> Y =~
#> y1 1.000
#> y2 0.798 0.008 106.264 0.000
#> y3 0.902 0.008 111.833 0.000
#>
#> Regressions:
#> Estimate Std.Error z.value P(>|z|)
#> Y ~
#> X 0.672 0.031 21.724 0.000
#> Z 0.569 0.030 18.668 0.000
#> X:Z 0.714 0.028 25.683 0.000
#>
#> Intercepts:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 1.021 0.024 42.607 0.000
#> .x2 1.216 0.020 60.807 0.000
#> .x3 0.919 0.022 41.326 0.000
#> .z1 1.012 0.024 41.545 0.000
#> .z2 1.206 0.020 59.198 0.000
#> .z3 0.916 0.022 42.039 0.000
#> .y1 1.038 0.033 31.385 0.000
#> .y2 1.220 0.027 45.364 0.000
#> .y3 0.951 0.030 31.626 0.000
#>
#> Covariances:
#> Estimate Std.Error z.value P(>|z|)
#> X ~~
#> Z 0.202 0.024 8.279 0.000
#>
#> Variances:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 0.155 0.009 17.737 0.000
#> .x2 0.161 0.007 22.982 0.000
#> .x3 0.165 0.008 20.654 0.000
#> .z1 0.169 0.009 18.478 0.000
#> .z2 0.160 0.007 22.449 0.000
#> .z3 0.157 0.008 20.488 0.000
#> .y1 0.160 0.009 17.845 0.000
#> .y2 0.155 0.007 22.440 0.000
#> .y3 0.162 0.008 20.303 0.000
#> X 0.988 0.037 26.889 0.000
#> Z 1.014 0.038 26.784 0.000
#> .Y 0.979 0.038 25.849 0.000
(Multiple) Imputation
If you’re using the QML approach, it is not (yet) possible to use FIML, and FIML can also be very computationally intensive with the LMS approach. Another option is to use (multiple) imputation.
If missing="impute"
is set, single imputation is
performed.
qml_impute <- modsem(m1, oneInt2, method = "qml", missing = "impute")
#> Imputing missing values. Consider using the `modsem_mimpute()` function!
summary(qml_impute)
#>
#> modsem (1.0.12) ended normally after 104 iterations
#>
#> Estimator QML
#> Optimization method NLMINB
#> Number of model parameters 31
#>
#> Number of observations 2000
#>
#> Loglikelihood and Information Criteria:
#> Loglikelihood -17495.78
#> Akaike (AIC) 35053.55
#> Bayesian (BIC) 35227.18
#>
#> Fit Measures for Baseline Model (H0):
#> Loglikelihood -17831.49
#> Akaike (AIC) 35722.97
#> Bayesian (BIC) 35891
#> Chi-square 17.82
#> Degrees of Freedom (Chi-square) 24
#> P-value (Chi-square) 0.811
#> RMSEA 0.000
#>
#> Comparative Fit to H0 (LRT test):
#> Loglikelihood change 335.71
#> Difference test (D) 671.42
#> Degrees of freedom (D) 1
#> P-value (D) 0.000
#>
#> R-Squared Interaction Model (H1):
#> Y 0.598
#> R-Squared Baseline Model (H0):
#> Y 0.396
#> R-Squared Change (H1 - H0):
#> Y 0.202
#>
#> Parameter Estimates:
#> Coefficients unstandardized
#> Information observed
#> Standard errors standard
#>
#> Latent Variables:
#> Estimate Std.Error z.value P(>|z|)
#> X =~
#> x1 1.000
#> x2 0.800 0.012 64.172 0.000
#> x3 0.912 0.013 68.452 0.000
#> Z =~
#> z1 1.000
#> z2 0.810 0.012 64.968 0.000
#> z3 0.882 0.013 67.682 0.000
#> Y =~
#> y1 1.000
#> y2 0.796 0.007 106.801 0.000
#> y3 0.902 0.008 112.544 0.000
#>
#> Regressions:
#> Estimate Std.Error z.value P(>|z|)
#> Y ~
#> X 0.673 0.031 21.741 0.000
#> Z 0.567 0.030 18.665 0.000
#> X:Z 0.713 0.028 25.783 0.000
#>
#> Intercepts:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 1.021 0.024 42.682 0.000
#> .x2 1.216 0.020 60.970 0.000
#> .x3 0.919 0.022 41.371 0.000
#> .z1 1.013 0.024 41.623 0.000
#> .z2 1.206 0.020 59.240 0.000
#> .z3 0.916 0.022 42.055 0.000
#> .y1 1.040 0.033 31.444 0.000
#> .y2 1.219 0.027 45.439 0.000
#> .y3 0.953 0.030 31.657 0.000
#>
#> Covariances:
#> Estimate Std.Error z.value P(>|z|)
#> X ~~
#> Z 0.202 0.024 8.294 0.000
#>
#> Variances:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 0.155 0.009 17.946 0.000
#> .x2 0.163 0.007 23.325 0.000
#> .x3 0.162 0.008 20.694 0.000
#> .z1 0.166 0.009 18.431 0.000
#> .z2 0.161 0.007 22.759 0.000
#> .z3 0.158 0.008 20.724 0.000
#> .y1 0.160 0.009 17.974 0.000
#> .y2 0.157 0.007 22.808 0.000
#> .y3 0.164 0.008 20.567 0.000
#> X 0.989 0.037 27.074 0.000
#> Z 1.017 0.038 26.942 0.000
#> .Y 0.985 0.038 25.956 0.000
A better option than single imputation is multiple imputation, which can be performed for both the LMS and QML approaches using the modsem_mimpute() function.
lms_mimpute <- modsem_mimpute(m1, oneInt2, method = "lms")
summary(lms_mimpute)
#>
#> modsem (1.0.12) ended normally after 360 iterations
#>
#> Estimator LMS
#> Optimization method EMA-NLMINB
#> Number of model parameters 31
#>
#> Number of observations 2000
#>
#> Loglikelihood and Information Criteria:
#> Loglikelihood -17483.67
#> Akaike (AIC) 35029.33
#> Bayesian (BIC) 35202.96
#>
#> Numerical Integration:
#> Points of integration (per dim) 24
#> Dimensions 1
#> Total points of integration 24
#>
#> Fit Measures for Baseline Model (H0):
#> Loglikelihood -17813.52
#> Akaike (AIC) 35687.05
#> Bayesian (BIC) 35855.08
#> Chi-square 17.53
#> Degrees of Freedom (Chi-square) 24
#> P-value (Chi-square) 0.825
#> RMSEA 0.000
#>
#> Comparative Fit to H0 (LRT test):
#> Loglikelihood change 329.86
#> Difference test (D) 659.72
#> Degrees of freedom (D) 1
#> P-value (D) 0.000
#>
#> R-Squared Interaction Model (H1):
#> Y 0.596
#> R-Squared Baseline Model (H0):
#> Y 0.398
#> R-Squared Change (H1 - H0):
#> Y 0.198
#>
#> Parameter Estimates:
#> Coefficients unstandardized
#> Information Rubin-corrected (m=25)
#> Standard errors standard
#>
#> Latent Variables:
#> Estimate Std.Error z.value P(>|z|)
#> X =~
#> x1 1.000
#> x2 0.802 0.013 63.625 0.000
#> x3 0.912 0.014 67.421 0.000
#> Z =~
#> z1 1.000
#> z2 0.813 0.013 64.480 0.000
#> z3 0.883 0.013 67.102 0.000
#> Y =~
#> y1 1.000
#> y2 0.798 0.008 106.192 0.000
#> y3 0.902 0.008 111.394 0.000
#>
#> Regressions:
#> Estimate Std.Error z.value P(>|z|)
#> Y ~
#> X 0.671 0.031 21.699 0.000
#> Z 0.568 0.030 18.698 0.000
#> X:Z 0.711 0.028 25.580 0.000
#>
#> Intercepts:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 1.019 0.024 42.575 0.000
#> .x2 1.215 0.020 60.773 0.000
#> .x3 0.918 0.022 41.233 0.000
#> .z1 1.010 0.024 41.517 0.000
#> .z2 1.205 0.020 59.183 0.000
#> .z3 0.915 0.022 41.936 0.000
#> .y1 1.035 0.033 31.378 0.000
#> .y2 1.219 0.027 45.358 0.000
#> .y3 0.949 0.030 31.601 0.000
#>
#> Covariances:
#> Estimate Std.Error z.value P(>|z|)
#> X ~~
#> Z 0.202 0.024 8.312 0.000
#>
#> Variances:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 0.156 0.009 17.766 0.000
#> .x2 0.161 0.007 22.938 0.000
#> .x3 0.165 0.008 20.517 0.000
#> .z1 0.169 0.009 18.507 0.000
#> .z2 0.159 0.007 22.466 0.000
#> .z3 0.157 0.008 20.544 0.000
#> .y1 0.160 0.009 17.880 0.000
#> .y2 0.155 0.007 22.491 0.000
#> .y3 0.162 0.008 20.174 0.000
#> X 0.986 0.037 27.025 0.000
#> Z 1.014 0.038 26.768 0.000
#> .Y 0.984 0.038 26.019 0.000
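The Information Rubin-corrected (m=25) line in the summary indicates that the model was fit to 25 imputed datasets and the results were pooled using Rubin's rules: the pooled point estimate is the mean of the per-imputation estimates, and the pooled variance combines the within- and between-imputation variance. As a rough illustration of the pooling idea (a sketch, not modsem's internal code):
# Rubin's pooling rules, given m point estimates `est` and their squared
# standard errors `se2` from m imputed datasets (illustration only):
pool_rubin <- function(est, se2) {
  m    <- length(est)
  qbar <- mean(est)              # pooled point estimate
  ubar <- mean(se2)              # within-imputation variance
  b    <- var(est)               # between-imputation variance
  tvar <- ubar + (1 + 1 / m) * b # total (Rubin-corrected) variance
  c(estimate = qbar, se = sqrt(tvar))
}
pool_rubin(est = c(0.70, 0.72, 0.71), se2 = c(0.028, 0.030, 0.029)^2) # toy numbers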