Missing Data in the LMS and QML Approaches
Options for Handling Missing Values
By default, missing values for both the LMS and QML approaches are removed list-wise (missing="listwise"), such that only complete observations are kept. It is also possible to set missing="impute" to impute missing values, or to use the modsem_mimpute() function to perform multiple imputation.
For the LMS approach, it is also possible to use Full Information Maximum Likelihood (FIML).
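For reference, the four options look like this in code (a sketch only; the model m1 and the dataset oneInt2 with injected missing values are defined in the example below):
fit_listwise <- modsem(m1, oneInt2, method = "lms", missing = "listwise") # default
fit_fiml     <- modsem(m1, oneInt2, method = "lms", missing = "fiml")     # LMS only
fit_impute   <- modsem(m1, oneInt2, method = "qml", missing = "impute")   # single imputation
fit_mimpute  <- modsem_mimpute(m1, oneInt2, method = "lms")               # multiple imputation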
Full Information Maximum Likelihood (FIML)
If you’re using the LMS approach, it is possible to estimate your model using Full Information Maximum Likelihood (FIML) by setting missing="fiml". Here is an example where we generate some missing values in the oneInt dataset.
set.seed(2834027) # set seed for reproducibility
n.missing <- 200L # generate 200 missing values
oneInt2 <- as.matrix(oneInt)
oneInt2[sample(length(oneInt2), n.missing)] <- NA
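# Sanity check (a base-R sketch, not something modsem requires): count the
# distinct missingness patterns produced by the injected NAs. Each unique row
# of the NA-indicator matrix is one pattern; compare with the
# "Number of missing patterns" line in the FIML summary below.
nrow(unique(is.na(oneInt2)))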
m1 <- '
# Outer Model
X =~ x1 + x2 + x3
Z =~ z1 + z2 + z3
Y =~ y1 + y2 + y3
# Inner Model
Y ~ X + Z + X:Z
'
lms_fiml <- modsem(m1, oneInt2, method = "lms", missing = "fiml")
summary(lms_fiml)
#>
#> modsem (1.0.12) ended normally after 45 iterations
#>
#> Estimator LMS
#> Optimization method EMA-NLMINB
#> Number of model parameters 31
#>
#> Number of observations 2000
#> Number of missing patterns 17
#>
#> Loglikelihood and Information Criteria:
#> Loglikelihood -17344.03
#> Akaike (AIC) 34750.06
#> Bayesian (BIC) 34923.69
#>
#> Numerical Integration:
#> Points of integration (per dim) 24
#> Dimensions 1
#> Total points of integration 24
#>
#> Fit Measures for Baseline Model (H0):
#> Loglikelihood -17679.3
#> Akaike (AIC) 35418.61
#> Bayesian (BIC) 35586.63
#> Chi-square 22.48
#> Degrees of Freedom (Chi-square) 24
#> P-value (Chi-square) 0.551
#> RMSEA 0.000
#>
#> Comparative Fit to H0 (LRT test):
#> Loglikelihood change 335.27
#> Difference test (D) 670.55
#> Degrees of freedom (D) 1
#> P-value (D) 0.000
#>
#> R-Squared Interaction Model (H1):
#> Y 0.598
#> R-Squared Baseline Model (H0):
#> Y 0.396
#> R-Squared Change (H1 - H0):
#> Y 0.202
#>
#> Parameter Estimates:
#> Coefficients unstandardized
#> Information observed
#> Standard errors standard
#>
#> Latent Variables:
#> Estimate Std.Error z.value P(>|z|)
#> X =~
#> x1 1.000
#> x2 0.802 0.013 63.743 0.000
#> x3 0.912 0.014 67.459 0.000
#> Z =~
#> z1 1.000
#> z2 0.813 0.013 64.323 0.000
#> z3 0.883 0.013 67.059 0.000
#> Y =~
#> y1 1.000
#> y2 0.798 0.008 106.264 0.000
#> y3 0.902 0.008 111.833 0.000
#>
#> Regressions:
#> Estimate Std.Error z.value P(>|z|)
#> Y ~
#> X 0.672 0.031 21.724 0.000
#> Z 0.569 0.030 18.668 0.000
#> X:Z 0.714 0.028 25.683 0.000
#>
#> Intercepts:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 1.021 0.024 42.607 0.000
#> .x2 1.216 0.020 60.807 0.000
#> .x3 0.919 0.022 41.326 0.000
#> .z1 1.012 0.024 41.545 0.000
#> .z2 1.206 0.020 59.198 0.000
#> .z3 0.916 0.022 42.039 0.000
#> .y1 1.038 0.033 31.385 0.000
#> .y2 1.220 0.027 45.364 0.000
#> .y3 0.951 0.030 31.626 0.000
#>
#> Covariances:
#> Estimate Std.Error z.value P(>|z|)
#> X ~~
#> Z 0.202 0.024 8.279 0.000
#>
#> Variances:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 0.155 0.009 17.737 0.000
#> .x2 0.161 0.007 22.982 0.000
#> .x3 0.165 0.008 20.654 0.000
#> .z1 0.169 0.009 18.478 0.000
#> .z2 0.160 0.007 22.449 0.000
#> .z3 0.157 0.008 20.488 0.000
#> .y1 0.160 0.009 17.845 0.000
#> .y2 0.155 0.007 22.440 0.000
#> .y3 0.162 0.008 20.303 0.000
#> X 0.988 0.037 26.889 0.000
#> Z 1.014 0.038 26.784 0.000
#> .Y 0.979 0.038 25.849 0.000
(Multiple) Imputation
If you’re using the QML approach, it is not (yet) possible to use FIML, and FIML can also be very computationally intensive with the LMS approach. Another option is to use (multiple) imputation.
If missing="impute"
is set, single imputation is
performed.
qml_impute <- modsem(m1, oneInt2, method = "qml", missing = "impute")
#> Imputing missing values. Consider using the `modsem_mimpute()` function!
summary(qml_impute)
#>
#> modsem (1.0.12) ended normally after 104 iterations
#>
#> Estimator QML
#> Optimization method NLMINB
#> Number of model parameters 31
#>
#> Number of observations 2000
#>
#> Loglikelihood and Information Criteria:
#> Loglikelihood -17495.78
#> Akaike (AIC) 35053.55
#> Bayesian (BIC) 35227.18
#>
#> Fit Measures for Baseline Model (H0):
#> Loglikelihood -17831.49
#> Akaike (AIC) 35722.97
#> Bayesian (BIC) 35891
#> Chi-square 17.82
#> Degrees of Freedom (Chi-square) 24
#> P-value (Chi-square) 0.811
#> RMSEA 0.000
#>
#> Comparative Fit to H0 (LRT test):
#> Loglikelihood change 335.71
#> Difference test (D) 671.42
#> Degrees of freedom (D) 1
#> P-value (D) 0.000
#>
#> R-Squared Interaction Model (H1):
#> Y 0.598
#> R-Squared Baseline Model (H0):
#> Y 0.396
#> R-Squared Change (H1 - H0):
#> Y 0.202
#>
#> Parameter Estimates:
#> Coefficients unstandardized
#> Information observed
#> Standard errors standard
#>
#> Latent Variables:
#> Estimate Std.Error z.value P(>|z|)
#> X =~
#> x1 1.000
#> x2 0.800 0.012 64.172 0.000
#> x3 0.912 0.013 68.452 0.000
#> Z =~
#> z1 1.000
#> z2 0.810 0.012 64.968 0.000
#> z3 0.882 0.013 67.682 0.000
#> Y =~
#> y1 1.000
#> y2 0.796 0.007 106.801 0.000
#> y3 0.902 0.008 112.544 0.000
#>
#> Regressions:
#> Estimate Std.Error z.value P(>|z|)
#> Y ~
#> X 0.673 0.031 21.741 0.000
#> Z 0.567 0.030 18.665 0.000
#> X:Z 0.713 0.028 25.783 0.000
#>
#> Intercepts:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 1.021 0.024 42.682 0.000
#> .x2 1.216 0.020 60.970 0.000
#> .x3 0.919 0.022 41.371 0.000
#> .z1 1.013 0.024 41.623 0.000
#> .z2 1.206 0.020 59.240 0.000
#> .z3 0.916 0.022 42.055 0.000
#> .y1 1.040 0.033 31.444 0.000
#> .y2 1.219 0.027 45.439 0.000
#> .y3 0.953 0.030 31.657 0.000
#>
#> Covariances:
#> Estimate Std.Error z.value P(>|z|)
#> X ~~
#> Z 0.202 0.024 8.294 0.000
#>
#> Variances:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 0.155 0.009 17.946 0.000
#> .x2 0.163 0.007 23.325 0.000
#> .x3 0.162 0.008 20.694 0.000
#> .z1 0.166 0.009 18.431 0.000
#> .z2 0.161 0.007 22.759 0.000
#> .z3 0.158 0.008 20.724 0.000
#> .y1 0.160 0.009 17.974 0.000
#> .y2 0.157 0.007 22.808 0.000
#> .y3 0.164 0.008 20.567 0.000
#> X 0.989 0.037 27.074 0.000
#> Z 1.017 0.038 26.942 0.000
#> .Y 0.985 0.038 25.956 0.000
A better option than single imputation is multiple imputation, which can be performed for both the LMS and QML approaches using the modsem_mimpute() function.
lms_mimpute <- modsem_mimpute(m1, oneInt2, method = "lms")
summary(lms_mimpute)
#>
#> modsem (1.0.12) ended normally after 360 iterations
#>
#> Estimator LMS
#> Optimization method EMA-NLMINB
#> Number of model parameters 31
#>
#> Number of observations 2000
#>
#> Loglikelihood and Information Criteria:
#> Loglikelihood -17483.67
#> Akaike (AIC) 35029.33
#> Bayesian (BIC) 35202.96
#>
#> Numerical Integration:
#> Points of integration (per dim) 24
#> Dimensions 1
#> Total points of integration 24
#>
#> Fit Measures for Baseline Model (H0):
#> Loglikelihood -17813.52
#> Akaike (AIC) 35687.05
#> Bayesian (BIC) 35855.08
#> Chi-square 17.53
#> Degrees of Freedom (Chi-square) 24
#> P-value (Chi-square) 0.825
#> RMSEA 0.000
#>
#> Comparative Fit to H0 (LRT test):
#> Loglikelihood change 329.86
#> Difference test (D) 659.72
#> Degrees of freedom (D) 1
#> P-value (D) 0.000
#>
#> R-Squared Interaction Model (H1):
#> Y 0.596
#> R-Squared Baseline Model (H0):
#> Y 0.398
#> R-Squared Change (H1 - H0):
#> Y 0.198
#>
#> Parameter Estimates:
#> Coefficients unstandardized
#> Information Rubin-corrected (m=25)
#> Standard errors standard
#>
#> Latent Variables:
#> Estimate Std.Error z.value P(>|z|)
#> X =~
#> x1 1.000
#> x2 0.802 0.013 63.625 0.000
#> x3 0.912 0.014 67.421 0.000
#> Z =~
#> z1 1.000
#> z2 0.813 0.013 64.480 0.000
#> z3 0.883 0.013 67.102 0.000
#> Y =~
#> y1 1.000
#> y2 0.798 0.008 106.192 0.000
#> y3 0.902 0.008 111.394 0.000
#>
#> Regressions:
#> Estimate Std.Error z.value P(>|z|)
#> Y ~
#> X 0.671 0.031 21.699 0.000
#> Z 0.568 0.030 18.698 0.000
#> X:Z 0.711 0.028 25.580 0.000
#>
#> Intercepts:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 1.019 0.024 42.575 0.000
#> .x2 1.215 0.020 60.773 0.000
#> .x3 0.918 0.022 41.233 0.000
#> .z1 1.010 0.024 41.517 0.000
#> .z2 1.205 0.020 59.183 0.000
#> .z3 0.915 0.022 41.936 0.000
#> .y1 1.035 0.033 31.378 0.000
#> .y2 1.219 0.027 45.358 0.000
#> .y3 0.949 0.030 31.601 0.000
#>
#> Covariances:
#> Estimate Std.Error z.value P(>|z|)
#> X ~~
#> Z 0.202 0.024 8.312 0.000
#>
#> Variances:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 0.156 0.009 17.766 0.000
#> .x2 0.161 0.007 22.938 0.000
#> .x3 0.165 0.008 20.517 0.000
#> .z1 0.169 0.009 18.507 0.000
#> .z2 0.159 0.007 22.466 0.000
#> .z3 0.157 0.008 20.544 0.000
#> .y1 0.160 0.009 17.880 0.000
#> .y2 0.155 0.007 22.491 0.000
#> .y3 0.162 0.008 20.174 0.000
#> X 0.986 0.037 27.025 0.000
#> Z 1.014 0.038 26.768 0.000
#> .Y 0.984 0.038 26.019 0.000
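The Information Rubin-corrected (m=25) line in the summary indicates that the model was fit to 25 imputed datasets and the results were pooled using Rubin's rules: the pooled point estimate is the mean of the per-imputation estimates, and the pooled variance combines the within- and between-imputation variance. As a rough illustration of the pooling idea (a sketch, not modsem's internal code):
# Rubin's pooling rules, given m point estimates `est` and their squared
# standard errors `se2` from m imputed datasets (illustration only):
pool_rubin <- function(est, se2) {
  m    <- length(est)
  qbar <- mean(est)              # pooled point estimate
  ubar <- mean(se2)              # within-imputation variance
  b    <- var(est)               # between-imputation variance
  tvar <- ubar + (1 + 1 / m) * b # total (Rubin-corrected) variance
  c(estimate = qbar, se = sqrt(tvar))
}
pool_rubin(est = c(0.70, 0.72, 0.71), se2 = c(0.028, 0.030, 0.029)^2) # toy numbers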