Interaction between latent variables using LMS and QML approaches
modsem_da.Rd
modsem_da() is a function for estimating interaction effects between latent
variables in structural equation models (SEMs) using distributional analytic
(DA) approaches. Methods for estimating interaction effects in SEMs can
broadly be split into two frameworks:
1. Product Indicator-based approaches ("dblcent", "rca", "uca", "ca", "pind")
2. Distributionally based approaches ("lms", "qml")
modsem_da() handles the latter and can estimate models using both the QML and
LMS approaches.
NOTE: Run default_settings_da to see the default arguments.
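For instance, a quick way to inspect those defaults (a minimal sketch, assuming default_settings_da() can be called without arguments):

library(modsem)
str(default_settings_da())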
Usage
modsem_da(
model.syntax = NULL,
data = NULL,
method = "lms",
verbose = NULL,
optimize = NULL,
nodes = NULL,
impute.na = NULL,
convergence.abs = NULL,
convergence.rel = NULL,
optimizer = NULL,
center.data = NULL,
standardize.data = NULL,
standardize.out = NULL,
standardize = NULL,
mean.observed = NULL,
cov.syntax = NULL,
double = NULL,
calc.se = NULL,
FIM = NULL,
EFIM.S = NULL,
OFIM.hessian = NULL,
EFIM.parametric = NULL,
robust.se = NULL,
R.max = NULL,
max.iter = NULL,
max.step = NULL,
start = NULL,
epsilon = NULL,
quad.range = NULL,
adaptive.quad = NULL,
adaptive.frequency = NULL,
n.threads = NULL,
algorithm = NULL,
em.control = NULL,
rcs = FALSE,
rcs.choose = NULL,
orthogonal.x = NULL,
orthogonal.y = NULL,
auto.fix.first = NULL,
auto.fix.single = NULL,
auto.split.syntax = NULL,
...
)
Arguments
- model.syntax
lavaan syntax
- data
A data frame with the observed variables used in the model.
- method
Method to use:
"lms" = latent moderated structural equations (not passed to lavaan).
"qml" = quasi-maximum likelihood estimation of latent moderated structural equations (not passed to lavaan).
- verbose
Should estimation progress be shown?
- optimize
Should starting parameters be optimized?
- nodes
Number of quadrature nodes (points of integration) used in the LMS approach. A higher number gives better estimates but slower computation. How many are needed depends on the complexity of the model. For simple models, somewhere between 16-24 nodes should be enough; for more complex models, higher numbers may be needed. For models with an interaction effect between an endogenous and an exogenous variable, the number of nodes should be at least 32, and in practice (e.g., with ordinal/skewed data) more than 32 is recommended. In cases where the data are non-normal, it might be better to use the "qml" approach instead. You can also consider setting adaptive.quad = TRUE (see the sketch after this list).
- impute.na
Should missing values be imputed? If FALSE, missing values are removed case-wise. If TRUE, values are imputed using Amelia::amelia. Default is FALSE. If you want more fine-grained control over how the missing data are imputed, you should consider imputing them yourself.
- convergence.abs
Absolute convergence criterion. Lower values give better estimates but slower computation. Not relevant when using the QML approach. For the LMS approach the EM-algorithm stops whenever the relative or absolute convergence criterion is reached.
- convergence.rel
Relative convergence criterion. Lower values give better estimates but slower computation. For the LMS approach the EM-algorithm stops whenever the relative or absolute convergence criterion is reached.
- optimizer
Optimizer to use; can be either "nlminb" or "L-BFGS-B". For LMS, "nlminb" is recommended. For QML, "L-BFGS-B" may be faster if there is a large number of iterations, but slower if there are few iterations.
- center.data
Should the data be centered before fitting the model?
- standardize.data
Should the data be scaled before fitting the model? Will be overridden by standardize if standardize is set to TRUE.
NOTE: It is recommended that you estimate the model normally and then standardize the output using standardize_model, standardized_estimates, or summary(<modsem_da-object>, standardize = TRUE).
- standardize.out
Should the output be standardized? Note that this will alter the relationships implied by parameter constraints, since parameters are scaled unevenly even if they have the same label. This does not alter the estimation of the model, only the output.
NOTE: It is recommended that you estimate the model normally and then standardize the output using standardized_estimates.
- standardize
Will standardize the data before fitting the model, remove the mean structure of the observed variables, and standardize the output. Note that standardize.data, mean.observed, and standardize.out will be overridden by standardize if standardize is set to TRUE.
NOTE: It is recommended that you estimate the model normally and then standardize the output using standardized_estimates.
- mean.observed
Should the mean structure of the observed variables be estimated? This will be overridden by standardize if standardize is set to TRUE.
NOTE: Not recommended unless you know what you are doing.
- cov.syntax
Model syntax for the implied covariance matrix (see vignette("interaction_two_etas", "modsem")).
- double
Try to double the number of dimensions of integration used in LMS. This will be extremely slow, but should be more similar to mplus.
- calc.se
Should standard errors be computed? NOTE: If FALSE, the information matrix will not be computed either.
- FIM
Should the Fisher information matrix be calculated using the observed or expected values? Must be either "observed" or "expected".
- EFIM.S
If the expected Fisher information matrix is computed, EFIM.S selects the number of Monte Carlo samples. Defaults to 100. NOTE: This number should likely be increased for better estimates (e.g., 1000), but it may drastically increase computation time.
- OFIM.hessian
Logical. If TRUE (the default), standard errors are based on the negative Hessian (observed Fisher information). If FALSE, they come from the outer product of the individual score vectors (OPG). For correctly specified models, these two matrices are asymptotically equivalent, yielding nearly identical standard errors in large samples. The Hessian usually shows smaller finite-sample variance and is therefore the default.
Note that the Hessian is not always positive definite and is more computationally expensive to calculate. The OPG should always be positive definite and is a lot faster to compute. If the model is correctly specified and the sample size is large, the two should yield similar results, and switching to the OPG can save a lot of time. Note that the required sample size depends on the complexity of the model.
A large difference between the Hessian and the OPG suggests misspecification, in which case robust.se = TRUE should be set to obtain sandwich (robust) standard errors.
- EFIM.parametric
Should the data for calculating the expected Fisher information matrix be simulated parametrically (based on the assumptions and implied parameters of the model) or non-parametrically (stochastically sampled)? If you believe that the normality assumptions are violated, EFIM.parametric = FALSE might be the better option.
- robust.se
Should robust standard errors be computed using the sandwich estimator?
- R.max
Maximum population size (not sample size) used in the calculation of the expected Fisher information matrix.
- max.iter
Maximum number of iterations.
- max.step
Maximum number of steps for the M-step in the EM algorithm (LMS).
- start
Starting parameters.
- epsilon
Finite difference for numerical derivatives.
- quad.range
Range in z-scores over which to perform numerical integration in LMS when using quasi-adaptive Gauss-Hermite quadratures. By default Inf, such that f(t) is integrated from -Inf to Inf, but this will likely be inefficient and pointless at a large number of nodes. Nodes outside +/- quad.range will be ignored.
- adaptive.quad
Should a quasi-adaptive quadrature be used? If TRUE, the quadrature nodes will be adapted to the data. If FALSE, the quadrature nodes will be fixed. Default is FALSE. The adaptive quadrature does not fit a separate quadrature to each participant, but instead tries to place more nodes where the posterior distribution is highest. Compared with a fixed Gauss-Hermite quadrature, this usually means that fewer nodes are placed at the tails of the distribution.
- adaptive.frequency
How often should the quasi-adaptive quadrature be calculated? Defaults to 3, meaning that it is recalculated every third EM-iteration.
- n.threads
Number of threads to use for parallel processing. If NULL, it will use <= 2 threads. If an integer is specified, it will use that number of threads (e.g., n.threads = 4 will use 4 threads). If "default", it will use the default number of threads (2). If "max", it will use all available threads; if "min", it will use 1 thread.
- algorithm
Algorithm to use for the EM algorithm. Can be either "EM" or "EMA". "EM" is the standard EM algorithm. "EMA" is an accelerated EM procedure that uses quasi-Newton and Fisher scoring optimization steps when needed. Default is "EM".
- em.control
A list of control parameters for the EM algorithm. See default_settings_da for defaults.
- rcs
Should latent variable indicators be replaced with reliability-corrected single-item indicators instead? See relcorr_single_item.
- rcs.choose
Which latent variables should get their indicators replaced with reliability-corrected single items? Corresponds to the choose argument in relcorr_single_item.
- orthogonal.x
If TRUE, all covariances among the exogenous latent variables are set to zero. Default is FALSE.
- orthogonal.y
If TRUE, all covariances among the endogenous latent variables are set to zero. If FALSE, residual covariances are added between purely endogenous variables, i.e., those that are not predicted by any other endogenous variable in the structural model. Default is FALSE.
- auto.fix.first
If TRUE, the factor loading of the first indicator of a given latent variable is fixed to 1. If FALSE, no loadings are fixed automatically. Note that this might leave the model unidentified. Default is TRUE. NOTE: This behaviour is overridden if the first loading is labelled, in which case it is treated as a free parameter instead. This differs from the default behaviour in lavaan.
- auto.fix.single
If TRUE, the residual variance of an observed indicator is set to zero if it is the only indicator of a latent variable. If FALSE, the residual variance is not fixed to zero and is treated as a free parameter of the model. Default is TRUE. NOTE: This behaviour is overridden if the first loading is labelled, in which case it is treated as a free parameter instead.
- auto.split.syntax
Should the model syntax automatically be split into a linear and a non-linear part? This is done by moving the structural model for linear endogenous variables (used in interaction terms) into the cov.syntax argument. This can potentially allow interactions between two endogenous variables, given that both are linear (i.e., not affected by interaction terms). This is FALSE by default for the LMS approach. When using the QML approach, interaction effects between exogenous and endogenous variables can in some cases be biased if the model is not split beforehand. The default is therefore TRUE for the QML approach.
- ...
Additional arguments to be passed to the estimation function.
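For instance, the quadrature, standard-error, and standardization arguments discussed above can be combined as in the following sketch. This is an illustrative call, not a recommended default configuration; it reuses the oneInt example data and the simple interaction model from the Examples below.

library(modsem)

m1 <- "
  # Outer Model
  X =~ x1 + x2 + x3
  Z =~ z1 + z2 + z3
  Y =~ y1 + y2 + y3

  # Inner Model
  Y ~ X + Z + X:Z
"

# LMS with a finer, quasi-adaptive quadrature and robust (sandwich) standard errors
fit <- modsem_da(m1, data = oneInt, method = "lms",
                 nodes = 32, adaptive.quad = TRUE, robust.se = TRUE)

# Standardize the output after estimation (recommended over `standardize = TRUE`)
standardized_estimates(fit)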
Examples
library(modsem)
# For more examples, check README and/or GitHub.
# One interaction
m1 <- "
# Outer Model
X =~ x1 + x2 + x3
Y =~ y1 + y2 + y3
Z =~ z1 + z2 + z3
# Inner model
Y ~ X + Z + X:Z
"
# \dontrun{
# QML Approach
est_qml <- modsem_da(m1, oneInt, method = "qml")
summary(est_qml)
#>
#> modsem (version 1.0.11):
#>
#> Estimator QML
#> Optimization method NLMINB
#> Number of observations 2000
#> Number of iterations 103
#> Loglikelihood -17501.61
#> Akaike (AIC) 35065.21
#> Bayesian (BIC) 35238.84
#>
#> Fit Measures for Baseline Model (H0):
#> Loglikelihood -17831.87
#> Akaike (AIC) 35723.75
#> Bayesian (BIC) 35891.78
#> Chi-square 17.52
#> Degrees of Freedom (Chi-square) 24
#> P-value (Chi-square) 0.826
#> RMSEA 0.000
#>
#> Comparative Fit to H0 (LRT test):
#> Loglikelihood change 330.27
#> Difference test (D) 660.54
#> Degrees of freedom (D) 1
#> P-value (D) 0.000
#>
#> R-Squared Interaction Model (H1):
#> Y 0.617
#> R-Squared Baseline Model (H0):
#> Y 0.395
#> R-Squared Change (H1 - H0):
#> Y 0.222
#>
#> Parameter Estimates:
#> Coefficients unstandardized
#> Information observed
#> Standard errors standard
#>
#> Latent Variables:
#> Estimate Std.Error z.value P(>|z|)
#> X =~
#> x1 1.000
#> x2 0.803 0.013 63.96 0.000
#> x3 0.913 0.013 67.77 0.000
#> Z =~
#> z1 1.000
#> z2 0.810 0.012 65.15 0.000
#> z3 0.881 0.013 67.67 0.000
#> Y =~
#> y1 1.000
#> y2 0.798 0.007 107.56 0.000
#> y3 0.899 0.008 112.52 0.000
#>
#> Regressions:
#> Estimate Std.Error z.value P(>|z|)
#> Y ~
#> X 0.673 0.031 21.89 0.000
#> Z 0.566 0.030 18.82 0.000
#> X:Z 0.713 0.027 26.50 0.000
#>
#> Intercepts:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 1.023 0.024 42.84 0.000
#> .x2 1.216 0.020 60.92 0.000
#> .x3 0.920 0.022 41.44 0.000
#> .z1 1.012 0.024 41.58 0.000
#> .z2 1.206 0.020 59.27 0.000
#> .z3 0.916 0.022 42.06 0.000
#> .y1 1.038 0.033 31.31 0.000
#> .y2 1.221 0.027 45.29 0.000
#> .y3 0.954 0.030 31.72 0.000
#>
#> Covariances:
#> Estimate Std.Error z.value P(>|z|)
#> X ~~
#> Z 0.200 0.024 8.24 0.000
#>
#> Variances:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 0.158 0.009 18.12 0.000
#> .x2 0.162 0.007 23.18 0.000
#> .x3 0.165 0.008 20.83 0.000
#> .z1 0.166 0.009 18.46 0.000
#> .z2 0.160 0.007 22.69 0.000
#> .z3 0.158 0.008 20.81 0.000
#> .y1 0.159 0.009 17.98 0.000
#> .y2 0.154 0.007 22.66 0.000
#> .y3 0.164 0.008 20.69 0.000
#> X 0.983 0.036 26.99 0.000
#> Z 1.018 0.038 26.94 0.000
#> .Y 0.901 0.040 22.70 0.000
#>
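# Standardized output can be obtained after estimation (recommended over
# `standardize = TRUE`); an illustrative follow-up, output omitted:
standardized_estimates(est_qml)
summary(est_qml, standardize = TRUE)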
# Theory Of Planned Behavior
tpb <- "
# Outer Model (Based on Hagger et al., 2007)
ATT =~ att1 + att2 + att3 + att4 + att5
SN =~ sn1 + sn2
PBC =~ pbc1 + pbc2 + pbc3
INT =~ int1 + int2 + int3
BEH =~ b1 + b2
# Inner Model (Based on Steinmetz et al., 2011)
INT ~ ATT + SN + PBC
BEH ~ INT + PBC
BEH ~ INT:PBC
"
# LMS Approach
est_lms <- modsem_da(tpb, data = TPB, method = "lms")
#> Warning: It is recommended that you have at least 32 nodes for interaction effects between exogenous and endogenous variables in the lms approach 'nodes = 24'
summary(est_lms)
#>
#> modsem (version 1.0.11):
#>
#> Estimator LMS
#> Optimization method EMA-NLMINB
#> Number of observations 2000
#> Number of iterations 46
#> Loglikelihood -23463.63
#> Akaike (AIC) 47035.27
#> Bayesian (BIC) 47337.72
#>
#> Numerical Integration:
#> Points of integration (per dim) 24
#> Dimensions 1
#> Total points of integration 24
#>
#> Fit Measures for Baseline Model (H0):
#> Loglikelihood -26393.22
#> Akaike (AIC) 52892.45
#> Bayesian (BIC) 53189.29
#> Chi-square 66.27
#> Degrees of Freedom (Chi-square) 82
#> P-value (Chi-square) 0.897
#> RMSEA 0.000
#>
#> Comparative Fit to H0 (LRT test):
#> Loglikelihood change 2929.59
#> Difference test (D) 5859.18
#> Degrees of freedom (D) 1
#> P-value (D) 0.000
#>
#> R-Squared Interaction Model (H1):
#> INT 0.361
#> BEH 0.248
#> R-Squared Baseline Model (H0):
#> INT 0.367
#> BEH 0.210
#> R-Squared Change (H1 - H0):
#> INT -0.006
#> BEH 0.038
#>
#> Parameter Estimates:
#> Coefficients unstandardized
#> Information observed
#> Standard errors standard
#>
#> Latent Variables:
#> Estimate Std.Error z.value P(>|z|)
#> PBC =~
#> pbc1 1.000
#> pbc2 0.911 0.013 69.09 0.000
#> pbc3 0.802 0.012 66.23 0.000
#> ATT =~
#> att1 1.000
#> att2 0.877 0.012 71.57 0.000
#> att3 0.789 0.012 66.39 0.000
#> att4 0.695 0.011 61.00 0.000
#> att5 0.887 0.013 70.86 0.000
#> SN =~
#> sn1 1.000
#> sn2 0.889 0.017 52.58 0.000
#> INT =~
#> int1 1.000
#> int2 0.913 0.015 59.06 0.000
#> int3 0.807 0.014 55.74 0.000
#> BEH =~
#> b1 1.000
#> b2 0.961 0.030 31.68 0.000
#>
#> Regressions:
#> Estimate Std.Error z.value P(>|z|)
#> INT ~
#> PBC 0.217 0.030 7.33 0.000
#> ATT 0.213 0.026 8.17 0.000
#> SN 0.177 0.028 6.40 0.000
#> BEH ~
#> PBC 0.228 0.022 10.18 0.000
#> INT 0.182 0.025 7.43 0.000
#> PBC:INT 0.204 0.018 11.24 0.000
#>
#> Intercepts:
#> Estimate Std.Error z.value P(>|z|)
#> .pbc1 0.960 0.021 45.57 0.000
#> .pbc2 0.950 0.020 48.20 0.000
#> .pbc3 0.961 0.018 54.12 0.000
#> .att1 0.987 0.023 42.92 0.000
#> .att2 0.983 0.020 48.09 0.000
#> .att3 0.995 0.019 52.70 0.000
#> .att4 0.980 0.017 56.96 0.000
#> .att5 0.968 0.021 46.72 0.000
#> .sn1 0.979 0.023 42.57 0.000
#> .sn2 0.986 0.021 47.84 0.000
#> .int1 0.995 0.021 47.42 0.000
#> .int2 0.995 0.020 50.93 0.000
#> .int3 0.990 0.018 55.38 0.000
#> .b1 0.989 0.021 47.13 0.000
#> .b2 1.008 0.020 51.30 0.000
#>
#> Covariances:
#> Estimate Std.Error z.value P(>|z|)
#> PBC ~~
#> ATT 0.658 0.026 25.64 0.000
#> SN 0.657 0.026 25.13 0.000
#> ATT ~~
#> SN 0.616 0.027 22.65 0.000
#>
#> Variances:
#> Estimate Std.Error z.value P(>|z|)
#> .pbc1 0.147 0.008 19.09 0.000
#> .pbc2 0.164 0.007 22.39 0.000
#> .pbc3 0.154 0.006 23.95 0.000
#> .att1 0.167 0.007 23.53 0.000
#> .att2 0.150 0.006 24.72 0.000
#> .att3 0.159 0.006 26.39 0.000
#> .att4 0.163 0.006 27.66 0.000
#> .att5 0.159 0.006 24.93 0.000
#> .sn1 0.178 0.015 12.10 0.000
#> .sn2 0.156 0.012 13.23 0.000
#> .int1 0.157 0.009 18.11 0.000
#> .int2 0.160 0.008 20.41 0.000
#> .int3 0.168 0.007 23.55 0.000
#> .b1 0.186 0.018 10.11 0.000
#> .b2 0.135 0.017 8.09 0.000
#> PBC 0.933 0.031 30.18 0.000
#> ATT 0.985 0.035 27.80 0.000
#> SN 0.974 0.038 25.93 0.000
#> .INT 0.491 0.020 24.64 0.000
#> .BEH 0.456 0.023 20.13 0.000
#>
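# Optional, illustrative variation: replace the indicators with
# reliability-corrected single items (see the `rcs` argument and
# `relcorr_single_item`); output omitted.
est_rcs <- modsem_da(tpb, data = TPB, method = "lms", rcs = TRUE)
summary(est_rcs)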
# }