Interaction between latent variables using LMS and QML approaches
modsem_da.Rd
modsem_da() is a function for estimating interaction effects between latent
variables in structural equation models (SEMs) using distributional analytic
(DA) approaches. Methods for estimating interaction effects in SEMs can
broadly be split into two frameworks:
1. Product Indicator-based approaches ("dblcent", "rca", "uca", "ca", "pind")
2. Distributionally based approaches ("lms", "qml")
modsem_da() handles the latter and can estimate models using both the QML and
LMS approaches.
NOTE: Run default_settings_da to see the default arguments.
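For instance, a quick way to inspect those defaults (a minimal sketch, assuming default_settings_da() can be called without arguments):

library(modsem)
str(default_settings_da())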
Usage
modsem_da(
model.syntax = NULL,
data = NULL,
method = "lms",
verbose = NULL,
optimize = NULL,
nodes = NULL,
impute.na = NULL,
convergence.abs = NULL,
convergence.rel = NULL,
optimizer = NULL,
center.data = NULL,
standardize.data = NULL,
standardize.out = NULL,
standardize = NULL,
mean.observed = NULL,
cov.syntax = NULL,
double = NULL,
calc.se = NULL,
FIM = NULL,
EFIM.S = NULL,
OFIM.hessian = NULL,
EFIM.parametric = NULL,
robust.se = NULL,
R.max = NULL,
max.iter = NULL,
max.step = NULL,
start = NULL,
epsilon = NULL,
quad.range = NULL,
adaptive.quad = NULL,
adaptive.frequency = NULL,
n.threads = NULL,
algorithm = NULL,
em.control = NULL,
rcs = FALSE,
rcs.choose = NULL,
orthogonal.x = NULL,
orthogonal.y = NULL,
auto.fix.first = NULL,
auto.fix.single = NULL,
auto.split.syntax = NULL,
...
)
Arguments
- model.syntax
lavaan syntax
- data
A data frame with the observed variables used in the model.
- method
Method to use:
"lms" = latent moderated structural equations (not passed to lavaan).
"qml" = quasi-maximum likelihood estimation of latent moderated structural equations (not passed to lavaan).
- verbose
Should estimation progress be shown?
- optimize
Should starting parameters be optimized?
- nodes
Number of quadrature nodes (points of integration) used in the LMS approach. A higher number gives better estimates but slower computation. How many are needed depends on the complexity of the model. For simple models, somewhere between 16-24 nodes should be enough; for more complex models, higher numbers may be needed. For models with an interaction effect between an endogenous and an exogenous variable, the number of nodes should be at least 32, and in practice (e.g., with ordinal/skewed data) more than 32 is recommended. In cases where the data are non-normal, it might be better to use the "qml" approach instead. You can also consider setting adaptive.quad = TRUE (see the sketch after this list).
- impute.na
Should missing values be imputed? If FALSE, missing values are removed case-wise. If TRUE, values are imputed using Amelia::amelia. Default is FALSE. If you want more fine-grained control over how the missing data are imputed, you should consider imputing them yourself.
- convergence.abs
Absolute convergence criterion. Lower values give better estimates but slower computation. Not relevant when using the QML approach. For the LMS approach the EM-algorithm stops whenever the relative or absolute convergence criterion is reached.
- convergence.rel
Relative convergence criterion. Lower values give better estimates but slower computation. For the LMS approach the EM-algorithm stops whenever the relative or absolute convergence criterion is reached.
- optimizer
Optimizer to use; can be either "nlminb" or "L-BFGS-B". For LMS, "nlminb" is recommended. For QML, "L-BFGS-B" may be faster if there is a large number of iterations, but slower if there are few iterations.
- center.data
Should the data be centered before fitting the model?
- standardize.data
Should the data be scaled before fitting the model? Will be overridden by standardize if standardize is set to TRUE.
NOTE: It is recommended that you estimate the model normally and then standardize the output using standardize_model, standardized_estimates, or summary(<modsem_da-object>, standardize = TRUE).
- standardize.out
Should the output be standardized? Note that this will alter the relationships implied by parameter constraints, since parameters are scaled unevenly even if they have the same label. This does not alter the estimation of the model, only the output.
NOTE: It is recommended that you estimate the model normally and then standardize the output using standardized_estimates.
- standardize
Will standardize the data before fitting the model, remove the mean structure of the observed variables, and standardize the output. Note that standardize.data, mean.observed, and standardize.out will be overridden by standardize if standardize is set to TRUE.
NOTE: It is recommended that you estimate the model normally and then standardize the output using standardized_estimates.
- mean.observed
Should the mean structure of the observed variables be estimated? This will be overridden by standardize if standardize is set to TRUE.
NOTE: Not recommended unless you know what you are doing.
- cov.syntax
Model syntax for the implied covariance matrix (see vignette("interaction_two_etas", "modsem")).
- double
Try to double the number of dimensions of integration used in LMS. This will be extremely slow, but should be more similar to mplus.
- calc.se
Should standard errors be computed? NOTE: If FALSE, the information matrix will not be computed either.
- FIM
Should the Fisher information matrix be calculated using the observed or expected values? Must be either "observed" or "expected".
- EFIM.S
If the expected Fisher information matrix is computed, EFIM.S selects the number of Monte Carlo samples. Defaults to 100. NOTE: This number should likely be increased for better estimates (e.g., 1000), but it may drastically increase computation time.
- OFIM.hessian
Logical. If TRUE (the default), standard errors are based on the negative Hessian (observed Fisher information). If FALSE, they come from the outer product of the individual score vectors (OPG). For correctly specified models, these two matrices are asymptotically equivalent, yielding nearly identical standard errors in large samples. The Hessian usually shows smaller finite-sample variance and is therefore the default.
Note that the Hessian is not always positive definite and is more computationally expensive to calculate. The OPG should always be positive definite and is a lot faster to compute. If the model is correctly specified and the sample size is large, the two should yield similar results, and switching to the OPG can save a lot of time. Note that the required sample size depends on the complexity of the model.
A large difference between the Hessian and the OPG suggests misspecification, in which case robust.se = TRUE should be set to obtain sandwich (robust) standard errors.
- EFIM.parametric
Should the data for calculating the expected Fisher information matrix be simulated parametrically (based on the assumptions and implied parameters of the model) or non-parametrically (stochastically sampled)? If you believe that the normality assumptions are violated, EFIM.parametric = FALSE might be the better option.
- robust.se
Should robust standard errors be computed using the sandwich estimator?
- R.max
Maximum population size (not sample size) used in the calculation of the expected Fisher information matrix.
- max.iter
Maximum number of iterations.
- max.step
Maximum number of steps for the M-step in the EM algorithm (LMS).
- start
Starting parameters.
- epsilon
Finite difference for numerical derivatives.
- quad.range
Range in z-scores over which to perform numerical integration in LMS when using quasi-adaptive Gauss-Hermite quadratures. By default Inf, such that f(t) is integrated from -Inf to Inf, but this will likely be inefficient and pointless at a large number of nodes. Nodes outside +/- quad.range will be ignored.
- adaptive.quad
Should a quasi-adaptive quadrature be used? If TRUE, the quadrature nodes will be adapted to the data. If FALSE, the quadrature nodes will be fixed. Default is FALSE. The adaptive quadrature does not fit a separate quadrature to each participant, but instead tries to place more nodes where the posterior distribution is highest. Compared with a fixed Gauss-Hermite quadrature, this usually means that fewer nodes are placed at the tails of the distribution.
- adaptive.frequency
How often should the quasi-adaptive quadrature be calculated? Defaults to 3, meaning that it is recalculated every third EM-iteration.
- n.threads
Number of threads to use for parallel processing. If NULL, it will use <= 2 threads. If an integer is specified, it will use that number of threads (e.g., n.threads = 4 will use 4 threads). If "default", it will use the default number of threads (2). If "max", it will use all available threads; if "min", it will use 1 thread.
- algorithm
Algorithm to use for the EM algorithm. Can be either "EM" or "EMA". "EM" is the standard EM algorithm. "EMA" is an accelerated EM procedure that uses quasi-Newton and Fisher scoring optimization steps when needed. Default is "EM".
- em.control
A list of control parameters for the EM algorithm. See default_settings_da for defaults.
- rcs
Should latent variable indicators be replaced with reliability-corrected single-item indicators instead? See relcorr_single_item.
- rcs.choose
Which latent variables should get their indicators replaced with reliability-corrected single items? Corresponds to the choose argument in relcorr_single_item.
- orthogonal.x
If TRUE, all covariances among the exogenous latent variables are set to zero. Default is FALSE.
- orthogonal.y
If TRUE, all covariances among the endogenous latent variables are set to zero. If FALSE, residual covariances are added between purely endogenous variables, i.e., those that are not predicted by any other endogenous variable in the structural model. Default is FALSE.
- auto.fix.first
If TRUE, the factor loading of the first indicator of a given latent variable is fixed to 1. If FALSE, no loadings are fixed automatically. Note that this might leave the model unidentified. Default is TRUE. NOTE: This behaviour is overridden if the first loading is labelled, in which case it is treated as a free parameter instead. This differs from the default behaviour in lavaan.
- auto.fix.single
If TRUE, the residual variance of an observed indicator is set to zero if it is the only indicator of a latent variable. If FALSE, the residual variance is not fixed to zero and is treated as a free parameter of the model. Default is TRUE. NOTE: This behaviour is overridden if the first loading is labelled, in which case it is treated as a free parameter instead.
- auto.split.syntax
Should the model syntax automatically be split into a linear and a non-linear part? This is done by moving the structural model for linear endogenous variables (used in interaction terms) into the cov.syntax argument. This can potentially allow interactions between two endogenous variables, given that both are linear (i.e., not affected by interaction terms). This is FALSE by default for the LMS approach. When using the QML approach, interaction effects between exogenous and endogenous variables can in some cases be biased if the model is not split beforehand. The default is therefore TRUE for the QML approach.
- ...
Additional arguments to be passed to the estimation function.
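For instance, the quadrature, standard-error, and standardization arguments discussed above can be combined as in the following sketch. This is an illustrative call, not a recommended default configuration; it reuses the oneInt example data and the simple interaction model from the Examples below.

library(modsem)

m1 <- "
  # Outer Model
  X =~ x1 + x2 + x3
  Z =~ z1 + z2 + z3
  Y =~ y1 + y2 + y3

  # Inner Model
  Y ~ X + Z + X:Z
"

# LMS with a finer, quasi-adaptive quadrature and robust (sandwich) standard errors
fit <- modsem_da(m1, data = oneInt, method = "lms",
                 nodes = 32, adaptive.quad = TRUE, robust.se = TRUE)

# Standardize the output after estimation (recommended over `standardize = TRUE`)
standardized_estimates(fit)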
Examples
library(modsem)
# For more examples, check README and/or GitHub.
# One interaction
m1 <- "
# Outer Model
X =~ x1 + x2 + x3
Y =~ y1 + y2 + y3
Z =~ z1 + z2 + z3
# Inner model
Y ~ X + Z + X:Z
"
# \dontrun{
# QML Approach
est_qml <- modsem_da(m1, oneInt, method = "qml")
summary(est_qml)
#>
#> modsem (version 1.0.11):
#>
#> Estimator QML
#> Optimization method NLMINB
#> Number of observations 2000
#> Number of iterations 103
#> Loglikelihood -17501.61
#> Akaike (AIC) 35065.21
#> Bayesian (BIC) 35238.84
#>
#> Fit Measures for Baseline Model (H0):
#> Loglikelihood -17831.87
#> Akaike (AIC) 35723.75
#> Bayesian (BIC) 35891.78
#> Chi-square 17.52
#> Degrees of Freedom (Chi-square) 24
#> P-value (Chi-square) 0.826
#> RMSEA 0.000
#>
#> Comparative Fit to H0 (LRT test):
#> Loglikelihood change 330.27
#> Difference test (D) 660.54
#> Degrees of freedom (D) 1
#> P-value (D) 0.000
#>
#> R-Squared Interaction Model (H1):
#> Y 0.617
#> R-Squared Baseline Model (H0):
#> Y 0.395
#> R-Squared Change (H1 - H0):
#> Y 0.222
#>
#> Parameter Estimates:
#> Coefficients unstandardized
#> Information observed
#> Standard errors standard
#>
#> Latent Variables:
#> Estimate Std.Error z.value P(>|z|)
#> X =~
#> x1 1.000
#> x2 0.803 0.013 63.96 0.000
#> x3 0.913 0.013 67.77 0.000
#> Z =~
#> z1 1.000
#> z2 0.810 0.012 65.15 0.000
#> z3 0.881 0.013 67.67 0.000
#> Y =~
#> y1 1.000
#> y2 0.798 0.007 107.56 0.000
#> y3 0.899 0.008 112.52 0.000
#>
#> Regressions:
#> Estimate Std.Error z.value P(>|z|)
#> Y ~
#> X 0.673 0.031 21.89 0.000
#> Z 0.566 0.030 18.82 0.000
#> X:Z 0.713 0.027 26.50 0.000
#>
#> Intercepts:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 1.023 0.024 42.84 0.000
#> .x2 1.216 0.020 60.92 0.000
#> .x3 0.920 0.022 41.44 0.000
#> .z1 1.012 0.024 41.58 0.000
#> .z2 1.206 0.020 59.27 0.000
#> .z3 0.916 0.022 42.06 0.000
#> .y1 1.038 0.033 31.31 0.000
#> .y2 1.221 0.027 45.29 0.000
#> .y3 0.954 0.030 31.72 0.000
#>
#> Covariances:
#> Estimate Std.Error z.value P(>|z|)
#> X ~~
#> Z 0.200 0.024 8.24 0.000
#>
#> Variances:
#> Estimate Std.Error z.value P(>|z|)
#> .x1 0.158 0.009 18.12 0.000
#> .x2 0.162 0.007 23.18 0.000
#> .x3 0.165 0.008 20.83 0.000
#> .z1 0.166 0.009 18.46 0.000
#> .z2 0.160 0.007 22.69 0.000
#> .z3 0.158 0.008 20.81 0.000
#> .y1 0.159 0.009 17.98 0.000
#> .y2 0.154 0.007 22.66 0.000
#> .y3 0.164 0.008 20.69 0.000
#> X 0.983 0.036 26.99 0.000
#> Z 1.018 0.038 26.94 0.000
#> .Y 0.901 0.040 22.70 0.000
#>
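# Standardized output can be obtained after estimation (recommended over
# `standardize = TRUE`); an illustrative follow-up, output omitted:
standardized_estimates(est_qml)
summary(est_qml, standardize = TRUE)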
# Theory Of Planned Behavior
tpb <- "
# Outer Model (Based on Hagger et al., 2007)
ATT =~ att1 + att2 + att3 + att4 + att5
SN =~ sn1 + sn2
PBC =~ pbc1 + pbc2 + pbc3
INT =~ int1 + int2 + int3
BEH =~ b1 + b2
# Inner Model (Based on Steinmetz et al., 2011)
INT ~ ATT + SN + PBC
BEH ~ INT + PBC
BEH ~ INT:PBC
"
# LMS Approach
est_lms <- modsem_da(tpb, data = TPB, method = "lms")
#> Warning: It is recommended that you have at least 32 nodes for interaction effects between exogenous and endogenous variables in the lms approach 'nodes = 24'
summary(est_lms)
#>
#> modsem (version 1.0.11):
#>
#> Estimator LMS
#> Optimization method EMA-NLMINB
#> Number of observations 2000
#> Number of iterations 46
#> Loglikelihood -23463.63
#> Akaike (AIC) 47035.27
#> Bayesian (BIC) 47337.72
#>
#> Numerical Integration:
#> Points of integration (per dim) 24
#> Dimensions 1
#> Total points of integration 24
#>
#> Fit Measures for Baseline Model (H0):
#> Loglikelihood -26393.22
#> Akaike (AIC) 52892.45
#> Bayesian (BIC) 53189.29
#> Chi-square 66.27
#> Degrees of Freedom (Chi-square) 82
#> P-value (Chi-square) 0.897
#> RMSEA 0.000
#>
#> Comparative Fit to H0 (LRT test):
#> Loglikelihood change 2929.59
#> Difference test (D) 5859.18
#> Degrees of freedom (D) 1
#> P-value (D) 0.000
#>
#> R-Squared Interaction Model (H1):
#> INT 0.361
#> BEH 0.248
#> R-Squared Baseline Model (H0):
#> INT 0.367
#> BEH 0.210
#> R-Squared Change (H1 - H0):
#> INT -0.006
#> BEH 0.038
#>
#> Parameter Estimates:
#> Coefficients unstandardized
#> Information observed
#> Standard errors standard
#>
#> Latent Variables:
#> Estimate Std.Error z.value P(>|z|)
#> PBC =~
#> pbc1 1.000
#> pbc2 0.911 0.013 69.09 0.000
#> pbc3 0.802 0.012 66.23 0.000
#> ATT =~
#> att1 1.000
#> att2 0.877 0.012 71.57 0.000
#> att3 0.789 0.012 66.39 0.000
#> att4 0.695 0.011 61.00 0.000
#> att5 0.887 0.013 70.86 0.000
#> SN =~
#> sn1 1.000
#> sn2 0.889 0.017 52.58 0.000
#> INT =~
#> int1 1.000
#> int2 0.913 0.015 59.06 0.000
#> int3 0.807 0.014 55.74 0.000
#> BEH =~
#> b1 1.000
#> b2 0.961 0.030 31.68 0.000
#>
#> Regressions:
#> Estimate Std.Error z.value P(>|z|)
#> INT ~
#> PBC 0.217 0.030 7.33 0.000
#> ATT 0.213 0.026 8.17 0.000
#> SN 0.177 0.028 6.40 0.000
#> BEH ~
#> PBC 0.228 0.022 10.18 0.000
#> INT 0.182 0.025 7.43 0.000
#> PBC:INT 0.204 0.018 11.24 0.000
#>
#> Intercepts:
#> Estimate Std.Error z.value P(>|z|)
#> .pbc1 0.960 0.021 45.57 0.000
#> .pbc2 0.950 0.020 48.20 0.000
#> .pbc3 0.961 0.018 54.12 0.000
#> .att1 0.987 0.023 42.92 0.000
#> .att2 0.983 0.020 48.09 0.000
#> .att3 0.995 0.019 52.70 0.000
#> .att4 0.980 0.017 56.96 0.000
#> .att5 0.968 0.021 46.72 0.000
#> .sn1 0.979 0.023 42.57 0.000
#> .sn2 0.986 0.021 47.84 0.000
#> .int1 0.995 0.021 47.42 0.000
#> .int2 0.995 0.020 50.93 0.000
#> .int3 0.990 0.018 55.38 0.000
#> .b1 0.989 0.021 47.13 0.000
#> .b2 1.008 0.020 51.30 0.000
#>
#> Covariances:
#> Estimate Std.Error z.value P(>|z|)
#> PBC ~~
#> ATT 0.658 0.026 25.64 0.000
#> SN 0.657 0.026 25.13 0.000
#> ATT ~~
#> SN 0.616 0.027 22.65 0.000
#>
#> Variances:
#> Estimate Std.Error z.value P(>|z|)
#> .pbc1 0.147 0.008 19.09 0.000
#> .pbc2 0.164 0.007 22.39 0.000
#> .pbc3 0.154 0.006 23.95 0.000
#> .att1 0.167 0.007 23.53 0.000
#> .att2 0.150 0.006 24.72 0.000
#> .att3 0.159 0.006 26.39 0.000
#> .att4 0.163 0.006 27.66 0.000
#> .att5 0.159 0.006 24.93 0.000
#> .sn1 0.178 0.015 12.10 0.000
#> .sn2 0.156 0.012 13.23 0.000
#> .int1 0.157 0.009 18.11 0.000
#> .int2 0.160 0.008 20.41 0.000
#> .int3 0.168 0.007 23.55 0.000
#> .b1 0.186 0.018 10.11 0.000
#> .b2 0.135 0.017 8.09 0.000
#> PBC 0.933 0.031 30.18 0.000
#> ATT 0.985 0.035 27.80 0.000
#> SN 0.974 0.038 25.93 0.000
#> .INT 0.491 0.020 24.64 0.000
#> .BEH 0.456 0.023 20.13 0.000
#>
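# Optional, illustrative variation: replace the indicators with
# reliability-corrected single items (see the `rcs` argument and
# `relcorr_single_item`); output omitted.
est_rcs <- modsem_da(tpb, data = TPB, method = "lms", rcs = TRUE)
summary(est_rcs)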
# }