Interaction effects between endogenous variables
The Problem
Interaction effects between two endogenous (i.e., dependent)
variables work as you would expect for the product indicator methods
("dblcent", "rca", "ca", "uca"). However, for the lms and qml
approaches, it is not as straightforward.
The lms and qml approaches can (by default) handle interaction effects
between endogenous and exogenous (i.e., independent) variables, but
they do not natively support interaction effects between two endogenous
variables. When an interaction effect exists between two endogenous
variables, the equations cannot easily be written in “reduced” form,
meaning that normal estimation procedures won’t work.
The Solution
Despite these limitations, there is a workaround for both the lms and
qml approaches. Essentially, the model can be split into two parts: one
linear and one non-linear. You can replace the covariance matrix used in
the estimation of the non-linear model with the model-implied covariance
matrix from a linear model. This allows you to treat an endogenous
variable as if it were exogenous, provided that it can be expressed in a
linear model.
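To make "model-implied covariance matrix" concrete, here is the standard linear SEM result for a single endogenous variable regressed on exogenous ones. The symbols (eta_1, xi, gamma, Phi, psi) are notation assumed here for illustration and are not taken from the modsem documentation.

```latex
% Linear part: one endogenous variable regressed on the exogenous ones,
% with the disturbance uncorrelated with the predictors.
\eta_1 = \gamma^\top \xi + \zeta_1, \qquad
\operatorname{Cov}(\xi) = \Phi, \qquad
\operatorname{Var}(\zeta_1) = \psi, \qquad
\operatorname{Cov}(\xi, \zeta_1) = 0

% Implied covariance matrix of (\eta_1, \xi), which stands in for a
% freely estimated covariance matrix in the non-linear part.
\operatorname{Cov}\begin{pmatrix} \eta_1 \\ \xi \end{pmatrix}
  = \begin{pmatrix}
      \gamma^\top \Phi \gamma + \psi & \gamma^\top \Phi \\
      \Phi \gamma & \Phi
    \end{pmatrix}
```

The regression coefficients and the residual variance of the linear model therefore constrain the covariances of the variable that the non-linear part treats as exogenous.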
Example
Let’s consider the theory of planned behavior (TPB), where we wish to
estimate the quadratic effect of INT on BEH (INT:INT). We can use the
following model:
tpb <- '
# Outer Model (Based on Hagger et al., 2007)
ATT =~ att1 + att2 + att3 + att4 + att5
SN =~ sn1 + sn2
PBC =~ pbc1 + pbc2 + pbc3
INT =~ int1 + int2 + int3
BEH =~ b1 + b2
# Inner Model (Based on Steinmetz et al., 2011)
INT ~ ATT + SN + PBC
BEH ~ INT + PBC
BEH ~ INT:INT
'
Since INT is an endogenous variable, its quadratic term (i.e., an
interaction effect with itself) would involve two endogenous variables.
As a result, we would normally not be able to estimate this model using
the lms or qml approaches.
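One way to see the difficulty is to substitute the equation for INT into the equation for BEH. The sketch below uses coefficient and disturbance symbols (gamma, beta, omega, zeta, and the shorthand mu) that are assumed for illustration and are not names used by modsem.

```latex
\begin{aligned}
\mathrm{INT} &= \underbrace{\gamma_1\,\mathrm{ATT} + \gamma_2\,\mathrm{SN}
                 + \gamma_3\,\mathrm{PBC}}_{\mu} + \zeta_1, \\
\mathrm{BEH} &= \beta_1\,\mathrm{INT} + \beta_2\,\mathrm{PBC}
                 + \omega\,\mathrm{INT}^2 + \zeta_2, \\
\mathrm{INT}^2 &= \mu^2 + 2\,\mu\,\zeta_1 + \zeta_1^2 .
\end{aligned}
```

After substitution, BEH depends on the disturbance of INT through the terms 2 omega mu zeta_1 and omega zeta_1^2, so the disturbances no longer enter additively. This is a plausible reading of why the equations do not reduce to the form that lms and qml expect.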
However, we can split the model into two parts: one linear and one
non-linear.
While INT is an endogenous variable, it can be expressed in a linear
model since it is not affected by any interaction terms:
tpb_linear <- 'INT ~ PBC + ATT + SN'
We can then remove this part from the original model, giving us:
tpb_nonlinear <- '
# Outer Model (Based on Hagger et al., 2007)
ATT =~ att1 + att2 + att3 + att4 + att5
SN =~ sn1 + sn2
PBC =~ pbc1 + pbc2 + pbc3
INT =~ int1 + int2 + int3
BEH =~ b1 + b2
# Inner Model (Based on Steinmetz et al., 2011)
BEH ~ INT + PBC
BEH ~ INT:INT
'
Now we can estimate the non-linear model, since INT is treated as an
exogenous variable. However, this would not incorporate the structural
model for INT. To address this, we can instruct modsem to replace the
covariance matrix (phi) of (INT, PBC, ATT, SN) with the model-implied
covariance matrix from the linear model while estimating both models
simultaneously. To achieve this, we use the cov.syntax argument in
modsem:
est_lms <- modsem(tpb_nonlinear, data = TPB, cov.syntax = tpb_linear,
                  method = "lms")
#> Warning: It is recommended that you have at least 48 nodes for interaction
#> effects between endogenous variables in the lms approach 'nodes = 24'
summary(est_lms)
#>
#> modsem (version 1.0.4):
#> Estimator LMS
#> Optimization method EM-NLMINB
#> Number of observations 2000
#> Number of iterations 63
#> Loglikelihood -23780.86
#> Akaike (AIC) 47669.71
#> Bayesian (BIC) 47972.16
#>
#> Numerical Integration:
#> Points of integration (per dim) 24
#> Dimensions 1
#> Total points of integration 24
#>
#> Fit Measures for H0:
#> Loglikelihood -26393
#> Akaike (AIC) 52892.45
#> Bayesian (BIC) 53189.29
#> Chi-square 66.27
#> Degrees of Freedom (Chi-square) 82
#> P-value (Chi-square) 0.897
#> RMSEA 0.000
#>
#> Comparative fit to H0 (no interaction effect)
#> Loglikelihood change 2612.37
#> Difference test (D) 5224.73
#> Degrees of freedom (D) 1
#> P-value (D) 0.000
#>
#> R-Squared:
#> BEH 0.235
#> INT 0.364
#> R-Squared Null-Model (H0):
#> BEH 0.210
#> INT 0.367
#> R-Squared Change:
#> BEH 0.025
#> INT -0.002
#>
#> Parameter Estimates:
#> Coefficients unstandardized
#> Information expected
#> Standard errors standard
#>
#> Latent Variables:
#> Estimate Std.Error z.value P(>|z|)
#> INT =~
#> int1 1.000
#> int2 0.915 0.024 38.09 0.000
#> int3 0.807 0.019 43.02 0.000
#> ATT =~
#> att1 1.000
#> att2 0.878 0.015 56.99 0.000
#> att3 0.789 0.015 50.95 0.000
#> att4 0.695 0.016 43.22 0.000
#> att5 0.887 0.019 47.24 0.000
#> SN =~
#> sn1 1.000
#> sn2 0.888 0.026 34.20 0.000
#> PBC =~
#> pbc1 1.000
#> pbc2 0.913 0.018 50.55 0.000
#> pbc3 0.801 0.019 41.39 0.000
#> BEH =~
#> b1 1.000
#> b2 0.959 0.053 18.18 0.000
#>
#> Regressions:
#> Estimate Std.Error z.value P(>|z|)
#> BEH ~
#> INT 0.196 0.035 5.60 0.000
#> PBC 0.238 0.032 7.54 0.000
#> INT:INT 0.129 0.027 4.83 0.000
#> INT ~
#> PBC 0.219 0.044 5.03 0.000
#> ATT 0.210 0.044 4.82 0.000
#> SN 0.171 0.045 3.78 0.000
#>
#> Intercepts:
#> Estimate Std.Error z.value P(>|z|)
#> int1 1.007 0.029 34.22 0.000
#> int2 1.006 0.025 40.68 0.000
#> int3 0.999 0.024 41.61 0.000
#> att1 1.010 0.034 29.46 0.000
#> att2 1.003 0.029 34.64 0.000
#> att3 1.013 0.029 35.54 0.000
#> att4 0.996 0.024 42.12 0.000
#> att5 0.989 0.031 31.61 0.000
#> sn1 1.002 0.045 22.34 0.000
#> sn2 1.007 0.040 25.29 0.000
#> pbc1 0.994 0.041 24.36 0.000
#> pbc2 0.981 0.035 28.30 0.000
#> pbc3 0.988 0.035 28.32 0.000
#> b1 0.997 0.032 30.87 0.000
#> b2 1.015 0.027 37.81 0.000
#> BEH 0.000
#> INT 0.000
#> ATT 0.000
#> SN 0.000
#> PBC 0.000
#>
#> Covariances:
#> Estimate Std.Error z.value P(>|z|)
#> PBC ~~
#> ATT 0.673 0.047 14.34 0.000
#> SN 0.673 0.050 13.54 0.000
#> ATT ~~
#> SN 0.624 0.052 12.10 0.000
#>
#> Variances:
#> Estimate Std.Error z.value P(>|z|)
#> int1 0.161 0.014 11.65 0.000
#> int2 0.161 0.010 16.29 0.000
#> int3 0.170 0.010 16.18 0.000
#> att1 0.167 0.009 17.75 0.000
#> att2 0.150 0.008 17.78 0.000
#> att3 0.160 0.009 17.67 0.000
#> att4 0.162 0.008 20.08 0.000
#> att5 0.159 0.008 18.70 0.000
#> sn1 0.178 0.020 8.73 0.000
#> sn2 0.157 0.017 9.04 0.000
#> pbc1 0.145 0.011 12.72 0.000
#> pbc2 0.160 0.010 15.74 0.000
#> pbc3 0.154 0.009 17.12 0.000
#> b1 0.185 0.033 5.64 0.000
#> b2 0.136 0.033 4.16 0.000
#> BEH 0.475 0.044 10.75 0.000
#> PBC 0.956 0.056 16.99 0.000
#> ATT 0.993 0.057 17.36 0.000
#> SN 0.983 0.065 15.07 0.000
#> INT 0.481 0.029 16.63 0.000
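The warning printed when fitting the LMS model recommends at least 48 nodes for interaction effects between endogenous variables, while the fit above used 24. A minimal sketch of a refit follows. It assumes that the nodes argument named in the warning can be passed through modsem() when method = "lms" (check ?modsem_da for your installed version). The object name est_lms_48 is chosen here for illustration.

```r
# Sketch: refit the LMS model with more quadrature nodes, as the warning suggests.
# Assumes the modsem package (which provides the TPB data) is installed.
library(modsem)

# Non-linear part: measurement model plus the equation for BEH
tpb_nonlinear <- '
  ATT =~ att1 + att2 + att3 + att4 + att5
  SN  =~ sn1 + sn2
  PBC =~ pbc1 + pbc2 + pbc3
  INT =~ int1 + int2 + int3
  BEH =~ b1 + b2

  BEH ~ INT + PBC
  BEH ~ INT:INT
'

# Linear part: structural equation for INT
tpb_linear <- 'INT ~ PBC + ATT + SN'

# Assumption: `nodes` sets the number of integration points per dimension
est_lms_48 <- modsem(tpb_nonlinear, data = TPB, cov.syntax = tpb_linear,
                     method = "lms", nodes = 48)
summary(est_lms_48)
```

More nodes mean more integration points per EM iteration, so expect a longer run time than with the default.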
est_qml <- modsem(tpb_nonlinear, data = TPB, cov.syntax = tpb_linear,
                  method = "qml")
summary(est_qml)
#>
#> modsem (version 1.0.4):
#> Estimator QML
#> Optimization method NLMINB
#> Number of observations 2000
#> Number of iterations 71
#> Loglikelihood -26360.52
#> Akaike (AIC) 52829.04
#> Bayesian (BIC) 53131.49
#>
#> Fit Measures for H0:
#> Loglikelihood -26393
#> Akaike (AIC) 52892.45
#> Bayesian (BIC) 53189.29
#> Chi-square 66.27
#> Degrees of Freedom (Chi-square) 82
#> P-value (Chi-square) 0.897
#> RMSEA 0.000
#>
#> Comparative fit to H0 (no interaction effect)
#> Loglikelihood change 32.70
#> Difference test (D) 65.41
#> Degrees of freedom (D) 1
#> P-value (D) 0.000
#>
#> R-Squared:
#> BEH 0.239
#> INT 0.370
#> R-Squared Null-Model (H0):
#> BEH 0.210
#> INT 0.367
#> R-Squared Change:
#> BEH 0.029
#> INT 0.003
#>
#> Parameter Estimates:
#> Coefficients unstandardized
#> Information observed
#> Standard errors standard
#>
#> Latent Variables:
#> Estimate Std.Error z.value P(>|z|)
#> INT =~
#> int1 1.000
#> int2 0.914 0.015 59.04 0.000
#> int3 0.807 0.015 55.65 0.000
#> ATT =~
#> att1 1.000
#> att2 0.878 0.012 71.56 0.000
#> att3 0.789 0.012 66.37 0.000
#> att4 0.695 0.011 61.00 0.000
#> att5 0.887 0.013 70.85 0.000
#> SN =~
#> sn1 1.000
#> sn2 0.888 0.017 52.62 0.000
#> PBC =~
#> pbc1 1.000
#> pbc2 0.913 0.013 69.38 0.000
#> pbc3 0.801 0.012 66.08 0.000
#> BEH =~
#> b1 1.000
#> b2 0.960 0.032 29.90 0.000
#>
#> Regressions:
#> Estimate Std.Error z.value P(>|z|)
#> BEH ~
#> INT 0.197 0.025 7.76 0.000
#> PBC 0.239 0.023 10.59 0.000
#> INT:INT 0.128 0.016 7.88 0.000
#> INT ~
#> PBC 0.222 0.030 7.51 0.000
#> ATT 0.213 0.026 8.17 0.000
#> SN 0.175 0.028 6.33 0.000
#>
#> Intercepts:
#> Estimate Std.Error z.value P(>|z|)
#> int1 1.014 0.022 46.96 0.000
#> int2 1.012 0.020 50.40 0.000
#> int3 1.005 0.018 54.80 0.000
#> att1 1.014 0.024 42.00 0.000
#> att2 1.007 0.021 46.96 0.000
#> att3 1.016 0.020 51.45 0.000
#> att4 0.999 0.018 55.65 0.000
#> att5 0.992 0.022 45.67 0.000
#> sn1 1.006 0.024 41.66 0.000
#> sn2 1.010 0.022 46.70 0.000
#> pbc1 0.998 0.024 42.41 0.000
#> pbc2 0.985 0.022 44.93 0.000
#> pbc3 0.991 0.020 50.45 0.000
#> b1 0.999 0.023 42.63 0.000
#> b2 1.017 0.022 46.24 0.000
#> BEH 0.000
#> INT 0.000
#> ATT 0.000
#> SN 0.000
#> PBC 0.000
#>
#> Covariances:
#> Estimate Std.Error z.value P(>|z|)
#> PBC ~~
#> ATT 0.678 0.029 23.45 0.000
#> SN 0.678 0.029 23.08 0.000
#> ATT ~~
#> SN 0.629 0.029 21.70 0.000
#>
#> Variances:
#> Estimate Std.Error z.value P(>|z|)
#> int1 0.158 0.009 18.22 0.000
#> int2 0.160 0.008 20.38 0.000
#> int3 0.168 0.007 23.63 0.000
#> att1 0.167 0.007 23.53 0.000
#> att2 0.150 0.006 24.71 0.000
#> att3 0.160 0.006 26.38 0.000
#> att4 0.162 0.006 27.65 0.000
#> att5 0.159 0.006 24.93 0.000
#> sn1 0.178 0.015 12.09 0.000
#> sn2 0.157 0.012 13.26 0.000
#> pbc1 0.145 0.008 18.44 0.000
#> pbc2 0.160 0.007 21.42 0.000
#> pbc3 0.154 0.006 23.80 0.000
#> b1 0.185 0.020 9.42 0.000
#> b2 0.135 0.018 7.60 0.000
#> BEH 0.475 0.024 19.74 0.000
#> PBC 0.962 0.036 27.04 0.000
#> ATT 0.998 0.037 26.93 0.000
#> SN 0.988 0.039 25.23 0.000
#> INT 0.488 0.020 24.59 0.000
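As a check on the idea behind cov.syntax, the covariance matrix of (INT, PBC, ATT, SN) implied by the linear model can be rebuilt by hand from the QML estimates printed above. The sketch below uses only base R and standard linear SEM algebra. It illustrates what "model-implied" means here and is not a description of modsem internals. The variable names (Phi, gamma, psi, implied) are chosen for this example.

```r
# Estimates copied from the QML summary above (order: PBC, ATT, SN)
lv <- c("PBC", "ATT", "SN")
Phi <- matrix(c(0.962, 0.678, 0.678,
                0.678, 0.998, 0.629,
                0.678, 0.629, 0.988),
              nrow = 3, byrow = TRUE, dimnames = list(lv, lv))
gamma <- c(PBC = 0.222, ATT = 0.213, SN = 0.175)  # INT ~ PBC + ATT + SN
psi   <- 0.488                                    # residual variance of INT

# Implied moments of INT under INT = gamma' xi + zeta
cov_int_xi <- drop(Phi %*% gamma)                       # Cov(INT, xi)
var_int    <- drop(t(gamma) %*% Phi %*% gamma) + psi    # Var(INT)

# Assemble the implied covariance matrix of (INT, PBC, ATT, SN)
all_lv  <- c("INT", lv)
implied <- matrix(NA_real_, 4, 4, dimnames = list(all_lv, all_lv))
implied["INT", "INT"] <- var_int
implied["INT", lv]    <- cov_int_xi
implied[lv, "INT"]    <- cov_int_xi
implied[lv, lv]       <- Phi
print(round(implied, 3))

# Share of Var(INT) explained by the linear model
r2_int <- 1 - psi / var_int
print(round(r2_int, 3))
```

The last value agrees with the R-squared of 0.370 reported for INT in the QML summary, up to rounding of the printed estimates.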