Interaction effects between endogenous variables

library(modsem)
#> This is modsem (1.0.15). Please report any bugs!

The Problem

Interaction effects between two endogenous (i.e., dependent) variables work as you would expect for the product indicator methods ("dblcent", "rca", "ca", "uca"). However, for the lms and qml approaches, it is not as straightforward.

The lms and qml approaches can (by default) handle interaction effects between endogenous and exogenous (i.e., independent) variables, but they do not natively support interaction effects between two endogenous variables. When an interaction effect exists between two endogenous variables, the equations cannot easily be written in “reduced” form, meaning that normal estimation procedures won’t work.

The Solution

Despite these limitations, there is a workaround for both the lms and qml approaches. Essentially, the model can be split into two parts: one linear and one non-linear. You can replace the covariance matrix used in the estimation of the non-linear model with the model-implied covariance matrix from a linear model. This allows you to treat an endogenous variable as if it were exogenous, provided that it can be expressed in a linear model.

Example

Let’s consider the theory of planned behavior (TPB), where we wish to estimate the quadratic effect of INT on BEH (INT:INT). We can use the following model:

tpb <- '
# Outer Model (Based on Hagger et al., 2007)
  ATT =~ att1 + att2 + att3 + att4 + att5
  SN =~ sn1 + sn2
  PBC =~ pbc1 + pbc2 + pbc3
  INT =~ int1 + int2 + int3
  BEH =~ b1 + b2

# Inner Model (Based on Steinmetz et al., 2011)
  INT ~ ATT + SN + PBC
  BEH ~ INT + PBC
  BEH ~ INT:INT
'

Since INT is an endogenous variable, its quadratic term (i.e., an interaction effect with itself) would involve two endogenous variables. As a result, we would normally not be able to estimate this model using the lms or qml approaches. However, we can split the model into two parts: one linear and one non-linear.
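To see where the difficulty comes from, it may help to write out the structural equations implied by the inner model above. The symbols used here (the gammas, betas, and zetas) are notation introduced for illustration only, and intercepts are omitted for brevity:

```latex
\begin{aligned}
\mathrm{INT} &= \gamma_1\,\mathrm{ATT} + \gamma_2\,\mathrm{SN} + \gamma_3\,\mathrm{PBC} + \zeta_{\mathrm{INT}} \\
\mathrm{BEH} &= \beta_1\,\mathrm{INT} + \beta_2\,\mathrm{PBC} + \beta_3\,\mathrm{INT}^2 + \zeta_{\mathrm{BEH}}
\end{aligned}
```

Substituting the first equation into the second expands INT^2 into squares and cross products that involve the disturbance of INT (for example, its square and its product with ATT). BEH is then no longer a linear function of the disturbances, which is one way to read the statement that the model cannot easily be written in reduced form.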

While INT is an endogenous variable, it can be expressed in a linear model since it is not affected by any interaction terms:

tpb_linear <- 'INT ~ PBC + ATT + SN'

We can then remove this part from the original model, giving us:

tpb_nonlinear <- '
# Outer Model (Based on Hagger et al., 2007)
  ATT =~ att1 + att2 + att3 + att4 + att5
  SN =~ sn1 + sn2
  PBC =~ pbc1 + pbc2 + pbc3
  INT =~ int1 + int2 + int3
  BEH =~ b1 + b2

# Inner Model (Based on Steinmetz et al., 2011)
  BEH ~ INT + PBC
  BEH ~ INT:INT
'

Now, we can estimate the non-linear model since INT is treated as an exogenous variable. However, this would not incorporate the structural model for INT. To address this, we can instruct modsem to replace the covariance matrix (phi) of (INT, PBC, ATT, SN) with the model-implied covariance matrix from the linear model while estimating both models simultaneously. To achieve this, we use the cov.syntax argument in modsem:

est_lms <- modsem(tpb_nonlinear, data = TPB, cov.syntax = tpb_linear,
                  method = "lms")
#> Warning: It is recommended that you have at least 32 nodes for interaction
#> effects between endogenous variables in the lms approach 'nodes = 24'
summary(est_lms)
#> 
#> modsem (1.0.15) ended normally after 32 iterations
#> 
#>   Estimator                                        LMS
#>   Optimization method                       EMA-NLMINB
#>   Number of model parameters                        54
#> 
#>   Number of observations                          2000
#> 
#> Loglikelihood and Information Criteria:
#>   Loglikelihood                              -26360.48
#>   Akaike (AIC)                                52828.95
#>   Bayesian (BIC)                              53131.40
#>  
#> Numerical Integration:
#>   Points of integration (per dim)                   24
#>   Dimensions                                         1
#>   Total points of integration                       24
#> 
#> Fit Measures for Baseline Model (H0):
#>                                               Standard
#>   Chi-square                                     66.27
#>   Degrees of Freedom (Chi-square)                   82
#>   P-value (Chi-square)                           0.897
#>   RMSEA                                          0.000
#>                                                       
#>   Loglikelihood                              -26393.22
#>   Akaike (AIC)                                52892.45
#>   Bayesian (BIC)                              53189.29
#>  
#> Comparative Fit to H0 (LRT test):
#>   Loglikelihood change                           32.75
#>   Difference test (D)                            65.49
#>   Degrees of freedom (D)                             1
#>   P-value (D)                                    0.000
#>  
#> R-Squared Interaction Model (H1):
#>   INT                                            0.370
#>   BEH                                            0.239
#> R-Squared Baseline Model (H0):
#>   INT                                            0.367
#>   BEH                                            0.210
#> R-Squared Change (H1 - H0):
#>   INT                                            0.003
#>   BEH                                            0.029
#> 
#> Parameter Estimates:
#>   Coefficients                          unstandardized
#>   Information                                 observed
#>   Standard errors                             standard
#>  
#> Latent Variables:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>   ATT =~        
#>     att1            1.000                             
#>     att2            0.878      0.012   71.560    0.000
#>     att3            0.789      0.012   66.372    0.000
#>     att4            0.695      0.011   60.996    0.000
#>     att5            0.887      0.013   70.849    0.000
#>   SN =~         
#>     sn1             1.000                             
#>     sn2             0.888      0.017   52.612    0.000
#>   PBC =~        
#>     pbc1            1.000                             
#>     pbc2            0.913      0.013   69.377    0.000
#>     pbc3            0.801      0.012   66.074    0.000
#>   INT =~        
#>     int1            1.000                             
#>     int2            0.914      0.015   58.976    0.000
#>     int3            0.807      0.015   55.600    0.000
#>   BEH =~        
#>     b1              1.000                             
#>     b2              0.959      0.032   29.920    0.000
#> 
#> Regressions:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>   INT ~         
#>     ATT             0.213      0.026    8.160    0.000
#>     SN              0.175      0.028    6.320    0.000
#>     PBC             0.222      0.030    7.483    0.000
#>   BEH ~         
#>     PBC             0.239      0.023   10.582    0.000
#>     INT             0.197      0.025    7.761    0.000
#>     INT:INT         0.128      0.016    7.894    0.000
#> 
#> Intercepts:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>    .int1            1.014      0.022   46.940    0.000
#>    .int2            1.012      0.020   50.381    0.000
#>    .int3            1.005      0.018   54.781    0.000
#>    .att1            1.014      0.024   41.990    0.000
#>    .att2            1.007      0.021   46.950    0.000
#>    .att3            1.016      0.020   51.439    0.000
#>    .att4            0.999      0.018   55.636    0.000
#>    .att5            0.992      0.022   45.655    0.000
#>    .sn1             1.005      0.024   41.649    0.000
#>    .sn2             1.010      0.022   46.696    0.000
#>    .pbc1            0.997      0.024   42.400    0.000
#>    .pbc2            0.985      0.022   44.919    0.000
#>    .pbc3            0.991      0.020   50.440    0.000
#>    .b1              0.998      0.023   42.618    0.000
#>    .b2              1.017      0.022   46.237    0.000
#> 
#> Covariances:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>   ATT ~~        
#>     SN              0.629      0.029   21.699    0.000
#>   PBC ~~        
#>     ATT             0.678      0.029   23.451    0.000
#>     SN              0.678      0.029   23.087    0.000
#> 
#> Variances:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>    .int1            0.158      0.009   18.228    0.000
#>    .int2            0.160      0.008   20.372    0.000
#>    .int3            0.168      0.007   23.631    0.000
#>    .att1            0.167      0.007   23.530    0.000
#>    .att2            0.150      0.006   24.713    0.000
#>    .att3            0.160      0.006   26.381    0.000
#>    .att4            0.163      0.006   27.644    0.000
#>    .att5            0.159      0.006   24.925    0.000
#>    .sn1             0.178      0.015   12.090    0.000
#>    .sn2             0.157      0.012   13.258    0.000
#>    .pbc1            0.145      0.008   18.443    0.000
#>    .pbc2            0.160      0.007   21.423    0.000
#>    .pbc3            0.154      0.006   23.798    0.000
#>    .b1              0.185      0.020    9.411    0.000
#>    .b2              0.136      0.018    7.611    0.000
#>     ATT             0.998      0.037   26.935    0.000
#>     SN              0.987      0.039   25.229    0.000
#>     PBC             0.961      0.036   27.047    0.000
#>    .INT             0.488      0.020   24.566    0.000
#>    .BEH             0.475      0.024   19.739    0.000
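As a rough illustration of what "model-implied covariance matrix from the linear model" means here, the sketch below rebuilds the covariance matrix of (ATT, SN, PBC, INT) by hand in base R from the rounded LMS estimates printed above. It is not modsem's internal code. The object names (phi_exo, gamma, psi_int, phi_implied) are made up for this example.

```r
# Rounded LMS estimates copied from the summary output above.
vars <- c("ATT", "SN", "PBC")

# Covariance matrix of the exogenous latent variables.
phi_exo <- matrix(c(0.998, 0.629, 0.678,
                    0.629, 0.987, 0.678,
                    0.678, 0.678, 0.961),
                  nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Coefficients of INT ~ ATT + SN + PBC and the residual variance of INT.
gamma   <- c(ATT = 0.213, SN = 0.175, PBC = 0.222)
psi_int <- 0.488

# Moments implied by the linear model for INT.
cov_int_exo <- drop(phi_exo %*% gamma)                      # Cov(INT, ATT/SN/PBC)
var_int     <- drop(t(gamma) %*% phi_exo %*% gamma) + psi_int  # Var(INT)

# Assemble the 4 x 4 model-implied covariance matrix.
phi_implied <- rbind(cbind(phi_exo, INT = cov_int_exo),
                     INT = c(cov_int_exo, var_int))
round(phi_implied, 3)

# Share of Var(INT) explained by the linear model.
r2_int <- 1 - psi_int / var_int
round(r2_int, 3)
```

The last value should land close to the R-squared reported for INT in the output above, which is a quick sanity check that the hand-built matrix agrees with the fitted linear part. Small differences are expected because the inputs are rounded to three decimals.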

est_qml <- modsem(tpb_nonlinear, data = TPB, cov.syntax = tpb_linear,
                  method = "qml")
summary(est_qml)
#> 
#> modsem (1.0.15) ended normally after 85 iterations
#> 
#>   Estimator                                        QML
#>   Optimization method                           NLMINB
#>   Number of model parameters                        54
#> 
#>   Number of observations                          2000
#> 
#> Loglikelihood and Information Criteria:
#>   Loglikelihood                              -26360.52
#>   Akaike (AIC)                                52829.04
#>   Bayesian (BIC)                              53131.49
#>  
#> Fit Measures for Baseline Model (H0):
#>                                               Standard
#>   Chi-square                                     66.27
#>   Degrees of Freedom (Chi-square)                   82
#>   P-value (Chi-square)                           0.897
#>   RMSEA                                          0.000
#>                                                       
#>   Loglikelihood                              -26393.22
#>   Akaike (AIC)                                52892.45
#>   Bayesian (BIC)                              53189.29
#>  
#> Comparative Fit to H0 (LRT test):
#>   Loglikelihood change                           32.70
#>   Difference test (D)                            65.41
#>   Degrees of freedom (D)                             1
#>   P-value (D)                                    0.000
#>  
#> R-Squared Interaction Model (H1):
#>   INT                                            0.370
#>   BEH                                            0.239
#> R-Squared Baseline Model (H0):
#>   INT                                            0.367
#>   BEH                                            0.210
#> R-Squared Change (H1 - H0):
#>   INT                                            0.003
#>   BEH                                            0.029
#> 
#> Parameter Estimates:
#>   Coefficients                          unstandardized
#>   Information                                 observed
#>   Standard errors                             standard
#>  
#> Latent Variables:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>   ATT =~        
#>     att1            1.000                             
#>     att2            0.878      0.012   71.561    0.000
#>     att3            0.789      0.012   66.371    0.000
#>     att4            0.695      0.011   60.998    0.000
#>     att5            0.887      0.013   70.850    0.000
#>   SN =~         
#>     sn1             1.000                             
#>     sn2             0.888      0.017   52.612    0.000
#>   PBC =~        
#>     pbc1            1.000                             
#>     pbc2            0.913      0.013   69.380    0.000
#>     pbc3            0.801      0.012   66.077    0.000
#>   INT =~        
#>     int1            1.000                             
#>     int2            0.914      0.015   59.035    0.000
#>     int3            0.807      0.015   55.653    0.000
#>   BEH =~        
#>     b1              1.000                             
#>     b2              0.960      0.032   29.905    0.000
#> 
#> Regressions:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>   INT ~         
#>     ATT             0.213      0.026    8.173    0.000
#>     SN              0.175      0.028    6.336    0.000
#>     PBC             0.222      0.030    7.506    0.000
#>   BEH ~         
#>     PBC             0.239      0.023   10.586    0.000
#>     INT             0.197      0.025    7.755    0.000
#>     INT:INT         0.128      0.016    7.883    0.000
#> 
#> Intercepts:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>    .int1            1.014      0.022   46.954    0.000
#>    .int2            1.012      0.020   50.396    0.000
#>    .int3            1.005      0.018   54.795    0.000
#>    .att1            1.014      0.024   41.999    0.000
#>    .att2            1.007      0.021   46.959    0.000
#>    .att3            1.016      0.020   51.447    0.000
#>    .att4            0.999      0.018   55.647    0.000
#>    .att5            0.992      0.022   45.664    0.000
#>    .sn1             1.005      0.024   41.655    0.000
#>    .sn2             1.010      0.022   46.703    0.000
#>    .pbc1            0.997      0.024   42.407    0.000
#>    .pbc2            0.985      0.022   44.926    0.000
#>    .pbc3            0.991      0.020   50.447    0.000
#>    .b1              0.999      0.023   42.627    0.000
#>    .b2              1.017      0.022   46.238    0.000
#> 
#> Covariances:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>   ATT ~~        
#>     SN              0.629      0.029   21.703    0.000
#>   PBC ~~        
#>     ATT             0.678      0.029   23.455    0.000
#>     SN              0.678      0.029   23.090    0.000
#> 
#> Variances:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>    .int1            0.158      0.009   18.219    0.000
#>    .int2            0.160      0.008   20.377    0.000
#>    .int3            0.168      0.007   23.630    0.000
#>    .att1            0.167      0.007   23.530    0.000
#>    .att2            0.150      0.006   24.712    0.000
#>    .att3            0.160      0.006   26.379    0.000
#>    .att4            0.162      0.006   27.645    0.000
#>    .att5            0.159      0.006   24.925    0.000
#>    .sn1             0.178      0.015   12.091    0.000
#>    .sn2             0.157      0.012   13.258    0.000
#>    .pbc1            0.145      0.008   18.444    0.000
#>    .pbc2            0.160      0.007   21.424    0.000
#>    .pbc3            0.154      0.006   23.798    0.000
#>    .b1              0.186      0.020    9.420    0.000
#>    .b2              0.135      0.018    7.595    0.000
#>     ATT             0.998      0.037   26.942    0.000
#>     SN              0.987      0.039   25.233    0.000
#>     PBC             0.961      0.036   27.050    0.000
#>    .INT             0.488      0.020   24.585    0.000
#>    .BEH             0.475      0.024   19.738    0.000
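The two summaries above are long, so it can be convenient to compare only the structural paths into BEH side by side. The following is a minimal sketch that refits both models as in this article and uses parameter_estimates() from modsem. The column names lhs, op, rhs, and est are assumed to follow the lavaan-style layout of that function's output, and the helper name beh_paths is made up for this example.

```r
library(modsem)

# Non-linear part: the structural model for INT is left out here.
tpb_nonlinear <- '
  ATT =~ att1 + att2 + att3 + att4 + att5
  SN =~ sn1 + sn2
  PBC =~ pbc1 + pbc2 + pbc3
  INT =~ int1 + int2 + int3
  BEH =~ b1 + b2

  BEH ~ INT + PBC
  BEH ~ INT:INT
'

# Linear part, passed through cov.syntax.
tpb_linear <- 'INT ~ PBC + ATT + SN'

est_lms <- modsem(tpb_nonlinear, data = TPB, cov.syntax = tpb_linear,
                  method = "lms")
est_qml <- modsem(tpb_nonlinear, data = TPB, cov.syntax = tpb_linear,
                  method = "qml")

# Hypothetical helper: keep the regression paths into BEH only.
# Assumes the parameter table has columns lhs, op, rhs and est.
beh_paths <- function(fit) {
  pe <- parameter_estimates(fit)
  pe[pe$lhs == "BEH" & pe$op == "~", c("rhs", "est")]
}

# One row per predictor of BEH, one column per estimator.
beh_compare <- merge(beh_paths(est_lms), beh_paths(est_qml),
                     by = "rhs", suffixes = c(".lms", ".qml"))
beh_compare
```

If the assumed column names do not match the installed version of modsem, inspecting names(parameter_estimates(est_lms)) should show what to use instead.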

modsem can also attempt to split the model syntax automatically when the auto.split.syntax argument is set to TRUE:

est_lms <- modsem(tpb, data = TPB, method = "lms", auto.split.syntax = TRUE)
#> Warning: It is recommended that you have at least 32 nodes for interaction
#> effects between endogenous variables in the lms approach 'nodes = 24'
summary(est_lms)
#> 
#> modsem (1.0.15) ended normally after 33 iterations
#> 
#>   Estimator                                        LMS
#>   Optimization method                       EMA-NLMINB
#>   Number of model parameters                        54
#> 
#>   Number of observations                          2000
#> 
#> Loglikelihood and Information Criteria:
#>   Loglikelihood                              -26360.48
#>   Akaike (AIC)                                52828.95
#>   Bayesian (BIC)                              53131.40
#>  
#> Numerical Integration:
#>   Points of integration (per dim)                   24
#>   Dimensions                                         1
#>   Total points of integration                       24
#> 
#> Fit Measures for Baseline Model (H0):
#>                                               Standard
#>   Chi-square                                     66.27
#>   Degrees of Freedom (Chi-square)                   82
#>   P-value (Chi-square)                           0.897
#>   RMSEA                                          0.000
#>                                                       
#>   Loglikelihood                              -26393.22
#>   Akaike (AIC)                                52892.45
#>   Bayesian (BIC)                              53189.29
#>  
#> Comparative Fit to H0 (LRT test):
#>   Loglikelihood change                           32.75
#>   Difference test (D)                            65.49
#>   Degrees of freedom (D)                             1
#>   P-value (D)                                    0.000
#>  
#> R-Squared Interaction Model (H1):
#>   INT                                            0.370
#>   BEH                                            0.239
#> R-Squared Baseline Model (H0):
#>   INT                                            0.367
#>   BEH                                            0.210
#> R-Squared Change (H1 - H0):
#>   INT                                            0.003
#>   BEH                                            0.029
#> 
#> Parameter Estimates:
#>   Coefficients                          unstandardized
#>   Information                                 observed
#>   Standard errors                             standard
#>  
#> Latent Variables:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>   ATT =~        
#>     att1            1.000                             
#>     att2            0.878      0.012   71.562    0.000
#>     att3            0.789      0.012   66.372    0.000
#>     att4            0.695      0.011   60.999    0.000
#>     att5            0.887      0.013   70.850    0.000
#>   SN =~         
#>     sn1             1.000                             
#>     sn2             0.888      0.017   52.613    0.000
#>   PBC =~        
#>     pbc1            1.000                             
#>     pbc2            0.913      0.013   69.378    0.000
#>     pbc3            0.801      0.012   66.076    0.000
#>   INT =~        
#>     int1            1.000                             
#>     int2            0.914      0.015   58.976    0.000
#>     int3            0.807      0.015   55.599    0.000
#>   BEH =~        
#>     b1              1.000                             
#>     b2              0.959      0.032   29.921    0.000
#> 
#> Regressions:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>   INT ~         
#>     ATT             0.213      0.026    8.160    0.000
#>     SN              0.175      0.028    6.319    0.000
#>     PBC             0.222      0.030    7.483    0.000
#>   BEH ~         
#>     PBC             0.239      0.023   10.583    0.000
#>     INT             0.197      0.025    7.761    0.000
#>     INT:INT         0.128      0.016    7.895    0.000
#> 
#> Intercepts:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>    .int1            1.014      0.022   46.939    0.000
#>    .int2            1.012      0.020   50.380    0.000
#>    .int3            1.005      0.018   54.780    0.000
#>    .att1            1.014      0.024   41.990    0.000
#>    .att2            1.007      0.021   46.950    0.000
#>    .att3            1.016      0.020   51.438    0.000
#>    .att4            0.999      0.018   55.637    0.000
#>    .att5            0.992      0.022   45.655    0.000
#>    .sn1             1.005      0.024   41.648    0.000
#>    .sn2             1.010      0.022   46.695    0.000
#>    .pbc1            0.997      0.024   42.400    0.000
#>    .pbc2            0.985      0.022   44.919    0.000
#>    .pbc3            0.991      0.020   50.439    0.000
#>    .b1              0.998      0.023   42.620    0.000
#>    .b2              1.017      0.022   46.238    0.000
#> 
#> Covariances:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>   ATT ~~        
#>     SN              0.629      0.029   21.699    0.000
#>     PBC             0.678      0.029   23.451    0.000
#>   SN ~~         
#>     PBC             0.678      0.029   23.087    0.000
#> 
#> Variances:
#>                  Estimate  Std.Error  z.value  P(>|z|)
#>    .int1            0.158      0.009   18.228    0.000
#>    .int2            0.160      0.008   20.371    0.000
#>    .int3            0.168      0.007   23.631    0.000
#>    .att1            0.167      0.007   23.530    0.000
#>    .att2            0.150      0.006   24.713    0.000
#>    .att3            0.160      0.006   26.380    0.000
#>    .att4            0.162      0.006   27.646    0.000
#>    .att5            0.159      0.006   24.925    0.000
#>    .sn1             0.178      0.015   12.089    0.000
#>    .sn2             0.157      0.012   13.259    0.000
#>    .pbc1            0.145      0.008   18.444    0.000
#>    .pbc2            0.160      0.007   21.424    0.000
#>    .pbc3            0.154      0.006   23.798    0.000
#>    .b1              0.185      0.020    9.413    0.000
#>    .b2              0.136      0.018    7.610    0.000
#>     ATT             0.998      0.037   26.935    0.000
#>     SN              0.987      0.039   25.228    0.000
#>     PBC             0.961      0.036   27.046    0.000
#>    .INT             0.488      0.020   24.566    0.000
#>    .BEH             0.475      0.024   19.741    0.000
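Both LMS runs above print a warning recommending at least 32 nodes for interaction effects between endogenous variables, while the fits used 24. A minimal sketch of following that recommendation is shown below. It assumes that the nodes argument named in the warning can be passed directly to modsem(), and the object name est_lms_32 is made up for this example.

```r
library(modsem)

tpb <- '
# Outer Model
  ATT =~ att1 + att2 + att3 + att4 + att5
  SN =~ sn1 + sn2
  PBC =~ pbc1 + pbc2 + pbc3
  INT =~ int1 + int2 + int3
  BEH =~ b1 + b2

# Inner Model
  INT ~ ATT + SN + PBC
  BEH ~ INT + PBC
  BEH ~ INT:INT
'

# Same call as above, but with the number of integration nodes raised
# from 24 to the 32 suggested by the warning.
est_lms_32 <- modsem(tpb, data = TPB, method = "lms",
                     auto.split.syntax = TRUE, nodes = 32)
summary(est_lms_32)
```

More nodes mean more points of numerical integration, so the fit will likely take somewhat longer. Comparing the estimates with those from the 24-node run is a simple way to judge whether the extra nodes matter for this model.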