General Linear Modeling Tutorial
When to use this tool
The General Linear Model (GLM) is a versatile statistical tool that helps you understand the relationships between one or more predictor variables and a continuous response variable. Whether you're analyzing the effects of different treatments, comparing group means, or exploring interactions between variables, GLM provides a flexible framework for uncovering insights in your data.
GLM allows you to:
- Build models using both categorical and continuous predictors
- Include interaction terms and higher order terms as well as nested terms
- Simplify your model with different reduction methods
- Interpret coefficients, p-values, and model fit statistics
GLM is especially useful in experimental design, quality improvement, and advanced regression analysis.
How is this different from Multiple Regression?
General Linear Modeling is an umbrella that includes Multiple Regression.
Multiple Regression involves using multiple predictor variables to predict a response variable. The response must be continuous and the predictors can be continuous or categorical.
General Linear Modeling expands this concept to include linear combinations of other factors including interactions between factors, higher order terms, and nested terms.
Because it can include these additional terms, General Linear Modeling can fit the data more closely when the response depends on interactions, curvature, or nested structure.
On the other hand, Multiple Regression is a simpler type of model that is great for cases where the predictor variables have a linear relationship with the response.
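To make the comparison concrete, here is a minimal sketch in plain Python (no EngineRoom code is involved). It fits made-up data that contains curvature twice: once with a linear term only, and once with an added higher order term. The data, the helper names (`solve`, `fit`, `sse`), and the use of normal equations are all assumptions chosen for illustration, not a description of how EngineRoom computes its models.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit(rows, y):
    """Ordinary least squares through the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def sse(rows, y, coef):
    """Sum of squared residuals for a fitted coefficient vector."""
    return sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))


# Made-up data with curvature: y = 1 + 2x + 3x^2 exactly.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [1 + 2 * v + 3 * v * v for v in x]

main_rows = [[1.0, v] for v in x]         # intercept + linear term only
full_rows = [[1.0, v, v * v] for v in x]  # adds the higher order term x * x

main_coef = fit(main_rows, y)
full_coef = fit(full_rows, y)
print("linear term only:", main_coef, "SSE =", sse(main_rows, y, main_coef))
print("with x * x term: ", full_coef, "SSE =", sse(full_rows, y, full_coef))
```

On this data the straight-line fit leaves a large residual sum of squares, while the model with the x * x term recovers the coefficients 1, 2, and 3. Real data is noisy, so the improvement is rarely this clean, and extra terms should still be checked for significance.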
How to use this tool in EngineRoom
Basic Example:
Our first dataset includes two continuous factors (Temperature and Pressure), a categorical factor (Raw Material Batch), and the output variable (Strength).
Steps:
1. Open the General Linear Modeling tool onto the workspace by going to Analyze > Regression Analysis > General Linear Modeling.
2. Click on the Data Source and drag Temperature and Pressure onto the Continuous Variables dropzone. Then drag Raw Material Batch onto the Categorical Variables dropzone.
3. Drag Strength onto the Response Variable dropzone.
4. Look at the first category on the Basic Setup Screen. This category deals with categorical variables.

First, you can select which type of encoding to use. Dummy Coding, sometimes called One-Hot Encoding, calculates the coefficient of each categorical level relative to a reference level that you select. Effect Coding calculates the coefficient of each level relative to the overall mean. We will leave this at Dummy Coding.
Second, you can set whether a particular categorical variable is Random. A Random variable is one whose levels are not chosen deliberately. In this case, we do not select the Raw Material Batch number ourselves: batches arrive randomly and will change in the future.
5. Set the Raw Material Batch to Random.
6. The next section has different Model Reduction Methods. In this case, we will leave the default of Backward Elimination.
7. Click Calculate to jump straight to the output.
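The two encoding options from step 4 can be illustrated with a short Python sketch. The batch labels `B1`, `B2`, and `B3` and the function names are hypothetical, and the column layout is the common textbook convention rather than a confirmed description of EngineRoom's internals. Note that strict one-hot encoding keeps a column for every level, while dummy coding as sketched here drops the column for the reference level.

```python
def dummy_code(levels, reference):
    """Dummy coding: one 0/1 column per non-reference level.

    The reference level is all zeros, so each coefficient is the
    difference between a level and the reference level.
    """
    others = [lv for lv in levels if lv != reference]
    return {lv: [1 if lv == o else 0 for o in others] for lv in levels}


def effect_code(levels, reference):
    """Effect coding: same columns, but the reference level is -1 everywhere.

    Each coefficient is then the difference between a level and the
    mean of the level means.
    """
    others = [lv for lv in levels if lv != reference]
    coded = {}
    for lv in levels:
        if lv == reference:
            coded[lv] = [-1] * len(others)
        else:
            coded[lv] = [1 if lv == o else 0 for o in others]
    return coded


batches = ["B1", "B2", "B3"]  # hypothetical Raw Material Batch levels
dummy = dummy_code(batches, reference="B1")
effect = effect_code(batches, reference="B1")
for level in batches:
    print(level, "dummy:", dummy[level], "effect:", effect[level])
```

Both schemes use one column fewer than the number of levels, so they describe the same fitted model; only the interpretation of the coefficients changes.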
Let's examine the output:
Conclusion Statement: Lists which factors are significant at the selected alpha level.
Model Summary: These statistics help determine whether the model generated is a good fit for your data.
Model Equation (Coded): An equation generated using the input standardization method. The default standardization method is Centering, or subtracting the mean; the Additional Options section lists the other methods. This equation helps compare coefficients, but shouldn't be used to predict the output from raw data values.
Model Equation (Uncoded): An equation generated from your data. Use this model when using real values of your data to predict the output.
Analysis of Variance and Coefficients Tables: Additional information on the significance of your inputs in the model. The significant factors are highlighted in green.
Model Parameters: Informational table about some of the selected options.
In this case, you'll see that the Raw Material Batch was removed. This was due to multicollinearity: the batch tracked the other factors too closely, so it was not providing any new information.
In a case where the Random Factor is not removed, you will get some slightly different tables including a Marginal Model Equation and a Random Effects Table.
Note: When a random effect is present, this tool uses the Mixed Effect Model method. When a random effect is not present, this tool uses General Linear Modeling.
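The relationship between the coded and uncoded model equations described above can be sketched in Python for the default Centering method. The coefficients and means below are invented for illustration and are not taken from the tutorial dataset, and the helper `uncode_centered` is hypothetical. The conversion shown applies only to a model with main effects; with interactions or higher order terms, centering also changes the lower order coefficients.

```python
def uncode_centered(intercept_coded, slopes, means):
    """Convert y = b0 + sum(b_i * (x_i - mean_i)) into y = a0 + sum(b_i * x_i).

    Centering leaves the slopes unchanged and only shifts the intercept.
    """
    intercept = intercept_coded - sum(b * m for b, m in zip(slopes, means))
    return intercept, list(slopes)


def predict(intercept, slopes, values):
    """Evaluate a linear equation at the given input values."""
    return intercept + sum(b * v for b, v in zip(slopes, values))


# Invented numbers: a coded equation in (Temperature, Pressure).
coded_intercept = 50.0
slopes = [0.5, 2.0]
means = [100.0, 10.0]            # column means used for centering

raw_inputs = [110.0, 12.0]       # real-unit values to predict at
centered_inputs = [v - m for v, m in zip(raw_inputs, means)]

uncoded_intercept, uncoded_slopes = uncode_centered(coded_intercept, slopes, means)
from_coded = predict(coded_intercept, slopes, centered_inputs)
from_uncoded = predict(uncoded_intercept, uncoded_slopes, raw_inputs)
print(uncoded_intercept, from_coded, from_uncoded)
```

The coded equation gives the right prediction only when it is fed centered inputs, which is why the uncoded equation is the one to use with real data values.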
Advanced Example: Interactions and Higher Order Terms
This dataset contains Activity and Weight as input variables and Bone Density as a response variable.

1. Open the General Linear Modeling tool onto the workspace by going to Analyze > Regression Analysis > General Linear Modeling.
2. Click on the Data Source and drag the Activity and Weight variables onto the Continuous Variables dropzone.
3. Click Continue.
4. Drag the Bone Density variable onto the Response Variable dropzone.
5. We are going to leave the Basic Setup options at their defaults.
6. Click on Advanced Setup.
7. Click on the Edit button next to Interactions and Higher Order Terms. The Interactions and Higher Order Terms screen allows you to select which interactions or higher order terms you want to consider for the model. Each added term makes the model more complex, so be cautious about adding many interactions or higher order terms.
8. Click on Weight on the left side.
9. Click on "Add Higher Order Terms >>" to add Weight * Weight to the model.
10. Click on Activity on the left side. Both Weight and Activity will now be selected.
11. Click "Add Interaction Terms >>". This will add Activity * Weight to the model.
Note: When you have multiple factors selected on the left, all combinations or higher order terms for each of those will be added to the model when you click on the relevant button.
Note 2: Adjusting the number in the dropdown allows you to add 3rd order terms and three-way interactions.
12. Click Calculate.
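The term-building behavior described in steps 8 to 11 and the two notes can be sketched in Python. The function names are hypothetical, and the sketch assumes that the dropdown number adds terms of exactly that order; whether EngineRoom also adds the lower orders at the same time is not stated here.

```python
from itertools import combinations


def higher_order_terms(selected, order=2):
    """One power term per selected factor, e.g. Weight * Weight for order 2."""
    return [" * ".join([name] * order) for name in selected]


def interaction_terms(selected, order=2):
    """Every combination of `order` distinct factors among those selected."""
    return [" * ".join(combo) for combo in combinations(selected, order)]


def term_column(term, row):
    """Value of a term for one data row: the product of its factors."""
    value = 1.0
    for name in term.split(" * "):
        value *= row[name]
    return value


print(higher_order_terms(["Weight"]))             # step 9
print(interaction_terms(["Activity", "Weight"]))  # step 11
print(interaction_terms(["A", "B", "C"], order=3))

# Made-up data row, used only to show how a term becomes a model column.
row = {"Activity": 3.0, "Weight": 70.0}
print(term_column("Activity * Weight", row), term_column("Weight * Weight", row))
```

With three factors selected, second order interactions already produce three terms, which is why the number of candidate terms grows quickly as more factors are selected.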
The output will show:
Conclusion Statement: This shows which terms in the model were significant. You'll notice that the Activity * Weight term was removed from the model.
Model Summary: These statistics help determine whether the model generated is a good fit for your data.
The remaining sections match the Basic Example and are repeated here:
Model Equation (Coded): An equation generated using the input standardization method. The default standardization method is Centering, or subtracting the mean; the Additional Options section lists the other methods. This equation helps compare coefficients, but shouldn't be used to predict the output from raw data values.
Model Equation (Uncoded): An equation generated from your data. Use this model when using real values of your data to predict the output.
Analysis of Variance and Coefficients Tables: Additional information on the significance of your inputs in the model.
Model Parameters: Informational table about some of the selected options.
Advanced Example: Nested Terms
This dataset contains four input terms: Brand, City, Cost, and Taxes. The response variable is Spend.

Nested Terms are used when the levels of one factor in your model belong to, or depend on, the levels of another factor. Examples include workers on particular shifts, or classrooms in particular schools.
1. Open the General Linear Modeling tool onto the workspace by going to Analyze > Regression Analysis > General Linear Modeling.
2. Click on the Data Source, then drag the variables Brand and City onto the Categorical Variables dropzone. Drag the variables Cost and Taxes onto the Continuous Variables dropzone.
3. Drag the variable Spend onto the Response Variable dropzone.
4. Leave the Basic Setup options as they are, and click on "Advanced Setup."
5. On the Advanced Setup screen, click "Edit" for Nested Variables.
6. On the Nested Variables screen, select City is nested in Brand.
7. Click Calculate.
Note: You can combine these elements by going back to the Advanced Setup screen and then adding Interactions or Higher Order Terms.
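What it means for one factor to be nested in another can be shown with a small Python sketch. The brand and city values are invented, the helper names are hypothetical, and the label format is only for illustration; it is not meant to match the labels EngineRoom prints.

```python
def is_nested(parent, child):
    """True when every child level occurs under exactly one parent level."""
    owner = {}
    for p, c in zip(parent, child):
        if owner.setdefault(c, p) != p:
            return False
    return True


def nested_labels(parent, child):
    """Label each row by its child level within its parent level."""
    return [f"{c} within {p}" for p, c in zip(parent, child)]


# Invented rows: each city appears under a single brand, so City is nested in Brand.
brand = ["A", "A", "B", "B"]
city = ["Austin", "Boston", "Chicago", "Denver"]
print(is_nested(brand, city))
print(nested_labels(brand, city))

# Crossed layout for contrast: the same cities appear under both brands.
crossed_city = ["Austin", "Boston", "Austin", "Boston"]
print(is_nested(brand, crossed_city))
```

When the check fails, the two factors are crossed rather than nested, and an interaction term is usually the more appropriate way to model them together.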
The output is very similar to the previous two outputs.
Conclusion Statement: You can see that Brand, City, and Brand(City) are significant in the model.
Many of the tables are similar to the previous case:
Model Equation (Coded): An equation generated using the input standardization method. The default standardization method is Centering, or subtracting the mean; the Additional Options section lists the other methods. This equation helps compare coefficients, but shouldn't be used to predict the output from raw data values.
Model Equation (Uncoded): An equation generated from your data. Use this model when using real values of your data to predict the output.
Analysis of Variance and Coefficients Tables: Additional information on the significance of your inputs in the model.
Model Parameters: Informational table about some of the selected options.
However, because we have categorical variables in our model we have some new tables:
Simultaneous Tests for Difference of Means (Tukey Adjustment): These tables show the differences between the levels of a categorical variable and whether each difference is significant.
Note also that there are two tabs at the top. Switching to the tab called Factorial Plots will show the relevant plots. You will see these if you have categorical variables.
Additional Options
Two options that deserve more detail:
Basic Setup Screen:
Model Reduction Method: Used to simplify the model by adding or removing factors based on their significance. Choose from Backward Elimination, Forward Selection, Stepwise, or No Reduction.
Advanced Setup Screen:
Input Standardization Method: Sets how the continuous inputs are rescaled so that their coefficients can be compared on the same scale. The options are Subtract Mean (Center), Divide by Standard Deviation, Subtract Mean and Divide by Standard Deviation, Code Min/Max to -1/+1, or None.
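The five standardization options can be sketched in Python as follows. The method keys (`"center"`, `"scale"`, and so on) are hypothetical names for the options listed above, and the use of the sample standard deviation is an assumption; the tutorial does not say which form EngineRoom uses.

```python
from statistics import mean, stdev


def standardize(values, method):
    """Rescale one continuous input column using the named method."""
    if method == "none":
        return list(values)
    if method == "center":            # Subtract Mean (Center)
        m = mean(values)
        return [v - m for v in values]
    if method == "scale":             # Divide by Standard Deviation
        s = stdev(values)             # assumption: sample standard deviation
        return [v / s for v in values]
    if method == "center_scale":      # Subtract Mean and Divide by Standard Deviation
        m, s = mean(values), stdev(values)
        return [(v - m) / s for v in values]
    if method == "minmax":            # Code Min/Max to -1/+1
        lo, hi = min(values), max(values)
        return [2 * (v - lo) / (hi - lo) - 1 for v in values]
    raise ValueError(f"unknown method: {method}")


temperature = [10.0, 20.0, 30.0]  # made-up input column
for name in ["none", "center", "scale", "center_scale", "minmax"]:
    print(name, standardize(temperature, name))
```

Centering changes only where zero sits, while the options that divide by a spread also change the units of the coefficients, which is what makes coefficients of differently scaled inputs comparable.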