EngineRoom

General Linear Modeling Tutorial

When to use this tool

The General Linear Model (GLM) is a versatile statistical tool that helps you understand the relationships between one or more predictor variables and a continuous response variable. Whether you're analyzing the effects of different treatments, comparing group means, or exploring interactions between variables, GLM provides a flexible framework for uncovering insights in your data.

GLM allows you to:

  • Build models using both categorical and continuous predictors
  • Include interaction, higher-order, and nested terms
  • Simplify your model with different reduction methods
  • Interpret coefficients, p-values, and model fit statistics

GLM is especially useful in experimental design, quality improvement, and advanced regression analysis.

How is this different from Multiple Regression?

General Linear Modeling is an umbrella that includes Multiple Regression.

Multiple Regression involves using multiple predictor variables to predict a response variable. The response must be continuous and the predictors can be continuous or categorical.

General Linear Modeling extends this concept with additional terms built from the factors: interactions between factors, higher-order terms, and nested terms.

Because it can include these terms, General Linear Modeling can capture curvature, interactions, and nested structure that a model with only main effects would miss, which can make it more accurate.

On the other hand, Multiple Regression is a simpler type of model that is great for cases where the predictor variables have a linear relationship with the response.
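Both kinds of model are typically fit the same way: by least squares on a design matrix whose columns are the model terms. A squared term such as x*x is just another column, and the model stays linear in its coefficients, which is why it still counts as a linear model. The following is a minimal, standard-library Python sketch under that assumption, not EngineRoom's internal code; the data are a made-up, noise-free quadratic.

```python
# Minimal sketch (not EngineRoom's implementation): fit a model by ordinary
# least squares, where each column of the design matrix is one model term.

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(rows, y):
    """Coefficients b solving the normal equations (X^T X) b = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]  # made-up exact quadratic

# Multiple-regression-style model: intercept + x.
mr_coefs = least_squares([[1.0, xi] for xi in x], y)
# GLM-style model: the higher-order term x*x is simply one more column.
glm_coefs = least_squares([[1.0, xi, xi * xi] for xi in x], y)
print(mr_coefs)   # roughly [-5.0, 14.0]
print(glm_coefs)  # roughly [1.0, 2.0, 3.0]
```

The straight-line fit returns roughly [-5, 14] and misses the curvature entirely, while adding the x*x column recovers the true coefficients [1, 2, 3].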

How to use this tool in EngineRoom

Basic Example:

Click to Download Data File

Our first dataset includes two continuous factors (Temperature and Pressure), a categorical factor (Raw Material Batch), and the output variable (Strength).

Spreadsheet view of the data for the Basic Example

Steps:

1. Open the General Linear Modeling tool onto the workspace by going to Analyze > Regression Analysis > General Linear Modeling.

The first screen of the General Linear Modeling tool which shows directions to drag and drop continuous and categorical variables onto the study.

2. Click on the Data Source, then drag Temperature and Pressure onto the Continuous Variables dropzone. Then drag Raw Material Batch onto the Categorical Variables dropzone.

Temperature and Pressure variables added to Continuous Variables and Raw Material Batch variable added to Categorical Variables

3. Drag Strength onto the Response Variable dropzone.

Strength variable added to the Response dropzone. Slide shows Basic setup screen with categorical encoding, model reduction method, and significance level.

4. Look at the first section of the Basic Setup screen, which deals with categorical variables.

Basic setup screen of General Linear Modeling

First, you can select the type of encoding to use. Dummy Coding, closely related to One-Hot Encoding, calculates the coefficients of the categorical levels relative to a reference level you select. In Effect Coding, each level's coefficient is measured as a deviation from the overall mean. We will leave this at Dummy Coding.

Second, you can set whether each categorical variable is Random. A factor is random when its levels are a random sample from a larger population rather than levels you chose deliberately. In this case, we are not selecting the Raw Material Batch: batches arrive at random and will change in the future as well.

5. Set the Raw Material Batch to Random.

Raw Material Batch has been switched to random

6. The next section sets the Model Reduction Method. In this case, leave it at the default, Backward Elimination.

7. Click Calculate to jump straight to the output.
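Before examining the output, it may help to see what the two encodings from step 4 look like as model columns. The Python helper below is hypothetical (it is not part of EngineRoom), and the batch labels B1 to B3 are made up. It assumes one column per non-reference level, which is the usual convention for both schemes.

```python
def encode(values, levels, reference, scheme="dummy"):
    """Hypothetical helper: encode a categorical column as indicator columns.

    One column is created for every level except the reference level.
    scheme="dummy":  reference rows are all 0, so each coefficient is that
                     level's difference from the reference level.
    scheme="effect": reference rows are all -1, so each coefficient is that
                     level's deviation from the overall mean of level means.
    """
    if scheme not in ("dummy", "effect"):
        raise ValueError(f"unknown scheme: {scheme}")
    columns = [lvl for lvl in levels if lvl != reference]
    rows = []
    for v in values:
        if v == reference:
            fill = 0 if scheme == "dummy" else -1
            rows.append([fill] * len(columns))
        else:
            rows.append([1 if v == col else 0 for col in columns])
    return columns, rows


batches = ["B1", "B2", "B3", "B1"]  # made-up batch labels
print(encode(batches, ["B1", "B2", "B3"], reference="B1", scheme="dummy"))
# (['B2', 'B3'], [[0, 0], [1, 0], [0, 1], [0, 0]])
print(encode(batches, ["B1", "B2", "B3"], reference="B1", scheme="effect"))
# (['B2', 'B3'], [[-1, -1], [1, 0], [0, 1], [-1, -1]])
```

The only difference between the two schemes is how the reference level's rows are filled, which is what changes the meaning of the coefficients.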

Let's examine the output:

Top part of output of G L M tool. Includes tables and charts. Charts include Normal probability plot, standard residuals vs fitted, standard residuals vs observation, and a histogram of standard residuals.

Conclusion Statement: Lists which factors are significant at the selected significance (alpha) level.

Model Summary: These statistics help determine whether the model generated is a good fit for your data.

Model Equation (Coded): An equation generated using the input standardization method. The default standardization method is Centering, or subtracting the mean; you can change it with the Input Standardization Method option on the Advanced Setup screen (see Additional Options). This equation helps compare coefficients, but shouldn't be used to predict the output from raw input values.

Model Equation (Uncoded): An equation generated from your data. Use this model when using real values of your data to predict the output.

Analysis of Variance and Coefficients Tables: Additional information on the significance of your inputs in the model. The significant factors are highlighted in green.

Model Parameters: Informational table about some of the selected options.
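The coded and uncoded equations describe the same fitted model. Under the default centering, and for a model with only main effects, converting between them only changes the intercept. The sketch below assumes that case and uses made-up coefficients and means for Temperature and Pressure; it also shows why plugging raw values into the coded equation gives a wrong prediction.

```python
# Hypothetical numbers for illustration only. Coded (centered) equation:
#   Strength = 50 + 2.0*(Temperature - 100) + 0.5*(Pressure - 30)

def uncode_centered(intercept, coefs, means):
    """Convert a centered, main-effects-only equation to raw (uncoded) units.

    Coded:   y = intercept + sum(c_i * (x_i - mean_i))
    Uncoded: y = (intercept - sum(c_i * mean_i)) + sum(c_i * x_i)
    Squared or interaction terms would have to be expanded, which also
    changes the linear coefficients; that case is not handled here.
    """
    new_intercept = intercept - sum(c * m for c, m in zip(coefs, means))
    return new_intercept, list(coefs)


def predict(intercept, coefs, xs):
    """Evaluate intercept + sum(c_i * x_i)."""
    return intercept + sum(c * x for c, x in zip(coefs, xs))


coded_b0, coefs, means = 50.0, [2.0, 0.5], [100.0, 30.0]
raw = [110.0, 32.0]                               # Temperature, Pressure
centered = [x - m for x, m in zip(raw, means)]    # [10.0, 2.0]

uncoded_b0, uncoded_coefs = uncode_centered(coded_b0, coefs, means)
print(uncoded_b0)                                 # -165.0
print(predict(uncoded_b0, uncoded_coefs, raw))    # 71.0  uncoded equation, raw inputs
print(predict(coded_b0, coefs, centered))         # 71.0  coded equation, centered inputs
print(predict(coded_b0, coefs, raw))              # 286.0 wrong: coded equation, raw inputs
```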

In this example, the Raw Material Batch factor was removed from the model. This was due to multicollinearity: it was so strongly correlated with the other factors that it was not providing any new information.
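Multicollinearity is commonly quantified with the variance inflation factor (VIF). For a predictor j, VIF = 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on the other predictors; with only two predictors, R_j^2 is simply their squared correlation. The sketch below covers only that two-predictor case, uses made-up values, and is not how EngineRoom computes VIF internally. A common rule of thumb treats VIF values above about 5 to 10 as a sign of problematic collinearity.

```python
from math import sqrt


def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def vif_two_predictors(a, b):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)


# Made-up predictors that move almost in lockstep.
x1 = [100.0, 105.0, 110.0, 115.0, 120.0]
x2 = [1.0, 2.0, 3.0, 4.0, 6.0]
print(round(vif_two_predictors(x1, x2), 3))  # 37.0
```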

When a random factor remains in the model, the output tables differ slightly and include a Marginal Model Equation and a Random Effects table.

Note: When a random effect is present, this tool fits a mixed effects model. When no random effect is present, this tool uses General Linear Modeling.

Advanced Example: Interactions and Higher Order Terms

Click to Download Data File

This dataset contains Activity and Weight as input variables and Bone Density as a response variable.

Dataset for the Interactions example for G L M

1. Open the General Linear Modeling tool onto the workspace by going to Analyze > Regression Analysis > General Linear Modeling.

The first screen of the General Linear Modeling tool which shows directions to drag and drop continuous and categorical variables onto the study.

2. Click on the Data Source and drag the Activity and Weight variables onto the Continuous Variables dropzone.

Activity and Weight variables are added to the G L M study

3. Click Continue.

4. Drag the Bone Density variable onto the Response Variable dropzone.

Bone Density variable added to response dropzone.

5. We are going to leave the Basic Setup options at their defaults.

6. Click on Advanced Setup.

Advanced Setup screen showing options for Nested Variables, Interactions and Higher Order Terms, and Input Standardization Method. Each has an Edit button next to it.

7. Click on the Edit button next to Interactions and Higher Order Terms. The Interactions and Higher Order Terms screen lets you select which interactions or higher-order terms to consider for the model. Each added term makes the model more complex, so add interactions and higher-order terms selectively.

Interaction and Higher Order Terms screen which shows a left column containing Activity and Weight, a center column containing two buttons for Add Higher Order Terms and Add Interaction Terms, and a third column for Included in Model which has Activity and Weight.

8. Click on Weight on the left side.

9. Click on "Add Higher Order Terms >>" to add Weight * Weight to the model.

Weight is highlighted in the left column, Add Higher Order Terms button is active, and Weight times Weight is in the third column as included in the model.

10. Click on Activity on the left side. Both Weight and Activity will now be selected.

11. Click "Add Interaction Terms >>". This will add Activity * Weight to the model.

Activity and Weight are highlighted in the first column, both buttons in the center are now active, and the right column contains both weight times weight and activity times weight

Note: When you have multiple factors selected on the left, clicking a button adds all interactions among them, or the higher-order terms for each of them, to the model.

Note 2: Changing the number in the dropdown lets you add third-order terms and three-way interactions.

12. Click Calculate.
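The Notes under steps 9 to 11 describe how the buttons expand a selection into model terms. The Python sketch below mimics that behavior under one assumption the tutorial does not confirm: that the dropdown value is the maximum order, so every order from 2 up to it is added. The function names are hypothetical, and the data row is made up.

```python
from itertools import combinations
from math import prod


def higher_order_terms(selected, order=2):
    """Powers 2..order of each selected factor, e.g. Weight*Weight."""
    return ["*".join([f] * p) for f in selected for p in range(2, order + 1)]


def interaction_terms(selected, order=2):
    """Products of 2..order distinct selected factors, e.g. Activity*Weight."""
    return ["*".join(combo)
            for k in range(2, order + 1)
            for combo in combinations(selected, k)]


def term_value(term, row):
    """Value of a term's design-matrix column for one data row."""
    return prod(row[name] for name in term.split("*"))


print(higher_order_terms(["Weight"]))               # ['Weight*Weight']
print(interaction_terms(["Activity", "Weight"]))    # ['Activity*Weight']

row = {"Activity": 2.0, "Weight": 70.0}             # made-up observation
print(term_value("Activity*Weight", row))           # 140.0
print(term_value("Weight*Weight", row))             # 4900.0
```

With three factors A, B, and C selected and order 3, interaction_terms returns A*B, A*C, B*C, and A*B*C, which shows how quickly the number of terms grows.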

The output of the interaction and higher order terms. The Conclusion says that Activity, Weight, and Weight * Weight were significant.

The output shows:

Conclusion Statement: This shows which terms in the model were significant. Notice that the Activity * Weight term was removed from the model by Backward Elimination.

Model Summary: These statistics help determine whether the model generated is a good fit for your data.

The remaining sections are the same as in the Basic Example: Model Equation (Coded), Model Equation (Uncoded), Analysis of Variance and Coefficients Tables, and Model Parameters.

The bottom of the output of the interactions and higher order terms example.
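Written out, the model specified in steps 8 to 11 and the reduced model left after Backward Elimination dropped the interaction are shown below. The symbols b stand for the coefficients EngineRoom estimates (no fitted values are implied); the primes mark that the reduced model's coefficients are re-estimated, so they generally differ from the full model's.

```latex
\begin{aligned}
\text{Full:}\quad \text{Bone Density} &= b_0 + b_1\,\text{Activity} + b_2\,\text{Weight} + b_3\,\text{Weight}^2 + b_4\,(\text{Activity} \times \text{Weight}) + \varepsilon \\
\text{Reduced:}\quad \text{Bone Density} &= b'_0 + b'_1\,\text{Activity} + b'_2\,\text{Weight} + b'_3\,\text{Weight}^2 + \varepsilon
\end{aligned}
```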

Advanced Example: Nested Terms

Click to Download Data File

This dataset contains four input variables: Brand, City, Cost, and Taxes. The response variable is Spend.

Dataset for Nested Example of G L M

Nested terms are used when the levels of one factor exist only within the levels of another factor. Examples include workers nested within particular shifts, or classrooms nested within particular schools.
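In model terms, nesting means each combination of parent level and child level is its own level: a "Small" city under one brand is not the same level as a "Small" city under another. The Python sketch below builds indicator columns for City nested in Brand that way, dropping one reference city within each brand as in dummy coding. The function is hypothetical, the rows are made up, and it assumes every brand contains the reference city.

```python
def nested_columns(rows, parent, child, reference_child):
    """Hypothetical helper: indicator columns for `child` nested in `parent`.

    Each nested level is a (parent, child) pair. Within every parent level,
    the reference child level is dropped (all zeros), as in dummy coding.
    """
    pairs = sorted({(r[parent], r[child]) for r in rows})
    columns = [(p, c) for p, c in pairs if c != reference_child]
    labels = [f"{parent}={p}, {child}={c}" for p, c in columns]
    matrix = [[1 if (r[parent], r[child]) == col else 0 for col in columns]
              for r in rows]
    return labels, matrix


data = [  # made-up rows
    {"Brand": "A", "City": "Large"},
    {"Brand": "A", "City": "Small"},
    {"Brand": "B", "City": "Large"},
    {"Brand": "B", "City": "Small"},
]
labels, matrix = nested_columns(data, "Brand", "City", reference_child="Large")
print(labels)  # ['Brand=A, City=Small', 'Brand=B, City=Small']
print(matrix)  # [[0, 0], [1, 0], [0, 0], [0, 1]]
```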

1. Open the General Linear Modeling tool onto the workspace by going to Analyze > Regression Analysis > General Linear Modeling.

The first screen of the General Linear Modeling tool which shows directions to drag and drop continuous and categorical variables onto the study.

2. Click on the Data Source, then drag the Brand and City variables onto the Categorical Variables dropzone. Drag the Cost and Taxes variables onto the Continuous Variables dropzone.

City and Brand variables added to Categorical Dropzone. Cost and Taxes added to Continuous Dropzone.

3. Drag the Spend variable onto the Response Variable dropzone.

Spend variable added to the response variable dropzone.

4. Leave the Basic Setup options as they are, and click on "Advanced Setup."

The Advanced Setup screen of G L M showing options for Nested Variables, Interactions and Higher Order Terms, and Input Standardization Methods.

5. On the Advanced Setup screen, click "Edit" for Nested Variables.

The Nested Variables setup screen showing a dropzone on the left, then "is nested in", then a second dropzone.

6. On the Nested Variables screen, set City "is nested in" Brand.

City is in the left dropdown menu, then "is nested in", then Brand in the right dropdown menu.

7. Click Calculate.

Note: You can combine these elements by going back to the Advanced Setup screen and adding Interactions or Higher Order Terms.

Output of G L M with nesting. It shows a long model equation coded with a term for each level of brand and each level and combination of brand and city.

The output is very similar to the previous two outputs.

Conclusion Statement: You can see that Brand, City, and Brand(City) are significant in the model.

Many of the tables are the same as in the previous examples: Model Equation (Coded), Model Equation (Uncoded), Analysis of Variance and Coefficients Tables, and Model Parameters.

Continuing the output, this includes the uncoded equation and the beginning of the ANOVA table.

Continuing the output, this shows the ANOVA table and the coded Coefficients table. The VIF values for Brand are red, showing some collinearity.

Because this model includes categorical variables, the output also includes some new tables:

Simultaneous Tests for Difference of Means (Tukey Adjustment): These tables show the difference between each pair of levels of a categorical variable and whether that difference is significant, with Tukey's method adjusting for the number of comparisons.

Continuing the output, the screen shows the Simultaneous Tests for Difference of Means for Brand as well as the start of the Brand nested in City term.
Continuing the output, the table for the Brand in City nested term is very long and includes Large versus Small city comparisons for each brand.
Continuing the output, the screen shows medium and small cities compared for each brand
Continuing the output, this screen shows the last few comparisons for Small cities vs other small cities for each brand, and also finally the Model Parameters
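Each row of these tables compares one pair of levels, so a factor with k levels produces k(k - 1)/2 comparisons; that is why the nested-term table, which compares brand-and-city combinations, runs so long. The sketch below computes only the raw difference in sample means for each pair, using made-up data. It does not reproduce EngineRoom's output: Tukey's method additionally converts each difference into a simultaneous confidence interval and adjusted p-value using the studentized range distribution, and a fitted model may compare model-adjusted means rather than raw ones.

```python
from itertools import combinations
from statistics import mean


def pairwise_mean_differences(groups):
    """Difference in sample means for every pair of levels.

    groups maps level -> list of responses; returns {(a, b): mean_a - mean_b}.
    Tukey-adjusted intervals and p-values are NOT computed here.
    """
    means = {level: mean(values) for level, values in groups.items()}
    return {(a, b): means[a] - means[b] for a, b in combinations(sorted(means), 2)}


spend = {"A": [10, 12], "B": [7, 9], "C": [14, 16]}  # made-up data
for pair, diff in pairwise_mean_differences(spend).items():
    print(pair, diff)
```

For the one-way case, recent SciPy releases provide scipy.stats.tukey_hsd, which adds the Tukey-adjusted intervals and p-values this sketch leaves out.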

There are also two tabs at the top of the output. Switch to the Factorial Plots tab to see the relevant plots; these are available when the model includes categorical variables.

Plots for Brand and Brand Nested in City showing how they vary across the levels. Brand B is the lowest and Brand D is the highest, while Small City Brand B is the lowest and Large City Brand D is the highest.

Additional Options

Two other options not previously discussed:

Basic Setup Screen:

Model Reduction Method: Simplifies the model by adding or removing terms based on their significance. Choose from Backward Elimination, Forward Selection, Stepwise, or No Reduction.

Dropdown showing the four options for Model Reduction Method
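Backward Elimination starts with every term and removes the least significant one at a time until all remaining terms are significant at the chosen significance level. The sketch below shows that loop in Python under stated assumptions: fit_pvalues is a hypothetical callback that refits the model and returns a p-value per term, the p-values in the example are made up, and the sketch ignores model hierarchy rules (such as keeping Weight while Weight*Weight remains) that many implementations enforce.

```python
def backward_elimination(terms, fit_pvalues, alpha=0.05):
    """Drop the highest-p-value term until every p-value is <= alpha.

    fit_pvalues(terms) is a hypothetical callback: it refits the model with
    the given terms and returns {term: p_value}.
    """
    terms = list(terms)
    while terms:
        pvalues = fit_pvalues(terms)
        worst = max(terms, key=lambda t: pvalues[t])
        if pvalues[worst] <= alpha:
            break                      # every remaining term is significant
        terms.remove(worst)            # refit without the weakest term
    return terms


# Made-up p-values; a real callback would refit and return new values each time.
FAKE_P = {"Activity": 0.001, "Weight": 0.002,
          "Weight*Weight": 0.01, "Activity*Weight": 0.40}


def fake_fit(terms):
    return {t: FAKE_P[t] for t in terms}


print(backward_elimination(FAKE_P, fake_fit))
# ['Activity', 'Weight', 'Weight*Weight']
```

Forward Selection runs in the other direction, adding the most significant candidate term at each step, and Stepwise alternates between adding and removing terms.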

Advanced Setup Screen:

Input Standardization Method: Sets how inputs are transformed before fitting, which puts coefficients on a comparable scale; this is the coding used in the Model Equation (Coded). The options are Subtract Mean (Center), Divide by Standard Deviation, Subtract Mean and Divide by Standard Deviation, Code Min/Max to -1/+1, or None.

Dropdown showing the options for the input standardization method
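The five options correspond to simple transformations of each input column. The Python sketch below spells them out; the method names are hypothetical labels for the dropdown entries, the values are made up, and it assumes the sample standard deviation (n - 1 denominator), which EngineRoom may or may not use.

```python
from statistics import mean, stdev


def standardize(xs, method):
    """Transform one input column (at least two values) using one of five options."""
    m, s = mean(xs), stdev(xs)      # stdev uses the n - 1 denominator
    lo, hi = min(xs), max(xs)
    if method == "center":          # Subtract Mean (Center)
        return [x - m for x in xs]
    if method == "scale":           # Divide by Standard Deviation
        return [x / s for x in xs]
    if method == "center_scale":    # Subtract Mean and Divide by Standard Deviation
        return [(x - m) / s for x in xs]
    if method == "minmax":          # Code Min/Max to -1/+1
        return [2 * (x - lo) / (hi - lo) - 1 for x in xs]
    if method == "none":            # None
        return list(xs)
    raise ValueError(f"unknown method: {method}")


temps = [10.0, 20.0, 30.0]  # made-up values
for method in ["center", "scale", "center_scale", "minmax", "none"]:
    print(method, standardize(temps, method))
```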

General Linear Modeling Tutorial Video

Coming Soon
