Explain Predictions

Modified on Wed, 5 Feb at 11:32 AM

Description

The Explain Predictions analysis helps you understand the relationships between the inputs and outputs of your models.


Application

This can help you find errors in a model when unexpected relationships appear; you can use these findings to improve your model in an iterative process. It may also reveal patterns you were not yet aware of and improve your understanding of the data behind the model. This step will NOT work if your model's training set contains missing data; in that case, apply the 'Remove Missing' step before training the model.


How to use

When you first click on Explain Predictions, you have to configure the manipulator:

  • Choose a Model on which the analysis should be performed. Note that this step does not work with stacked or chained models.
  • In the field Output to explain, choose the output for which you want to run the analysis.

The step produces a bar plot showing how strongly each input affects the output selected above. Green bars indicate that the input value causes an increase in the output, while red bars indicate that it causes a decrease. Please note that impact is measured as an arbitrary value intended for comparative analysis only.


The analysis is performed on all training data points. That is, the Explain Predictions plot shows the importance of the inputs on the output as an average value over the entire input design space.
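As described under "More on this step" below, this analysis is based on the SHAP algorithm. For readers who want to experiment outside the tool, here is a minimal sketch of a similar analysis using the open-source `shap` package. The data, model, feature names, and the sign convention for green/red are illustrative assumptions, not the tool's internal implementation:

```python
# A minimal sketch (not the tool's internal code): compute SHAP values
# for every training point and summarise them as one signed importance
# per input, similar in spirit to the Explain Predictions bar plot.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical training data: three inputs, one output to explain.
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)
model = RandomForestRegressor(random_state=0).fit(X, y)

# SHAP values for all training points, as the article describes.
explainer = shap.Explainer(model.predict, X)
sv = explainer(X).values                 # shape: (n_points, n_features)

for j, name in enumerate(["x0", "x1", "x2"]):
    magnitude = np.abs(sv[:, j]).mean()  # average impact over the design space
    # One plausible sign convention (an assumption): does a larger
    # input value push the prediction up or down on average?
    sign = np.corrcoef(X[:, j], sv[:, j])[0, 1]
    colour = "green (increases output)" if sign > 0 else "red (decreases output)"
    print(f"{name}: impact {magnitude:.3f}, {colour}")
```

Averaging the per-point SHAP values over all training points mirrors the averaging over the design space described above.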


For example, in the plot below, the size parameters have only a minor impact on the recyclability of the product, while the material and the manufacturing process have the biggest impact.




More on this step

The Explain Predictions manipulator uses the SHAP algorithm. This algorithm is significantly different from the algorithms used for sensitivity analysis. The sensitivity analysis chooses random samples in the design space, evaluates them with the trained models, and breaks down how much of the variation in the output(s) can be explained by the different inputs. Phrased differently: how much do changes in the input variables cause changes in the outputs?
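To make the contrast concrete, here is a minimal sketch of that kind of variance-based sensitivity analysis in plain NumPy. The surrogate function, sample size, and binning scheme are illustrative assumptions, not the tool's implementation:

```python
# A minimal sketch of variance-based sensitivity analysis: sample the
# design space at random, evaluate the model, and estimate how much of
# the output variance each input explains on its own.
import numpy as np

rng = np.random.default_rng(1)

def model(X):
    # Hypothetical surrogate standing in for the trained model.
    return 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.05 * X[:, 2] ** 2

X = rng.uniform(-1.0, 1.0, size=(50_000, 3))
y = model(X)
total_var = y.var()

# First-order effect of input i: the variance of E[y | x_i], estimated
# by binning x_i into equal-probability bins and averaging y per bin.
for i in range(3):
    edges = np.quantile(X[:, i], np.linspace(0.0, 1.0, 21))
    idx = np.clip(np.digitize(X[:, i], edges) - 1, 0, 19)
    cond_mean = np.array([y[idx == b].mean() for b in range(20)])
    print(f"x{i}: explains ~{cond_mean.var() / total_var:.1%} of output variance")
```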

The SHAP algorithm takes a completely different approach. Instead of asking “How much do changes of my input cause changes of my output?”, the question here is “How much does my model prediction change if I use or don’t use this feature as input?” That is, for this analysis, models with all sorts of input combinations are trained, and the overall impact of each input is derived from these combinations. An example below illustrates this.

Imagine a database containing the age, gender, job, and salary of a group of people. You use the first three as inputs to predict the salary. To determine the importance of the three inputs on the output “salary”, the SHAP algorithm starts from the average value of the output in the training dataset ($50k in our example). From there, the SHAP algorithm evaluates how the average model prediction changes when certain variables are used (or not used) as inputs. In the image below, all connections relevant to determining the importance of the input feature “age” are highlighted in red. The arrows always start at models which don’t use “age” as an input and show how the model prediction changes when “age” is added as an input. The overall importance is a weighted average of all highlighted connections in the graph below.

For this example, you can see that “age” is clearly a significant input and, on average, decreases the predicted salary. In the bar plot, the input “age” would therefore appear with a red bar.
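The weighted average mentioned above is the classic Shapley value from game theory. The sketch below computes it exactly for the salary example by enumerating every feature subset; the subset-average predictions are made-up numbers chosen only to illustrate the mechanics:

```python
# A minimal sketch of the Shapley weighting described above. v(S) is the
# average model prediction when only the features in S are used; the
# values are hypothetical, with the empty set being the 50k$ baseline.
from itertools import combinations
from math import factorial

features = ["age", "gender", "job"]

v = {
    frozenset(): 50.0,
    frozenset({"age"}): 44.0,
    frozenset({"gender"}): 51.0,
    frozenset({"job"}): 58.0,
    frozenset({"age", "gender"}): 45.0,
    frozenset({"age", "job"}): 52.0,
    frozenset({"gender", "job"}): 59.0,
    frozenset({"age", "gender", "job"}): 53.0,
}

def shapley(feature):
    """Weighted average of v(S + feature) - v(S) over all subsets S."""
    n = len(features)
    others = [f for f in features if f != feature]
    total = 0.0
    for size in range(n):
        for subset in combinations(others, size):
            S = frozenset(subset)
            # Classic Shapley weight: |S|! * (n - |S| - 1)! / n!
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (v[S | {feature}] - v[S])
    return total

for f in features:
    print(f"{f}: {shapley(f):+.2f} k$")  # negative -> red bar, positive -> green
```

With these made-up numbers the sketch prints age: -6.00 k$, gender: +1.00 k$, and job: +8.00 k$, and the three contributions sum to the difference between the full model's average prediction (53 k$) and the 50 k$ baseline. The negative value for “age” corresponds to the red bar described above.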


The method used in this step is non-deterministic, which means that you might obtain different results each time you run it. Check this article to learn more.
