Global Explanations for Adaptive Gradient Boosting Models¶
Pega
2025-10-17
This notebook demonstrates how to analyze and visualize global explanations for Adaptive Gradient Boosting models in Pega Adaptive Decision Manager (ADM). Global explanations provide insights into which predictors have the most influence on model predictions and how different predictor values correlate with the model’s scores.
Important: The SHAP explanation datasets used in this notebook are available starting with Pega Infinity ‘25. In earlier versions, the explanation files are not written to the repository.
The explanation files contain SHAP (SHapley Additive exPlanations) contributions for a sample of model executions. This notebook shows how these contributions are aggregated to provide global explanations for Gradient Boosting models. In Pega Infinity ‘25, a Global Explanations report can also be generated directly from Prediction Studio.
Note on earlier versions: In versions prior to Infinity ’25, Feature Importance (also known as Predictor Importance) is already available for Gradient Boosting models and uses SHAP values. Infinity ’25 enhances this by providing detailed insight into how predictor values correlate with the model’s score (for example, high income may correlate with a high contribution to the propensity score for a platinum credit card offer), rather than just predictor importances. See: https://docs.pega.com/bundle/platform/page/platform/decision-management/view-summarized-reports-adm.html
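To make that distinction concrete, here is a minimal polars sketch on synthetic data (illustrative only, not the ADM implementation): classic feature importance collapses per-row SHAP contributions into a single magnitude per predictor, while the value-level view keeps the signed contribution per predictor value.

import polars as pl

# Synthetic per-execution SHAP contributions for one predictor (illustrative only)
shap_rows = pl.DataFrame({
    "Income": ["high", "high", "low", "low"],   # predictor value per model execution
    "shap_income": [0.04, 0.03, -0.05, -0.02],  # contribution to the score
})

# Feature importance: mean absolute contribution (magnitude only)
importance = shap_rows["shap_income"].abs().mean()

# Value-level view: signed mean contribution per predictor value
by_value = shap_rows.group_by("Income").agg(pl.col("shap_income").mean())

print(importance)
print(by_value)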
Aggregate data exported from Infinity¶
Prerequisite: You have already exported explanation files from Infinity.
Parameters:
data_folder
: the folder that contains the exported explanation files
model_name
: optional - the model rule to check for explanations; if not passed, any file in the folder is picked up
from_date
: optional - if not passed, defaults to today - 7 days
to_date
: optional - if not passed, defaults to today
The aggregated data will be stored in the .tmp/aggregated_data directory.
[2]:
from pdstools.explanations import Explanations
import datetime
import polars as pl
import logging

# logging.basicConfig(level=logging.INFO)  # Uncomment to see progress for large files

explanations = Explanations(
    # data_folder='../../data/explanations/',  # Uncomment this line and provide the folder location of the exported explanations data
    data_file="https://raw.githubusercontent.com/pegasystems/pega-datascientist-tools/master/data/explanations/AdaptiveBoostCT_20250328064604.parquet",  # Remove this argument when running locally with exported data
    model_name='AdaptiveBoostCT',
    from_date=datetime.date(2024, 3, 28),
    to_date=datetime.date(2025, 3, 28),
)
Simple plotting of contributions¶
These methods plot the contributions for the overall model or for a specific context.
The first plot shows the top-n predictors with their contributions. The remaining plots cover each predictor in the top-n list. Numeric predictor values are binned into at most 10 bins, while categorical predictors show the top-k categories with their contributions. A rough sketch of the numeric binning follows below.
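As a rough illustration (a sketch only, not the exact binning ADM applies), a numeric predictor can be capped at 10 equal-width bins like this:

import polars as pl

# Sketch: cap a numeric predictor at 10 equal-width bins (illustrative only)
df = pl.DataFrame({"Age": [18, 22, 25, 31, 33, 40, 47, 52, 60, 71, 83]})
lo, hi = df["Age"].min(), df["Age"].max()
binned = df.with_columns(
    (((pl.col("Age") - lo) / (hi - lo)) * 10)
    .floor()
    .clip(0, 9)  # the maximum value would otherwise land in bin 10
    .cast(pl.Int32)
    .alias("bin")
)
print(binned.group_by("bin").agg(pl.len()).sort("bin"))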
Explanations for overall model¶
Call explanations.plot.contributions() without selecting any context from the interactive context picker. This produces plots that aggregate the data over all contexts.
Parameters:
top_n
: Number of top predictors to plot.
top_k
: Number of top predictor values to plot for symbolic predictors.
remaining
: If True, the remaining predictors are plotted as a single bar.
missing
: If True, missing values are plotted as a separate bar.
descending
: If True, the predictors are sorted in descending order of their contributions, i.e., the least contributing predictors are plotted first.
contribution_calculation
: Method used to calculate the contributions. Options include contribution, contribution_abs, and contribution_weighted. The default is contribution, the average contribution to the predictions; a sketch of these aggregations follows below.
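As a rough sketch of what the first two options compute, assuming per-row SHAP contributions (the exact pdstools aggregation, and in particular the weighting used by contribution_weighted, may differ):

import polars as pl

# Synthetic per-row SHAP contributions
rows = pl.DataFrame({
    "predictor": ["Age", "Age", "Age", "pyName", "pyName", "pyName"],
    "shap": [0.02, -0.03, 0.01, -0.02, -0.04, -0.01],
})

summary = rows.group_by("predictor").agg(
    contribution=pl.col("shap").mean(),            # signed average (the default)
    contribution_abs=pl.col("shap").abs().mean(),  # average magnitude
    # contribution_weighted additionally scales by how frequently the
    # predictor value occurs; the exact weighting is an assumption here
)
print(summary)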
[3]:
_, plots = explanations.plot.contributions(top_n=3, top_k=5, remaining=True)
No context selected, plotting overall contributions. Use explanations.filter.interactive() to select a context.
Explanations for selected context¶
Call explanations.filter.interactive() to display the interactive context picker. This allows you to select a specific context from the list of available contexts.
The context picker helps filter the data when the list of contexts is very large. Fine-tune your selection using the comboboxes on the left side of the context picker; the available contexts are then displayed on the right, from which you can select specific context keys.
Run explanations.plot.contributions() after selecting a context from the interactive context picker. This plots the contributions for the selected context.
NOTE: The plots cover a single context only, i.e., a context must be selected from the list.
[4]:
explanations.filter.interactive()
[5]:
context_header, plots = explanations.plot.contributions(top_n=3, top_k=5)
No context selected, plotting overall contributions. Use explanations.filter.interactive() to select a context.
You can also set the context manually by passing a dictionary with the context keys and values.
[6]:
explanations.filter.set_selected_context({
    "pyChannel": "PegaBatch",
    "pyDirection": "E2E Test",
    "pyGroup": "E2E Test",
    "pyIssue": "Batch",
    "pyName": "P2",
})
[7]:
context_header, plots = explanations.plot.contributions(top_n=3, top_k=5)
Advanced Data Exploration¶
For more advanced data exploration you can look directly at the aggregate. These classes provide more flexibility in how the data is loaded and processed, allowing you to inspect the data before plotting.
[8]:
aggregate = explanations.aggregate # load the aggregated data
Inspect data for overall model¶
Get the top_n predictors and their contributions for the overall model.
[9]:
df_overall = aggregate.get_predictor_contributions(top_n=3, remaining=False)
df_overall
[9]:
partition | predictor_name | predictor_type | contribution | contribution_abs | contribution_weighted | contribution_weighted_abs | frequency | contribution_min | contribution_max |
---|---|---|---|---|---|---|---|---|---|
str | str | str | f64 | f64 | f64 | f64 | i64 | f64 | f64 |
"whole_model" | "pyName" | "SYMBOLIC" | -0.021191 | 0.021255 | -0.000096 | 0.000097 | 50000 | -0.044283 | 0.029203 |
"whole_model" | "Age" | "NUMERIC" | -0.011306 | 0.011738 | -0.000092 | 0.000095 | 50000 | -0.034704 | 0.023485 |
"whole_model" | "Occupation" | "SYMBOLIC" | -0.008904 | 0.010193 | -0.000017 | 0.000019 | 50000 | -0.032185 | 0.06411 |
We can inspect the most influential values (top_k) of the predictors we picked.
[10]:
top_n_predictors = df_overall.select(pl.col('predictor_name')).unique().to_series().to_list()
aggregate.get_predictor_value_contributions(
    predictors=top_n_predictors,
    top_k=2,
    remaining=False,
)
[10]:
partition | predictor_name | predictor_type | bin_order | bin_contents | contribution | contribution_abs | contribution_weighted | contribution_weighted_abs | frequency | contribution_min | contribution_max | sort_column | sort_value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
str | str | str | i64 | str | f64 | f64 | f64 | f64 | i64 | f64 | f64 | str | f64 |
"whole_model" | "Age" | "NUMERIC" | 0 | "MISSING" | -0.013647 | 0.015131 | -0.000503 | 0.000557 | 1842 | -0.034704 | 0.023485 | "bin_order" | 0.0 |
"whole_model" | "Age" | "NUMERIC" | 5 | "[43.000:48.000]" | -0.013118 | 0.013136 | -0.001264 | 0.001265 | 4816 | -0.026145 | 0.006043 | "bin_order" | 5.0 |
"whole_model" | "Occupation" | "SYMBOLIC" | 20 | "Communications engineer" | -0.01616 | 0.016168 | -0.000335 | 0.000335 | 1035 | -0.028147 | 0.001797 | "contribution" | 0.01616 |
"whole_model" | "Occupation" | "SYMBOLIC" | 43 | "Psychotherapist, child" | -0.016257 | 0.016257 | -0.000268 | 0.000268 | 825 | -0.032185 | -0.003855 | "contribution" | 0.016257 |
"whole_model" | "pyName" | "SYMBOLIC" | 15 | "P18" | -0.024462 | 0.024462 | -0.001208 | 0.001208 | 2469 | -0.03746 | -0.009315 | "contribution" | 0.024462 |
"whole_model" | "pyName" | "SYMBOLIC" | 1 | "P1" | -0.025354 | 0.025354 | -0.001307 | 0.001307 | 2577 | -0.044283 | -0.007185 | "contribution" | 0.025354 |
Inspect data by selected context¶
Let’s repeat the same steps, but this time inspect a selected context instead of the entire model.
[11]:
import random

context_info = random.choice(aggregate.get_unique_contexts_list())
print('Selected random context: \n')
for key, value in context_info.items():
    print(f'{key}: {value}')

df_by_context = aggregate.get_predictor_contributions(
    context=context_info,
    top_n=3,
    remaining=False,
)
df_by_context
Selected random context:
pyChannel: PegaBatch
pyDirection: E2E Test
pyGroup: E2E Test
pyIssue: Batch
pyName: P16
[11]:
partition | predictor_name | predictor_type | contribution | contribution_abs | contribution_weighted | contribution_weighted_abs | frequency | contribution_min | contribution_max |
---|---|---|---|---|---|---|---|---|---|
str | str | str | f64 | f64 | f64 | f64 | i64 | f64 | f64 |
"{"partition":{"pyChannel":"Peg… | "pyName" | "SYMBOLIC" | -0.021201 | 0.021221 | -0.001927 | 0.001929 | 2509 | -0.031489 | 0.004226 |
"{"partition":{"pyChannel":"Peg… | "Age" | "NUMERIC" | -0.010762 | 0.011339 | -0.000091 | 0.000093 | 2509 | -0.024252 | 0.014799 |
"{"partition":{"pyChannel":"Peg… | "Occupation" | "SYMBOLIC" | -0.008109 | 0.009097 | -0.000015 | 0.000017 | 2509 | -0.023141 | 0.036477 |
[12]:
top_n_predictors = df_by_context.select(pl.col('predictor_name')).unique().to_series().to_list()
aggregate.get_predictor_value_contributions(
    predictors=top_n_predictors,
    top_k=2,
    context=context_info,
    remaining=False,
)
[12]:
partition | predictor_name | predictor_type | bin_order | bin_contents | contribution | contribution_abs | contribution_weighted | contribution_weighted_abs | frequency | contribution_min | contribution_max | sort_column | sort_value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
str | str | str | i64 | str | f64 | f64 | f64 | f64 | i64 | f64 | f64 | str | f64 |
"{"partition":{"pyChannel":"Peg… | "Age" | "NUMERIC" | 7 | "[53.000:58.000]" | -0.012958 | 0.01296 | -0.001255 | 0.001255 | 243 | -0.01965 | 0.000093 | "bin_order" | 7.0 |
"{"partition":{"pyChannel":"Peg… | "Age" | "NUMERIC" | 9 | "[64.000:71.000]" | -0.013094 | 0.013094 | -0.001263 | 0.001263 | 242 | -0.020841 | -0.000639 | "bin_order" | 9.0 |
"{"partition":{"pyChannel":"Peg… | "Occupation" | "SYMBOLIC" | 33 | "Psychotherapist, child" | -0.014704 | 0.014704 | -0.000258 | 0.000258 | 44 | -0.021855 | -0.011503 | "contribution" | 0.014704 |
"{"partition":{"pyChannel":"Peg… | "Occupation" | "SYMBOLIC" | 25 | "Communications engineer" | -0.016192 | 0.016192 | -0.00031 | 0.00031 | 48 | -0.023141 | -0.005661 | "contribution" | 0.016192 |
"{"partition":{"pyChannel":"Peg… | "pyName" | "SYMBOLIC" | 1 | "P16" | -0.021201 | 0.021221 | -0.021201 | 0.021221 | 2509 | -0.031489 | 0.004226 | "contribution" | 0.021201 |