Global Explanations for Adaptive Gradient Boosting Models¶
Pega
2025-10-17
This notebook demonstrates how to analyze and visualize global explanations for Adaptive Gradient Boosting models in Pega Adaptive Decision Manager (ADM). Global explanations provide insights into which predictors have the most influence on model predictions and how different predictor values correlate with the model’s scores.
Important: The SHAP explanation datasets used in this notebook are available starting with Pega Infinity ‘25. In earlier versions, the explanation files are not written to the repository.
The explanation files contain SHAP (SHapley Additive exPlanations) contributions for a sample of model executions. This notebook shows how these contributions are aggregated to provide global explanations for Gradient Boosting models. In Pega Infinity ‘25, a Global Explanations report can also be generated directly from Prediction Studio.
Note on earlier versions: In versions prior to Infinity ’25, Feature Importance (also known as Predictor Importance) is already available for Gradient Boosting models and uses SHAP values. Infinity ’25 enhances this by providing detailed insight into how predictor values correlate with the model’s score (for example, high income may correlate with a high contribution to the propensity score for a platinum credit card offer), rather than just predictor importances. See: https://docs.pega.com/bundle/platform/page/platform/decision-management/view-summarized-reports-adm.html
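To make that distinction concrete, here is a minimal polars sketch on synthetic data (illustrative only, not the ADM implementation): classic feature importance collapses per-row SHAP contributions into a single magnitude per predictor, while the value-level view keeps the signed contribution per predictor value.

import polars as pl

# Synthetic per-execution SHAP contributions for one predictor (illustrative only)
shap_rows = pl.DataFrame({
    "Income": ["high", "high", "low", "low"],   # predictor value per model execution
    "shap_income": [0.04, 0.03, -0.05, -0.02],  # contribution to the score
})

# Feature importance: mean absolute contribution (magnitude only)
importance = shap_rows["shap_income"].abs().mean()

# Value-level view: signed mean contribution per predictor value
by_value = shap_rows.group_by("Income").agg(pl.col("shap_income").mean())

print(importance)
print(by_value)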
Aggregate data exported from Infinity¶
Prerequisite: You have already exported explanation files from Infinity.
Parameters:
data_folder
: the folder that contains the exported explanation files
model_name
: optional - the model rule to check for explanations; if not passed, any file in the folder is picked up
from_date
: optional - if not passed, defaults to today - 7 days
to_date
: optional - if not passed, defaults to today
The aggregated data will be stored in the .tmp/aggregated_data directory.
[2]:
from pdstools.explanations import Explanations
import datetime
import polars as pl
import logging

# logging.basicConfig(level=logging.INFO)  # Uncomment to see progress for large files

explanations = Explanations(
    # data_folder='../../data/explanations/',  # Uncomment this line and provide the folder location of the exported explanations data
    data_file="https://raw.githubusercontent.com/pegasystems/pega-datascientist-tools/master/data/explanations/AdaptiveBoostCT_20250328064604.parquet",  # Remove this argument when running locally with exported data
    model_name='AdaptiveBoostCT',
    from_date=datetime.date(2024, 3, 28),
    to_date=datetime.date(2025, 3, 28),
)
Simple plotting of contributions¶
These methods plot the contributions for the overall model or for a specific context.
The first plot shows the top-n predictors with their contributions. The remaining plots cover each predictor in the top-n list. Numeric predictor values are binned into at most 10 bins, while categorical predictors show the top-k categories with their contributions. A rough sketch of the numeric binning follows below.
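As a rough illustration (a sketch only, not the exact binning ADM applies), a numeric predictor can be capped at 10 equal-width bins like this:

import polars as pl

# Sketch: cap a numeric predictor at 10 equal-width bins (illustrative only)
df = pl.DataFrame({"Age": [18, 22, 25, 31, 33, 40, 47, 52, 60, 71, 83]})
lo, hi = df["Age"].min(), df["Age"].max()
binned = df.with_columns(
    (((pl.col("Age") - lo) / (hi - lo)) * 10)
    .floor()
    .clip(0, 9)  # the maximum value would otherwise land in bin 10
    .cast(pl.Int32)
    .alias("bin")
)
print(binned.group_by("bin").agg(pl.len()).sort("bin"))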
Explanations for overall model¶
Call explanations.plot.contributions() without selecting any context from the interactive context picker. This produces plots that aggregate the data over all contexts.
Parameters:
top_n
: Number of top predictors to plot.
top_k
: Number of top predictor values to plot for symbolic predictors.
remaining
: If True, the remaining predictors are plotted as a single bar.
missing
: If True, missing values are plotted as a separate bar.
descending
: If True, the predictors are sorted in descending order of their contributions, i.e., the least contributing predictors are plotted first.
contribution_calculation
: Method used to calculate the contributions. Options include contribution, contribution_abs, and contribution_weighted. The default is contribution, the average contribution to the predictions; a sketch of these aggregations follows below.
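As a rough sketch of what the first two options compute, assuming per-row SHAP contributions (the exact pdstools aggregation, and in particular the weighting used by contribution_weighted, may differ):

import polars as pl

# Synthetic per-row SHAP contributions
rows = pl.DataFrame({
    "predictor": ["Age", "Age", "Age", "pyName", "pyName", "pyName"],
    "shap": [0.02, -0.03, 0.01, -0.02, -0.04, -0.01],
})

summary = rows.group_by("predictor").agg(
    contribution=pl.col("shap").mean(),            # signed average (the default)
    contribution_abs=pl.col("shap").abs().mean(),  # average magnitude
    # contribution_weighted additionally scales by how frequently the
    # predictor value occurs; the exact weighting is an assumption here
)
print(summary)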
[3]:
_, plots = explanations.plot.contributions(top_n=3, top_k=5, remaining=True)
No context selected, plotting overall contributions. Use explanations.filter.interactive() to select a context.
Explanations for selected context¶
Call explanations.filter.interactive() to display the interactive context picker. This allows you to select a specific context from the list of available contexts.
The context picker helps filter the data when the list of contexts is very large. Fine-tune your selection using the comboboxes on the left side of the context picker; the available contexts are then displayed on the right, from which you can select specific context keys.
Run explanations.plot.contributions() after selecting a context from the interactive context picker. This plots the contributions for the selected context.
NOTE: The plots cover a single context only, i.e., a context must be selected from the list.
[4]:
explanations.filter.interactive()
[5]:
context_header, plots = explanations.plot.contributions(top_n=3, top_k=5)
No context selected, plotting overall contributions. Use explanations.filter.interactive() to select a context.
You can also set the context manually by passing a dictionary with the context keys and values.
[6]:
explanations.filter.set_selected_context({
    "pyChannel": "PegaBatch",
    "pyDirection": "E2E Test",
    "pyGroup": "E2E Test",
    "pyIssue": "Batch",
    "pyName": "P2",
})
[7]:
context_header, plots = explanations.plot.contributions(top_n=3, top_k=5)
Advanced Data Exploration¶
For more advanced data exploration you can look directly at the aggregate. These classes provide more flexibility in how the data is loaded and processed, allowing you to inspect the data before plotting.
[8]:
aggregate = explanations.aggregate # load the aggregated data
Inspect data for overall model¶
Get the top_n predictors and their contributions for the overall model.
[9]:
df_overall = aggregate.get_predictor_contributions(top_n=3, remaining=False)
df_overall
[9]:
partition | predictor_name | predictor_type | contribution | contribution_abs | contribution_weighted | contribution_weighted_abs | frequency | contribution_min | contribution_max |
---|---|---|---|---|---|---|---|---|---|
str | str | str | f64 | f64 | f64 | f64 | i64 | f64 | f64 |
"whole_model" | "pyName" | "SYMBOLIC" | -0.021191 | 0.021255 | -0.000096 | 0.000097 | 50000 | -0.044283 | 0.029203 |
"whole_model" | "Age" | "NUMERIC" | -0.011306 | 0.011738 | -0.000092 | 0.000095 | 50000 | -0.034704 | 0.023485 |
"whole_model" | "Occupation" | "SYMBOLIC" | -0.008904 | 0.010193 | -0.000017 | 0.000019 | 50000 | -0.032185 | 0.06411 |
We can inspect the most influential values (top_k) of the predictors we picked.
[10]:
top_n_predictors = df_overall.select(pl.col('predictor_name')).unique().to_series().to_list()
aggregate.get_predictor_value_contributions(
    predictors=top_n_predictors,
    top_k=2,
    remaining=False,
)
[10]:
partition | predictor_name | predictor_type | bin_order | bin_contents | contribution | contribution_abs | contribution_weighted | contribution_weighted_abs | frequency | contribution_min | contribution_max | sort_column | sort_value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
str | str | str | i64 | str | f64 | f64 | f64 | f64 | i64 | f64 | f64 | str | f64 |
"whole_model" | "Age" | "NUMERIC" | 0 | "MISSING" | -0.013647 | 0.015131 | -0.000503 | 0.000557 | 1842 | -0.034704 | 0.023485 | "bin_order" | 0.0 |
"whole_model" | "Age" | "NUMERIC" | 5 | "[43.000:48.000]" | -0.013118 | 0.013136 | -0.001264 | 0.001265 | 4816 | -0.026145 | 0.006043 | "bin_order" | 5.0 |
"whole_model" | "Occupation" | "SYMBOLIC" | 20 | "Communications engineer" | -0.01616 | 0.016168 | -0.000335 | 0.000335 | 1035 | -0.028147 | 0.001797 | "contribution" | 0.01616 |
"whole_model" | "Occupation" | "SYMBOLIC" | 43 | "Psychotherapist, child" | -0.016257 | 0.016257 | -0.000268 | 0.000268 | 825 | -0.032185 | -0.003855 | "contribution" | 0.016257 |
"whole_model" | "pyName" | "SYMBOLIC" | 15 | "P18" | -0.024462 | 0.024462 | -0.001208 | 0.001208 | 2469 | -0.03746 | -0.009315 | "contribution" | 0.024462 |
"whole_model" | "pyName" | "SYMBOLIC" | 1 | "P1" | -0.025354 | 0.025354 | -0.001307 | 0.001307 | 2577 | -0.044283 | -0.007185 | "contribution" | 0.025354 |
Inspect data by selected context¶
Let’s repeat the same steps, but this time inspect a selected context instead of the entire model.
[11]:
import random

context_info = random.choice(aggregate.get_unique_contexts_list())
print('Selected random context: \n')
for key, value in context_info.items():
    print(f'{key}: {value}')

df_by_context = aggregate.get_predictor_contributions(
    context=context_info,
    top_n=3,
    remaining=False,
)
df_by_context
Selected random context:
pyChannel: PegaBatch
pyDirection: E2E Test
pyGroup: E2E Test
pyIssue: Batch
pyName: P16
[11]:
partition | predictor_name | predictor_type | contribution | contribution_abs | contribution_weighted | contribution_weighted_abs | frequency | contribution_min | contribution_max |
---|---|---|---|---|---|---|---|---|---|
str | str | str | f64 | f64 | f64 | f64 | i64 | f64 | f64 |
"{"partition":{"pyChannel":"Peg… | "pyName" | "SYMBOLIC" | -0.021201 | 0.021221 | -0.001927 | 0.001929 | 2509 | -0.031489 | 0.004226 |
"{"partition":{"pyChannel":"Peg… | "Age" | "NUMERIC" | -0.010762 | 0.011339 | -0.000091 | 0.000093 | 2509 | -0.024252 | 0.014799 |
"{"partition":{"pyChannel":"Peg… | "Occupation" | "SYMBOLIC" | -0.008109 | 0.009097 | -0.000015 | 0.000017 | 2509 | -0.023141 | 0.036477 |
[12]:
top_n_predictors = df_by_context.select(pl.col('predictor_name')).unique().to_series().to_list()
aggregate.get_predictor_value_contributions(
    predictors=top_n_predictors,
    top_k=2,
    context=context_info,
    remaining=False,
)
[12]:
partition | predictor_name | predictor_type | bin_order | bin_contents | contribution | contribution_abs | contribution_weighted | contribution_weighted_abs | frequency | contribution_min | contribution_max | sort_column | sort_value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
str | str | str | i64 | str | f64 | f64 | f64 | f64 | i64 | f64 | f64 | str | f64 |
"{"partition":{"pyChannel":"Peg… | "Age" | "NUMERIC" | 7 | "[53.000:58.000]" | -0.012958 | 0.01296 | -0.001255 | 0.001255 | 243 | -0.01965 | 0.000093 | "bin_order" | 7.0 |
"{"partition":{"pyChannel":"Peg… | "Age" | "NUMERIC" | 9 | "[64.000:71.000]" | -0.013094 | 0.013094 | -0.001263 | 0.001263 | 242 | -0.020841 | -0.000639 | "bin_order" | 9.0 |
"{"partition":{"pyChannel":"Peg… | "Occupation" | "SYMBOLIC" | 33 | "Psychotherapist, child" | -0.014704 | 0.014704 | -0.000258 | 0.000258 | 44 | -0.021855 | -0.011503 | "contribution" | 0.014704 |
"{"partition":{"pyChannel":"Peg… | "Occupation" | "SYMBOLIC" | 25 | "Communications engineer" | -0.016192 | 0.016192 | -0.00031 | 0.00031 | 48 | -0.023141 | -0.005661 | "contribution" | 0.016192 |
"{"partition":{"pyChannel":"Peg… | "pyName" | "SYMBOLIC" | 1 | "P16" | -0.021201 | 0.021221 | -0.021201 | 0.021221 | 2509 | -0.031489 | 0.004226 | "contribution" | 0.021201 |