📦 Optional dependencies
This article uses features from the pdstools `explanations` extra. Install it with your favorite package manager, e.g. `pip install "pdstools[explanations]"`.
Global Explanations for Adaptive Gradient Boosting Models
This notebook demonstrates how to analyze and visualize global explanations for Adaptive Gradient Boosting models in Pega Adaptive Decision Manager (ADM). Global explanations provide insights into which predictors have the most influence on model predictions and how different predictor values correlate with the model’s scores.
Important: The SHAP explanation datasets used in this notebook are available starting with Pega Infinity '25. In earlier versions, the explanation files are not written to the repository.
The explanation files contain SHAP (SHapley Additive exPlanations) contributions for a sample of model executions. This notebook shows how these contributions are aggregated to provide global explanations for Gradient Boosting models. In Pega Infinity '25, a Global Explanations report can also be generated directly from Prediction Studio.
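For a single execution, SHAP contributions add up, together with a base value, to the explained model output (the "local accuracy" property of SHAP). A global explanation then summarizes those per-execution contributions per predictor. The sketch below illustrates both ideas on three made-up executions. The predictor names, numbers, and the helpers `is_additive` and `aggregate_contributions` are hypothetical; the output keys mirror the column names that appear later in this notebook, but this is not how pdstools implements its aggregation.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical SHAP rows, one per sampled model execution. "base" is the expected
# model output, "output" is the explained output for that execution, and the other
# keys are per-predictor contributions.
executions: List[Dict[str, float]] = [
    {"base": 0.10, "output": 0.07, "Age": -0.02, "Occupation": -0.01},
    {"base": 0.10, "output": 0.13, "Age": 0.01, "Occupation": 0.02},
    {"base": 0.10, "output": 0.06, "Age": -0.03, "Occupation": -0.01},
]
PREDICTORS = ["Age", "Occupation"]


def is_additive(row: Dict[str, float], predictors: List[str], tol: float = 1e-9) -> bool:
    """SHAP local accuracy: base value plus all contributions equals the explained output."""
    return abs(row["base"] + sum(row[p] for p in predictors) - row["output"]) <= tol


def aggregate_contributions(rows: List[Dict[str, float]], predictors: List[str]) -> Dict[str, Dict[str, float]]:
    """Collapse per-execution contributions into one global summary per predictor."""
    values: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        for p in predictors:
            values[p].append(row[p])
    return {
        p: {
            "contribution": sum(vals) / len(vals),  # signed mean: direction of influence
            "contribution_abs": sum(abs(v) for v in vals) / len(vals),  # mean |SHAP|: strength
            "frequency": len(vals),
            "contribution_min": min(vals),
            "contribution_max": max(vals),
        }
        for p, vals in values.items()
    }


assert all(is_additive(row, PREDICTORS) for row in executions)
summary = aggregate_contributions(executions, PREDICTORS)
for predictor, stats in summary.items():
    print(predictor, stats)
```

In this toy data, Occupation's signed mean is about zero even though it shifts individual scores in both directions. That is why importance is usually measured with the mean of absolute contributions, while the signed mean shows the direction of the effect.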
Note on earlier versions: Versions prior to Infinity '25 already provide Feature Importance (also known as Predictor Importance) for Gradient Boosting models, based on SHAP values. Infinity '25 extends this with detailed insight into how predictor values correlate with the model's score (for example, a high income may correlate with a high contribution to the propensity score for a platinum credit card offer), rather than only how important each predictor is. See: https://docs.pega.com/bundle/platform/page/platform/decision-management/view-summarized-reports-adm.html
Aggregate data exported from Infinity
Prerequisite: You have already exported explanation files from Infinity.
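Parameters:

- `data_folder`: the folder that contains the explanation files.
- `model_name` (optional): the model rule to check for explanations; if not passed, any file in the folder is picked up.
- `from_date` (optional): defaults to today minus 7 days.
- `to_date` (optional): defaults to today.
- `data_file`: used in this example instead of `data_folder` to load a single sample file from the pega-datascientist-tools repository; remove it when running locally with your own exported data.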
The aggregated data will be stored in the .tmp/aggregated_data directory.
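The date defaults in the parameter list can be written out as a small helper. This is a sketch that assumes only what the list states (a missing `to_date` means today, a missing `from_date` means today minus 7 days); `resolve_date_window` is a hypothetical name, not a pdstools function, and the `today` argument exists only to make the example reproducible.

```python
import datetime
from typing import Optional, Tuple


def resolve_date_window(
    from_date: Optional[datetime.date] = None,
    to_date: Optional[datetime.date] = None,
    today: Optional[datetime.date] = None,
) -> Tuple[datetime.date, datetime.date]:
    """Fill in missing dates: to_date defaults to today, from_date to today minus 7 days."""
    today = today or datetime.date.today()  # injectable for reproducible examples
    if to_date is None:
        to_date = today
    if from_date is None:
        from_date = today - datetime.timedelta(days=7)
    return from_date, to_date


# Defaults only: a one-week window ending on the given "today".
print(resolve_date_window(today=datetime.date(2025, 3, 28)))
# Explicit dates are passed through unchanged, e.g. a one-year window.
print(resolve_date_window(datetime.date(2024, 3, 28), datetime.date(2025, 3, 28)))
```

With the defaults, only explanation files from the last week would be considered, so pass explicit dates when analyzing older exports.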
```python
from pdstools.explanations import Explanations
import datetime
import polars as pl
import logging

# logging.basicConfig(level=logging.INFO)  # Uncomment to see progress for large files

explanations = Explanations.from_local_directory(
    # data_folder='../../data/explanations/',  # Uncomment this line and provide the folder location of the exported explanations data
    data_file="https://raw.githubusercontent.com/pegasystems/pega-datascientist-tools/master/data/explanations/AdaptiveBoostCT_20250328064604.parquet",  # Remove this argument when running locally with exported data
    model_name='AdaptiveBoostCT',
    from_date=datetime.date(2024, 3, 28),
    to_date=datetime.date(2025, 3, 28),
)
```
Plotting contributions
The contributions() method plots the top-n most influential predictors and, for each of them, shows how individual predictor values contribute to the model score. Numeric predictors are binned into at most 10 bins; for symbolic predictors, the top-k categories are shown.
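To make the binning idea concrete, the stdlib sketch below groups a numeric predictor into at most `max_bins` equal-frequency bins with missing values in their own bin, and keeps the top-k categories of a symbolic predictor ranked by mean absolute contribution. The equal-frequency strategy, the ranking rule, the made-up data, and the helpers `bin_numeric` and `top_k_categories` are all assumptions for illustration; this section does not say which binning method pdstools uses. The labels only imitate the `MISSING` and `[low:high]` form seen in the tables later in this notebook.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def bin_numeric(
    values: Sequence[Optional[float]],
    contributions: Sequence[float],
    max_bins: int = 10,
) -> Dict[str, float]:
    """Mean contribution per equal-frequency bin (at most max_bins), plus a MISSING bin."""
    bins: Dict[str, List[float]] = defaultdict(list)
    missing = [c for v, c in zip(values, contributions) if v is None]
    if missing:
        bins["MISSING"] = missing
    present = sorted((v, c) for v, c in zip(values, contributions) if v is not None)
    n_bins = min(max_bins, len(present))
    for i in range(n_bins):
        # Slice boundaries spread the sorted values as evenly as possible over the bins.
        chunk = present[i * len(present) // n_bins:(i + 1) * len(present) // n_bins]
        label = f"[{chunk[0][0]:.3f}:{chunk[-1][0]:.3f}]"
        bins[label].extend(c for _, c in chunk)
    return {label: sum(cs) / len(cs) for label, cs in bins.items()}


def top_k_categories(
    categories: Sequence[str],
    contributions: Sequence[float],
    k: int = 5,
) -> Dict[str, float]:
    """Signed mean contribution for the k categories with the largest mean |contribution|."""
    per_category: Dict[str, List[float]] = defaultdict(list)
    for category, c in zip(categories, contributions):
        per_category[category].append(c)
    strength = {cat: sum(abs(c) for c in cs) / len(cs) for cat, cs in per_category.items()}
    ranked = sorted(strength, key=strength.get, reverse=True)[:k]
    return {cat: sum(per_category[cat]) / len(per_category[cat]) for cat in ranked}


# Made-up Age values (None = missing) and their SHAP contributions.
age_bins = bin_numeric([25, 31, None, 47, 52, 60], [0.01, 0.005, -0.02, -0.01, -0.015, -0.012], max_bins=3)
# Made-up category codes and their SHAP contributions.
top_categories = top_k_categories(["A", "B", "A", "C", "B", "D"], [0.02, -0.01, 0.03, 0.001, -0.02, 0.0], k=2)
print(age_bins)
print(top_categories)
```

Ranking by mean absolute contribution keeps categories that push the score strongly down as well as those that push it up; the returned signed mean then shows the direction.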
Results can be shown for the model overall or drilled down into the action hierarchy (direction, channel, issue, group, action name), referred to as “context” in the API.
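Conceptually, a context is a mapping from action-hierarchy keys to values, and drilling down means aggregating only the executions that match it. The sketch below shows this with a hypothetical `mean_contribution` helper over made-up rows; only the `pyChannel` and `pyName` keys are used to keep it short, whereas the contexts later in this notebook also carry `pyDirection`, `pyIssue`, and `pyGroup`.

```python
from typing import Dict, List, Mapping, Optional


def mean_contribution(rows: List[Dict[str, object]], context: Optional[Mapping[str, str]] = None) -> float:
    """Average 'contribution' over all rows, or only over rows matching every key in `context`."""
    selected = [
        r for r in rows
        if context is None or all(r.get(key) == value for key, value in context.items())
    ]
    if not selected:
        raise ValueError("no rows match the given context")
    return sum(float(r["contribution"]) for r in selected) / len(selected)


# Made-up per-execution contributions of one predictor, tagged with their action.
rows: List[Dict[str, object]] = [
    {"pyChannel": "Web", "pyName": "P1", "contribution": -0.03},
    {"pyChannel": "Web", "pyName": "P2", "contribution": 0.01},
    {"pyChannel": "Email", "pyName": "P1", "contribution": -0.01},
    {"pyChannel": "Email", "pyName": "P2", "contribution": 0.03},
]

overall_mean = mean_contribution(rows)  # whole model: all actions
web_p1_mean = mean_contribution(rows, {"pyChannel": "Web", "pyName": "P1"})  # one action
print(overall_mean, web_p1_mean)
```

Here the whole-model average is about zero while individual actions show clear positive or negative effects, which is why drilling down into the hierarchy can reveal behavior that the overall view hides.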
Overall model
Calling contributions() without selecting a level in the action hierarchy aggregates over all actions.
Parameters:

- `top_n`: number of top predictors to plot.
- `top_k`: number of top predictor values to plot for symbolic predictors.
- `remaining`: if `True`, the remaining predictors are plotted as a single bar.
- `missing`: if `True`, missing values are plotted as a separate bar.
- `descending`: if `True`, the predictors are sorted in descending order of their contributions, i.e. the least contributing predictors are plotted first.
- `contribution_calculation`: method used to calculate contributions. Options include `contribution`, `contribution_abs`, and `contribution_weighted`. The default is `contribution` (the average contribution to predictions).
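The interplay of `top_n`, `remaining`, and `contribution_calculation` can be sketched in a few lines. This is a hypothetical `select_top_n` helper, not pdstools code: it assumes predictors are ranked by `contribution_abs` and that the leftover predictors are summed into one "remaining" bar, neither of which this section confirms. The input values are the whole-model numbers from the predictor table further down in this notebook.

```python
from typing import Dict, List, Tuple


def select_top_n(
    summary: Dict[str, Dict[str, float]],
    top_n: int,
    remaining: bool = True,
    contribution_calculation: str = "contribution",
) -> List[Tuple[str, float]]:
    """Rank predictors by mean |contribution|, keep top_n, optionally fold the rest into one bar."""
    ranked = sorted(summary, key=lambda p: summary[p]["contribution_abs"], reverse=True)
    bars = [(p, summary[p][contribution_calculation]) for p in ranked[:top_n]]
    rest = ranked[top_n:]
    if remaining and rest:
        # Assumption: the leftover predictors are summed into a single bar.
        bars.append(("remaining", sum(summary[p][contribution_calculation] for p in rest)))
    return bars


# Whole-model values copied from the get_predictor_contributions() output.
whole_model = {
    "pyName": {"contribution": -0.021191, "contribution_abs": 0.021255},
    "Age": {"contribution": -0.011306, "contribution_abs": 0.011738},
    "Occupation": {"contribution": -0.008904, "contribution_abs": 0.010193},
}

print(select_top_n(whole_model, top_n=2))
print(select_top_n(whole_model, top_n=2, remaining=False, contribution_calculation="contribution_abs"))
```

Switching `contribution_calculation` changes only which column is plotted, not which predictors are selected, in this sketch.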
```python
_, plots = explanations.plot.contributions(top_n=3, top_k=5, remaining=True)
```
By level in the action hierarchy
Use filter.interactive() to display a picker for drilling down into the action hierarchy. Use the combo boxes on the left to narrow the list by direction, channel, issue, group, or action name, then select a context on the right.
After making a selection, call contributions() again to plot contributions for that selection only.
```python
explanations.filter.interactive()
```

```python
context_header, plots = explanations.plot.contributions(top_n=3, top_k=5)
```
You can also select a level in the action hierarchy programmatically using filter.set_selected_context().
```python
explanations.filter.set_selected_context({
    "pyChannel": "PegaBatch",
    "pyDirection": "E2E Test",
    "pyGroup": "E2E Test",
    "pyIssue": "Batch",
    "pyName": "P2",
})
```
```python
context_header, plots = explanations.plot.contributions(top_n=3, top_k=5)
```
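The dictionary passed to set_selected_context() names one value per level of the action hierarchy. To work through several contexts rather than one, you could first narrow the list returned by `aggregate.get_unique_contexts_list()` (used later in this notebook, where each element is read like a dictionary) with partial criteria. The helper `matching_contexts` and the sample list below are hypothetical stand-ins, so the sketch runs without the explanation data.

```python
from typing import Dict, List


def matching_contexts(contexts: List[Dict[str, str]], **criteria: str) -> List[Dict[str, str]]:
    """Return the contexts whose values match every given key=value criterion."""
    return [c for c in contexts if all(c.get(key) == value for key, value in criteria.items())]


# Made-up stand-in for aggregate.get_unique_contexts_list().
contexts = [
    {"pyChannel": "PegaBatch", "pyDirection": "E2E Test", "pyGroup": "E2E Test", "pyIssue": "Batch", "pyName": "P1"},
    {"pyChannel": "PegaBatch", "pyDirection": "E2E Test", "pyGroup": "E2E Test", "pyIssue": "Batch", "pyName": "P2"},
    {"pyChannel": "Web", "pyDirection": "Inbound", "pyGroup": "Cards", "pyIssue": "Sales", "pyName": "Gold"},
]

batch_contexts = matching_contexts(contexts, pyChannel="PegaBatch")
for ctx in batch_contexts:
    print(ctx["pyName"])
```

With the real objects, each match could then be passed to `explanations.filter.set_selected_context(ctx)` before calling `explanations.plot.contributions(top_n=3, top_k=5)`, assuming the elements are accepted in the same dictionary form as above.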
Advanced data exploration
For more control, you can work with the aggregate object directly, inspect the underlying data, and build custom analyses.
```python
aggregate = explanations.aggregate  # load the aggregated data
```
Inspect data for the overall model
Get the top-n predictors and their contributions using get_predictor_contributions().
```python
df_overall = aggregate.get_predictor_contributions(top_n=3, remaining=False)
df_overall
```
| partition | predictor_name | predictor_type | contribution | contribution_abs | contribution_weighted | contribution_weighted_abs | frequency | contribution_min | contribution_max |
|---|---|---|---|---|---|---|---|---|---|
| str | str | str | f64 | f64 | f64 | f64 | i64 | f64 | f64 |
| "whole_model" | "pyName" | "SYMBOLIC" | -0.021191 | 0.021255 | -0.021204 | 0.021269 | 50000 | -0.044283 | 0.029203 |
| "whole_model" | "Age" | "NUMERIC" | -0.011306 | 0.011738 | -0.011167 | 0.011536 | 50000 | -0.034704 | 0.023485 |
| "whole_model" | "Occupation" | "SYMBOLIC" | -0.008904 | 0.010193 | -0.008977 | 0.010151 | 50000 | -0.032185 | 0.06411 |
Inspect the most influential values of those predictors using get_predictor_value_contributions().
```python
top_n_predictors = df_overall.select(pl.col('predictor_name')).unique().to_series().to_list()
aggregate.get_predictor_value_contributions(
    predictors=top_n_predictors,
    top_k=2,
    remaining=False,
)
```
| partition | predictor_name | predictor_type | bin_order | bin_contents | contribution | contribution_abs | contribution_weighted | contribution_weighted_abs | frequency | contribution_min | contribution_max | sort_column | sort_value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | i64 | str | f64 | f64 | f64 | f64 | i64 | f64 | f64 | str | f64 |
| "whole_model" | "Age" | "NUMERIC" | 0 | "MISSING" | -0.013647 | 0.015131 | -0.000503 | 0.000557 | 1842 | -0.034704 | 0.023485 | "bin_order" | 0.0 |
| "whole_model" | "Age" | "NUMERIC" | 5 | "[43.000:48.000]" | -0.013174 | 0.013192 | -0.001269 | 0.001271 | 4816 | -0.026145 | 0.006043 | "bin_order" | 5.0 |
| "whole_model" | "Occupation" | "SYMBOLIC" | 42 | "Psychotherapist, child" | -0.016257 | 0.016257 | -0.000268 | 0.000268 | 825 | -0.032185 | -0.003855 | "contribution_abs" | 0.016257 |
| "whole_model" | "Occupation" | "SYMBOLIC" | 38 | "TEFL teacher" | 0.020043 | 0.020166 | 0.000343 | 0.000345 | 855 | -0.011896 | 0.06411 | "contribution_abs" | 0.020166 |
| "whole_model" | "pyName" | "SYMBOLIC" | 15 | "P18" | -0.024462 | 0.024462 | -0.001208 | 0.001208 | 2469 | -0.03746 | -0.009315 | "contribution_abs" | 0.024462 |
| "whole_model" | "pyName" | "SYMBOLIC" | 1 | "P1" | -0.025354 | 0.025354 | -0.001307 | 0.001307 | 2577 | -0.044283 | -0.007185 | "contribution_abs" | 0.025354 |
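In this output, `contribution_weighted` for each value matches `contribution` scaled by that value's share of the sampled executions. The predictor-level table shows a `frequency` of 50000, and for the `MISSING` bin of `Age`, -0.013647 × 1842 / 50000 ≈ -0.000503. The check below reproduces this relationship for every row of the table above. It is an observation about these numbers, not a documented definition of the pdstools column.

```python
# (label, contribution, frequency, contribution_weighted) copied from the table above.
rows = [
    ("Age MISSING", -0.013647, 1842, -0.000503),
    ("Age [43.000:48.000]", -0.013174, 4816, -0.001269),
    ("Occupation Psychotherapist, child", -0.016257, 825, -0.000268),
    ("Occupation TEFL teacher", 0.020043, 855, 0.000343),
    ("pyName P18", -0.024462, 2469, -0.001208),
    ("pyName P1", -0.025354, 2577, -0.001307),
]
TOTAL_FREQUENCY = 50000  # frequency of each predictor in the whole-model table

for label, contribution, frequency, reported in rows:
    recomputed = contribution * frequency / TOTAL_FREQUENCY
    # The table is rounded to 6 decimals, so compare with a matching tolerance.
    assert abs(recomputed - reported) < 1e-6, label
    print(f"{label}: {recomputed:.6f} (table: {reported:.6f})")
```

This weighting explains why a value with a strong per-execution effect but few occurrences, such as "TEFL teacher", ends up with a small weighted contribution.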
Inspect data by level in the action hierarchy
The same methods can drill down into the action hierarchy instead of looking at the overall model.
```python
import random

# Pick a random action from the hierarchy; your selection will differ between runs.
context_info = random.choice(aggregate.get_unique_contexts_list())
print('Selected random context: \n')
for key, value in context_info.items():
    print(f'{key}: {value}')

df_by_context = aggregate.get_predictor_contributions(
    context=context_info,
    top_n=3,
    remaining=False,
)
df_by_context
```
```
Selected random context:

pyChannel: PegaBatch
pyDirection: E2E Test
pyGroup: E2E Test
pyIssue: Batch
pyName: P19
```
| partition | predictor_name | predictor_type | contribution | contribution_abs | contribution_weighted | contribution_weighted_abs | frequency | contribution_min | contribution_max |
|---|---|---|---|---|---|---|---|---|---|
| str | str | str | f64 | f64 | f64 | f64 | i64 | f64 | f64 |
| "{"partition":{"pyChannel":"Peg… | "pyName" | "SYMBOLIC" | -0.02322 | 0.02324 | -0.02322 | 0.02324 | 2522 | -0.035388 | 0.008371 |
| "{"partition":{"pyChannel":"Peg… | "Age" | "NUMERIC" | -0.012688 | 0.013053 | -0.012353 | 0.01273 | 2522 | -0.029794 | 0.014146 |
| "{"partition":{"pyChannel":"Peg… | "Occupation" | "SYMBOLIC" | -0.011131 | 0.012662 | -0.011287 | 0.012633 | 2522 | -0.028257 | 0.046894 |
```python
top_n_predictors = df_by_context.select(pl.col('predictor_name')).unique().to_series().to_list()
aggregate.get_predictor_value_contributions(
    predictors=top_n_predictors,
    top_k=2,
    context=context_info,
    remaining=False,
)
```
| partition | predictor_name | predictor_type | bin_order | bin_contents | contribution | contribution_abs | contribution_weighted | contribution_weighted_abs | frequency | contribution_min | contribution_max | sort_column | sort_value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | i64 | str | f64 | f64 | f64 | f64 | i64 | f64 | f64 | str | f64 |
| "{"partition":{"pyChannel":"Peg… | "Age" | "NUMERIC" | 0 | "MISSING" | -0.018675 | 0.018813 | -0.000748 | 0.000753 | 101 | -0.029794 | 0.001164 | "bin_order" | 0.0 |
| "{"partition":{"pyChannel":"Peg… | "Age" | "NUMERIC" | 5 | "[44.000:48.000]" | -0.014921 | 0.014921 | -0.001432 | 0.001432 | 242 | -0.02393 | -0.004404 | "bin_order" | 5.0 |
| "{"partition":{"pyChannel":"Peg… | "Occupation" | "SYMBOLIC" | 41 | "Psychotherapist, child" | -0.019958 | 0.019958 | -0.000317 | 0.000317 | 40 | -0.028257 | -0.010531 | "contribution_abs" | 0.019958 |
| "{"partition":{"pyChannel":"Peg… | "Occupation" | "SYMBOLIC" | 35 | "TEFL teacher" | 0.033853 | 0.033853 | 0.000591 | 0.000591 | 44 | 0.022905 | 0.046894 | "contribution_abs" | 0.033853 |
| "{"partition":{"pyChannel":"Peg… | "pyName" | "SYMBOLIC" | 1 | "P19" | -0.02322 | 0.02324 | -0.02322 | 0.02324 | 2522 | -0.035388 | 0.008371 | "contribution_abs" | 0.02324 |
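A natural custom analysis is to ask which predictors behave differently for the selected action than for the model as a whole. With the DataFrames from this notebook, you could join `df_by_context` to `df_overall` on `predictor_name` with polars and subtract the `contribution` columns. The stdlib sketch below does the same arithmetic on the values printed above for action P19, so it runs without the explanation data; the helper name `context_shift` is hypothetical.

```python
from typing import Dict, List, Tuple


def context_shift(overall: Dict[str, float], by_context: Dict[str, float]) -> List[Tuple[str, float]]:
    """Context minus overall contribution per shared predictor, largest absolute shift first."""
    shared = overall.keys() & by_context.keys()
    shifts = [(p, by_context[p] - overall[p]) for p in shared]
    return sorted(shifts, key=lambda item: abs(item[1]), reverse=True)


# 'contribution' values from the whole-model table and the P19 context table above.
overall = {"pyName": -0.021191, "Age": -0.011306, "Occupation": -0.008904}
p19 = {"pyName": -0.02322, "Age": -0.012688, "Occupation": -0.011131}

for predictor, shift in context_shift(overall, p19):
    print(f"{predictor}: {shift:+.6f}")
```

For P19, all three predictors contribute slightly more negatively than for the model overall, with Occupation shifting the most. Because the context above was chosen at random, rerunning the notebook will generally produce a different comparison.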