{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "1a99ea0b", "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# These lines are only for rendering in the docs, and are hidden through Jupyter tags\n", "# Do not run if you're running the notebook separately\n", "\n", "import plotly.io as pio\n", "\n", "pio.renderers.default = \"notebook_connected\"" ] }, { "cell_type": "markdown", "id": "y5ffwchvqi", "metadata": {}, "source": [ "# Global Explanations for Adaptive Gradient Boosting Models\n", "\n", "This notebook demonstrates how to analyze and visualize global explanations for Adaptive Gradient Boosting models in Pega Adaptive Decision Manager (ADM). Global explanations provide insights into which predictors have the most influence on model predictions and how different predictor values correlate with the model's scores.\n", "\n", "**Important:** The SHAP explanation datasets used in this notebook are available starting with **Pega Infinity '25**. In earlier versions, the explanation files are not written to the repository.\n", "\n", "The explanation files contain SHAP (SHapley Additive exPlanations) contributions for a sample of model executions. This notebook shows how these contributions are aggregated to provide global explanations for Gradient Boosting models. In Pega Infinity '25, a Global Explanations report can also be generated directly from [Prediction Studio](https://docs.pega.com/bundle/platform/page/platform/decision-management/gradient-boosting-explanations.html).\n", "\n", "**Note on earlier versions:** In versions prior to Infinity '25, Feature Importance (also known as Predictor Importance) is already available for Gradient Boosting models and uses SHAP values. 
In Infinity '25, this is enhanced with detailed insights into the correlation between predictor _values_ and the model's score (for example, high income may correlate with a high contribution to the propensity score for a platinum credit card offer), rather than just the importance of each predictor. See: https://docs.pega.com/bundle/platform/page/platform/decision-management/view-summarized-reports-adm.html\n" ] }, { "cell_type": "markdown", "id": "730bffd4", "metadata": {}, "source": [ "## Aggregate data exported from Infinity\n", "\n", "**Prerequisite**: You have already exported explanation files from Infinity.\n", "\n", "Parameters:\n", "- `data_folder`: the folder that contains the explanation files\n", "- `model_name`: `optional` - the model rule to check for explanations; if not passed, any file in the folder is picked up\n", "- `from_date`: `optional` - if not passed, defaults to 7 days before today\n", "- `to_date`: `optional` - if not passed, defaults to today\n", "\n", "The aggregated data is stored in the `.tmp/aggregated_data` directory." 
] }, { "cell_type": "code", "execution_count": null, "id": "45f9f307", "metadata": {}, "outputs": [], "source": [ "from pdstools.explanations import Explanations\n", "\n", "import datetime\n", "import polars as pl\n", "import logging\n", "# logging.basicConfig(level=logging.INFO) # Uncomment to see progress for large files\n", "\n", "\n", "explanations = Explanations(\n", " # data_folder='../../data/explanations/', # Uncomment this line and provide folder location of exported explanations data\n", " data_file=\"https://raw.githubusercontent.com/pegasystems/pega-datascientist-tools/master/data/explanations/AdaptiveBoostCT_20250328064604.parquet\", # Remove this argument when running locally with exported data\n", " model_name='AdaptiveBoostCT',\n", " from_date=datetime.date(2024,3,28),\n", " to_date=datetime.date(2025,3,28)\n", ")" ] }, { "cell_type": "markdown", "id": "954a4362", "metadata": {}, "source": [ "## Plotting contributions\n", "\n", "The [contributions()](https://pegasystems.github.io/pega-datascientist-tools/autoapi/pdstools/explanations/Plots/index.html#pdstools.explanations.Plots.Plots.contributions) method plots the top-n most influential predictors and, for each of them, shows how individual predictor values contribute to the model score. Numeric predictors are binned (at most 10 bins); symbolic predictors show the top-k categories.\n", "\n", "Results can be shown for the model overall or drilled down into the action hierarchy (direction, channel, issue, group, action name), referred to as \"context\" in the API." 
] }, { "cell_type": "markdown", "id": "4ef8a504", "metadata": {}, "source": [ "### Overall model\n", "\n", "Calling [contributions()](https://pegasystems.github.io/pega-datascientist-tools/autoapi/pdstools/explanations/Plots/index.html#pdstools.explanations.Plots.Plots.contributions) without selecting a level in the action hierarchy aggregates over all actions.\n", "\n", "Parameters:\n", "- `top_n`: Number of top predictors to plot.\n", "- `top_k`: Number of top predictor values for symbolic predictors to plot.\n", "- `remaining`: If `True`, the remaining predictors will be plotted as a single bar.\n", "- `missing`: If `True`, the missing values will be plotted as a separate bar.\n", "- `descending`: If `True`, the predictors will be sorted in descending order of their contributions, i.e. least contributing predictors will be plotted first.\n", "- `contribution_calculation`: Method to calculate contributions. Options include `contribution`, `contribution_abs`, `contribution_weighted`. Default is `contribution` (average contributions to predictions)." ] }, { "cell_type": "code", "execution_count": null, "id": "01768e07", "metadata": {}, "outputs": [], "source": [ "_, plots = explanations.plot.contributions(top_n=3, top_k=5, remaining=True)" ] }, { "cell_type": "markdown", "id": "78dfa154", "metadata": {}, "source": [ "### By level in the action hierarchy\n", "\n", "Use [filter.interactive()](https://pegasystems.github.io/pega-datascientist-tools/autoapi/pdstools/explanations/FilterWidget/index.html#pdstools.explanations.FilterWidget.FilterWidget.interactive) to display a picker that lets you drill down into the action hierarchy. 
Use the comboboxes on the left to narrow down by direction, channel, issue, group, or action name, then select one on the right.\n", "\n", "After making a selection, call [contributions()](https://pegasystems.github.io/pega-datascientist-tools/autoapi/pdstools/explanations/Plots/index.html#pdstools.explanations.Plots.Plots.contributions) again to plot contributions for that selection only." ] }, { "cell_type": "code", "execution_count": null, "id": "ed529772", "metadata": {}, "outputs": [], "source": [ "explanations.filter.interactive()" ] }, { "cell_type": "code", "execution_count": null, "id": "b6e45ff1", "metadata": {}, "outputs": [], "source": [ "context_header, plots = explanations.plot.contributions(top_n=3, top_k=5)" ] }, { "cell_type": "markdown", "id": "54b4d960", "metadata": {}, "source": [ "You can also select a level in the action hierarchy programmatically using [filter.set_selected_context()](https://pegasystems.github.io/pega-datascientist-tools/autoapi/pdstools/explanations/FilterWidget/index.html#pdstools.explanations.FilterWidget.FilterWidget.set_selected_context)." 
] }, { "cell_type": "code", "execution_count": null, "id": "ce0b1645", "metadata": {}, "outputs": [], "source": [ "explanations.filter.set_selected_context(\n", " {\"pyChannel\": \"PegaBatch\",\n", " \"pyDirection\": \"E2E Test\",\n", " \"pyGroup\": \"E2E Test\",\n", " \"pyIssue\": \"Batch\",\n", " \"pyName\": \"P2\"\n", "})" ] }, { "cell_type": "code", "execution_count": null, "id": "c2a4d7ec", "metadata": {}, "outputs": [], "source": [ "context_header, plots = explanations.plot.contributions(top_n=3, top_k=5)" ] }, { "cell_type": "markdown", "id": "f9d1ccec", "metadata": {}, "source": [ "## Advanced data exploration\n", "\n", "For more control, you can work with the [aggregate](https://pegasystems.github.io/pega-datascientist-tools/autoapi/pdstools/explanations/Aggregate/index.html#pdstools.explanations.Aggregate.Aggregate) object directly, inspect the underlying data, and build custom analyses." ] }, { "cell_type": "code", "execution_count": null, "id": "a106a296", "metadata": {}, "outputs": [], "source": [ "aggregate = explanations.aggregate # load the aggregated data" ] }, { "cell_type": "markdown", "id": "0b9c5bfa", "metadata": {}, "source": [ "### Inspect data for the overall model" ] }, { "cell_type": "markdown", "id": "39907165", "metadata": {}, "source": [ "Get the top-n predictors and their contributions using [get_predictor_contributions()](https://pegasystems.github.io/pega-datascientist-tools/autoapi/pdstools/explanations/Aggregate/index.html#pdstools.explanations.Aggregate.Aggregate.get_predictor_contributions)." 
] }, { "cell_type": "code", "execution_count": null, "id": "34aec57c", "metadata": {}, "outputs": [], "source": [ "df_overall = aggregate.get_predictor_contributions(top_n=3, remaining=False)\n", "df_overall\n" ] }, { "cell_type": "markdown", "id": "85d3f460", "metadata": {}, "source": [ "Inspect the most influential values of those predictors using [get_predictor_value_contributions()](https://pegasystems.github.io/pega-datascientist-tools/autoapi/pdstools/explanations/Aggregate/index.html#pdstools.explanations.Aggregate.Aggregate.get_predictor_value_contributions)." ] }, { "cell_type": "code", "execution_count": null, "id": "c6d6fd0d", "metadata": {}, "outputs": [], "source": [ "top_n_predictors = df_overall.select(pl.col('predictor_name')).unique().to_series().to_list()\n", "aggregate.get_predictor_value_contributions(\n", "    predictors=top_n_predictors,\n", "    top_k=2,\n", "    remaining=False,\n", ")" ] }, { "cell_type": "markdown", "id": "0919edd5", "metadata": {}, "source": [ "### Inspect data by level in the action hierarchy" ] }, { "cell_type": "markdown", "id": "abe47103", "metadata": {}, "source": [ "The same methods can drill down into the action hierarchy instead of looking at the overall model." 
] }, { "cell_type": "code", "execution_count": null, "id": "40217fb3", "metadata": {}, "outputs": [], "source": [ "import random\n", "\n", "# Pick one of the available contexts at random to illustrate the drill-down\n", "context_info = random.choice(aggregate.get_unique_contexts_list())\n", "print('Selected random context: \\n')\n", "for key, value in context_info.items():\n", "    print(f'{key}: {value}')\n", "\n", "df_by_context = aggregate.get_predictor_contributions(\n", "    context=context_info,\n", "    top_n=3,\n", "    remaining=False,\n", ")\n", "df_by_context\n" ] }, { "cell_type": "code", "execution_count": null, "id": "5e15553f", "metadata": {}, "outputs": [], "source": [ "top_n_predictors = df_by_context.select(pl.col('predictor_name')).unique().to_series().to_list()\n", "aggregate.get_predictor_value_contributions(\n", "    predictors=top_n_predictors,\n", "    top_k=2,\n", "    context=context_info,\n", "    remaining=False,\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "pega-datascientist-tools", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 5 }