Explainability Extract Analysis
This notebook demonstrates the DecisionAnalyzer class from pdstools using the Explainability Extract v1 dataset.
The Explainability Extract captures actions at the Arbitration stage only. It can be extracted from Infinity 24.1 and earlier versions. For the full decision funnel (v2 format), see the Decision Analyzer notebook.
The analyses below cover action distribution, sensitivity, win/loss patterns, and personalization.
```python
from pdstools.decision_analyzer.data_read_utils import read_data
from pdstools.decision_analyzer.DecisionAnalyzer import DecisionAnalyzer
from pdstools import read_ds_export
import polars as pl
```
Read Data
Load the sample Explainability Extract data and create a DecisionAnalyzer instance. The sample file is read directly from the pega-datascientist-tools repository on GitHub, so no manual download is needed.
```python
# Read the sample Explainability Extract straight from the GitHub repository
df = read_ds_export(
    filename="sample_explainability_extract.parquet",
    path="https://raw.githubusercontent.com/pegasystems/pega-datascientist-tools/master/data",
)
decision_data = DecisionAnalyzer(df)
```
```python
# decision_data.decision_data is lazy; collect() materializes it for display
decision_data.decision_data.collect()
```
| Subject ID | Interaction ID | Decision Time | Issue | Group | Action | Channel | Direction | Value | Context Weight | Levers | pyPropensity | Propensity | Priority | Model Control Group | is_mandatory | day | Rank | Stage Group | Stage Order | Record Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | datetime[μs] | str | str | str | str | str | f64 | f64 | f64 | f64 | f64 | f32 | str | i32 | date | u32 | cat | i32 | str |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "Retention" | "CreditCard_1" | "pyName_232" | "Mobile" | "Inbound" | 0.37 | 0.590017 | 1.512527 | 0.016784 | 0.016551 | 0.005465 | "Test" | 0 | 2024-06-17 | 25 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "Retention" | "Mortgage_1" | "pyName_251" | "Mobile" | "Inbound" | 0.3 | 1.360237 | 1.169866 | 0.5 | 0.497675 | 0.237584 | "Test" | 0 | 2024-06-17 | 3 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "Retention" | "Proactive" | "pyName_439" | "Mobile" | "Inbound" | 0.3 | 0.718434 | 1.427434 | 0.007971 | 0.008 | 0.002461 | "Test" | 0 | 2024-06-17 | 41 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "Retention" | "Collections_1" | "pyName_82" | "Mobile" | "Inbound" | 0.55 | 1.344401 | 1.023569 | 0.001216 | 0.001199 | 0.000908 | "Test" | 0 | 2024-06-17 | 54 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "Acquisition" | "Proactive_1" | "pyName_363" | "Mobile" | "Inbound" | 0.3 | 0.606607 | 1.453102 | 0.009486 | 0.009451 | 0.002499 | "Test" | 0 | 2024-06-17 | 40 | "Arbitration" | 10000 | "FILTERED_OUT" |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| "pySubjectID_1748" | "-8570391784720840506" | 2024-06-15 18:21:40.202300 | "Engagement" | "Proactive_1" | "pyName_242" | "Mobile" | "Inbound" | 0.3 | 0.095381 | 0.868015 | 0.029665 | 0.029905 | 0.000743 | "Test" | 0 | 2024-06-15 | 62 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_1748" | "-8570391784720840506" | 2024-06-15 18:21:40.202300 | "Retention" | "Collections_1" | "pyName_426" | "Mobile" | "Inbound" | 0.58 | 1.723052 | 1.257973 | 0.0128 | 0.012372 | 0.015554 | "Test" | 0 | 2024-06-15 | 12 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_1748" | "-8570391784720840506" | 2024-06-15 18:21:40.202300 | "Retention" | "Mortgage" | "pyName_61" | "Mobile" | "Inbound" | 0.58 | 0.067682 | 1.208878 | 0.5 | 0.503617 | 0.023899 | "Test" | 0 | 2024-06-15 | 8 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_1748" | "-8570391784720840506" | 2024-06-15 18:21:40.202300 | "Retention" | "Mortgage" | "pyName_178" | "Mobile" | "Inbound" | 0.74 | 1.187729 | 1.560156 | 0.030844 | 0.030643 | 0.042019 | "Test" | 0 | 2024-06-15 | 5 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_1748" | "-8570391784720840506" | 2024-06-15 18:21:40.202300 | "Engagement" | "RetailBank_1" | "pyName_463" | "Mobile" | "Inbound" | 0.37 | 0.507969 | 1.212289 | 0.01276 | 0.013194 | 0.003006 | "Test" | 0 | 2024-06-15 | 40 | "Arbitration" | 10000 | "FILTERED_OUT" |
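Overview
Summary statistics for the dataset. These include the number of actions, channels, customers and decisions, the time span covered, and the average number of actions per decision at Arbitration.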
```python
# get_overview_stats is accessed as a property (no parentheses) and returns a dict
decision_data.get_overview_stats
```
{'Actions': 152,
'Channels': 1,
'Duration': datetime.timedelta(days=7),
'StartDate': datetime.date(2024, 6, 12),
'Customers': 100,
'Decisions': 100,
'avgOffersAtArbitration': 73,
'avgAvailable': 0}
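To inspect a single decision, filter on one Interaction ID. Each row is one action that reached the Arbitration stage in that decision, so the number of rows equals the number of available actions. Rank is the action's position in the prioritization, where 1 is the highest priority.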
```python
# Take the Interaction ID of the first row in the raw data
selected_interaction_id = (
    decision_data.unfiltered_raw_decision_data.select("Interaction ID")
    .first()
    .collect()
    .row(0)[0]
)
print(f"{selected_interaction_id=}")

# Show every action in that interaction, ordered by arbitration rank
decision_data.unfiltered_raw_decision_data.filter(
    pl.col("Interaction ID") == selected_interaction_id
).sort("Rank").collect()
```
selected_interaction_id='-8570391784720831070'
| Subject ID | Interaction ID | Decision Time | Issue | Group | Action | Channel | Direction | Value | Context Weight | Levers | pyPropensity | Propensity | Priority | Model Control Group | is_mandatory | day | Rank | Stage Group | Stage Order | Record Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | datetime[μs] | str | str | str | str | str | f64 | f64 | f64 | f64 | f64 | f32 | str | i32 | date | u32 | cat | i32 | str |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "Retention" | "Mortgage" | "pyName_476" | "Mobile" | "Inbound" | 0.58 | 1.260294 | 1.060128 | 0.5 | 0.470596 | 0.364676 | "Test" | 0 | 2024-06-17 | 1 | "Output" | 3800 | "OUTPUT" |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "Retention" | "Mortgage" | "pyName_61" | "Mobile" | "Inbound" | 0.58 | 1.247556 | 0.788446 | 0.5 | 0.501949 | 0.286365 | "Test" | 0 | 2024-06-17 | 2 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "Retention" | "Mortgage_1" | "pyName_251" | "Mobile" | "Inbound" | 0.3 | 1.360237 | 1.169866 | 0.5 | 0.497675 | 0.237584 | "Test" | 0 | 2024-06-17 | 3 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "Retention" | "RetailBank_1" | "pyName_590" | "Mobile" | "Inbound" | 0.35 | 0.706076 | 1.466814 | 0.5 | 0.503524 | 0.182522 | "Test" | 0 | 2024-06-17 | 4 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "Retention" | "RetailBank_1" | "pyName_211" | "Mobile" | "Inbound" | 0.38 | 0.582079 | 1.197211 | 0.5 | 0.511875 | 0.13555 | "Test" | 0 | 2024-06-17 | 5 | "Arbitration" | 10000 | "FILTERED_OUT" |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "Acquisition" | "Billing" | "pyName_206" | "Mobile" | "Inbound" | 0.3 | 0.302888 | 0.585444 | 0.003473 | 0.003476 | 0.000185 | "Test" | 0 | 2024-06-17 | 68 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "ActivateUse" | "Proactive_1" | "pyName_362" | "Mobile" | "Inbound" | 0.46 | 0.306666 | 0.546429 | 0.002301 | 0.002243 | 0.000173 | "Test" | 0 | 2024-06-17 | 69 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "Retention" | "Billing_1" | "pyName_224" | "Mobile" | "Inbound" | 0.11 | 0.571386 | 1.463025 | 0.001677 | 0.001675 | 0.000154 | "Test" | 0 | 2024-06-17 | 70 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "Retention" | "Mortgage" | "pyName_431" | "Mobile" | "Inbound" | 0.26 | 1.481145 | 0.631846 | 0.000245 | 0.000242 | 0.000059 | "Test" | 0 | 2024-06-17 | 71 | "Arbitration" | 10000 | "FILTERED_OUT" |
| "pySubjectID_466" | "-8570391784720831070" | 2024-06-17 18:21:40.202300 | "Retention" | "Proactive" | "pyName_483" | "Mobile" | "Inbound" | 0.3 | 1.02696 | 0.897542 | 0.000103 | 0.000101 | 0.000028 | "Test" | 0 | 2024-06-17 | 72 | "Arbitration" | 10000 | "FILTERED_OUT" |
Action Distribution
Distribution of actions at the Arbitration stage, nested by Issue, Group and Action. This helps identify action groups that rarely reach Arbitration.
```python
stage = "Arbitration"
# Hierarchy used to nest the treemap: Issue > Group > Action
scope_options = ["Issue", "Group", "Action"]
distribution_data = decision_data.getDistributionData(stage, scope_options)
fig = decision_data.plot.distribution_as_treemap(
    df=distribution_data, stage=stage, scope_options=scope_options
)
fig.show()
```
Global Sensitivity
Shows the impact of each arbitration factor (Propensity, Value, Context Weight, Levers) on the final decision. Each bar shows how many decisions would end up with a different winning action if that factor were left out of the priority calculation. Ideally, Propensity should have the strongest influence.
```python
# win_rank=1: only the top-ranked action counts as the winner
decision_data.plot.sensitivity(win_rank=1)
```
Wins and Losses in Arbitration
Distribution of wins and losses by Issue. Set the level parameter to "Group" or "Action" for finer granularity. An action counts as a win in a decision when its Rank is within the top win_rank positions, and as a loss otherwise.
```python
decision_data.plot.global_winloss_distribution(level="Issue", win_rank=1)
```
Optionality Analysis
Shows how many actions are available per decision at the Arbitration stage. Limited optionality reduces the ability to personalize, because arbitration has fewer candidates to choose from. The bars show the number of decisions for each count of available actions. The line shows the average propensity of the top-ranked action. Ideally, average propensity increases as more actions become available.
```python
decision_data.plot.propensity_vs_optionality(stage="Arbitration")
```
Win/Loss Analysis
Win Analysis
Select an action to see how often it wins and which actions it defeats. In this example, the selected action is the one with the second-highest number of Rank 1 finishes.
```python
win_rank = 1
# Pick the action with the second-highest number of Rank 1 finishes
selected_action = (
    decision_data.unfiltered_raw_decision_data.filter(pl.col("Rank") == 1)
    .group_by("Action")
    .len()
    .sort("len", descending=True)
    .collect()
    .get_column("Action")
    .to_list()[1]
)
filter_statement = pl.col("Action") == selected_action

# Interactions in which the selected action wins
interactions_where_comparison_group_wins = (
    decision_data.get_winning_or_losing_interactions(
        win_rank=win_rank,
        group_filter=filter_statement,
        win=True,
    )
)
print(
    f"selected action '{selected_action}' wins(Rank{win_rank}) in {interactions_where_comparison_group_wins.collect().height} interactions."
)
```
selected action 'pyName_61' wins(Rank1) in 30 interactions.
Actions that lose to the selected action in arbitration.
```python
# Losing actions in interactions where the selected action wins.
groupby_cols = ["Issue", "Group", "Action"]
winning_from = decision_data.winning_from(
    interactions=interactions_where_comparison_group_wins,
    win_rank=win_rank,
    groupby_cols=groupby_cols,
    top_k=20,
)
decision_data.plot.distribution_as_treemap(
    df=winning_from, stage="Arbitration", scope_options=groupby_cols
)
```
Loss Analysis
Actions that beat the selected action in arbitration.
```python
# Interactions in which the selected action loses
interactions_where_comparison_group_loses = (
    decision_data.get_winning_or_losing_interactions(
        win_rank=win_rank,
        group_filter=filter_statement,
        win=False,
    )
)
print(
    f"selected action '{selected_action}' loses in {interactions_where_comparison_group_loses.collect().height} interactions."
)
# Winning actions in interactions where the selected action loses.
losing_to = decision_data.losing_to(
    interactions=interactions_where_comparison_group_loses,
    win_rank=win_rank,
    groupby_cols=groupby_cols,
    top_k=20,
)
decision_data.plot.distribution_as_treemap(
    df=losing_to, stage="Arbitration", scope_options=groupby_cols
)
```
selected action 'pyName_61' loses in 58 interactions.
Sensitivity for Selected Action
Change in win count when each prioritization factor is individually removed. Unlike the global sensitivity above, negative values are possible: a negative value means removing that factor would increase wins for the selected action (i.e., that factor is hurting it).
```python
# Restrict the sensitivity analysis to the selected action
decision_data.plot.sensitivity(
    reference_group=pl.col("Action") == selected_action
)
```
Prioritization Factor Distributions
Box plots comparing the arbitration factor distributions of the selected action with those of the competing actions in the same interactions.
```python
fig, warning_message = decision_data.plot.prio_factor_boxplots(
    reference=pl.col("Action") == selected_action,
)
# Print the warning instead of the figure when one is returned
if warning_message:
    print(warning_message)
else:
    fig.show()
```
Rank Distribution
Distribution of the prioritization rank for the selected action. Rank 1 is the top position, so a distribution concentrated at high rank numbers means the action seldom wins.
```python
decision_data.plot.rank_boxplot(
    reference=pl.col("Action") == selected_action,
)
```