Decision Analysis

This notebook demonstrates the DecisionAnalyzer class from pdstools for analyzing NBA decision data.

The class works with two data formats:

  • Explainability Extract (v1): Actions at the arbitration stage only.

  • Decision Analyzer / EEV2 (v2): Full decision funnel with stage information and filter components.

The analyses below cover the decision funnel, action distribution, sensitivity analysis, win/loss patterns, personalization, and lever experimentation.

[2]:
from pdstools.decision_analyzer.data_read_utils import read_data
from pdstools.decision_analyzer.DecisionAnalyzer import DecisionAnalyzer
from pdstools.decision_analyzer.plots import create_win_distribution_plot
from pdstools.decision_analyzer.utils import get_scope_config
from pdstools import read_ds_export
import polars as pl

Read Data

Load the sample EEV2 data and create a DecisionAnalyzer instance. The sample data can be downloaded directly from GitHub.

[3]:
df = read_ds_export(
    filename="sample_eev2.parquet",
    path="https://raw.githubusercontent.com/pegasystems/pega-datascientist-tools/master/data",
)
decision_data = DecisionAnalyzer(df)

Overview

General statistics of the dataset.

[4]:
decision_data.get_overview_stats
[4]:
{'Actions': 29,
 'Channels': 1,
 'Duration': datetime.timedelta(days=7, seconds=1554, microseconds=90000),
 'StartDate': datetime.date(2025, 2, 25),
 'Customers': 1173,
 'Decisions': 7069,
 'avgOffersAtArbitration': 13,
 'avgAvailable': 0}

The next cell inspects a single decision, selected by its Interaction ID. The number of rows shows how many actions are available at the Arbitration stage. Rank shows the action ranking (null in earlier stages where propensity is not yet set).

[5]:
selected_interaction_id = (
    decision_data.unfiltered_raw_decision_data.select("Interaction ID")
    .first()
    .collect()
    .row(0)[0]
)
print(f"{selected_interaction_id=}")
decision_data.unfiltered_raw_decision_data.filter(
    pl.col("Interaction ID") == selected_interaction_id
).sort("Rank").collect()
selected_interaction_id='IX-D0-000000'
[5]:
shape: (204, 28)
Record Type | Subject ID | Subject Type | Interaction ID | Decision Time | Issue | Group | Action | Treatment | Placement Type | Strategy Name | Channel | Direction | Stage | Stage Group | Stage Order | Component Name | Component Type | Value | Context Weight | Levers | Propensity | Priority | Application | Application Version | is_mandatory | day | Rank
str | str | str | str | datetime[ms, UTC] | str | str | str | str | str | str | str | str | str | cat | i32 | str | str | f64 | f64 | f64 | f64 | f64 | str | str | i32 | date | u64
"FILTERED_OUT" | "362148589155" | "Data-Customer" | "IX-D0-000000" | 2025-02-25 18:25:20.532 UTC | "Growth" | "Banking" | "Action021" | "Action021_T" | null | "StrategyName049" | "Web" | "Inbound" | "StageName004" | "JourneysContactPolicies" | 1800 | "ComponentName055" | "External Substrategy" | null | null | null | null | null | "App1" | "01.01.01" | 0 | 2025-02-25 | 1
"FILTERED_OUT" | "362148589155" | "Data-Customer" | "IX-D0-000000" | 2025-02-25 18:25:20.532 UTC | "Growth" | "CreditCards" | "Action005" | "Action005_T" | null | "StrategyName049" | "Web" | "Inbound" | "StageName004" | "JourneysContactPolicies" | 1800 | "ComponentName055" | "External Substrategy" | null | null | null | null | null | "App1" | "01.01.01" | 0 | 2025-02-25 | 2
"FILTERED_OUT" | "362148589155" | "Data-Customer" | "IX-D0-000000" | 2025-02-25 18:25:20.532 UTC | "Growth" | "CreditCards" | "Action015" | "Action015_T" | null | "StrategyName049" | "Web" | "Inbound" | "StageName004" | "JourneysContactPolicies" | 1800 | "ComponentName055" | "External Substrategy" | null | null | null | null | null | "App1" | "01.01.01" | 0 | 2025-02-25 | 3
"FILTERED_OUT" | "362148589155" | "Data-Customer" | "IX-D0-000000" | 2025-02-25 18:25:20.532 UTC | "Growth" | "Insurance" | "Action023" | "Action023_T" | null | "StrategyName049" | "Web" | "Inbound" | "StageName004" | "JourneysContactPolicies" | 1800 | "ComponentName055" | "External Substrategy" | null | null | null | null | null | "App1" | "01.01.01" | 0 | 2025-02-25 | 4
"FILTERED_OUT" | "362148589155" | "Data-Customer" | "IX-D0-000000" | 2025-02-25 18:25:20.532 UTC | "Growth" | "Loans" | "Action001" | "Action001_T" | null | "StrategyName049" | "Web" | "Inbound" | "StageName004" | "JourneysContactPolicies" | 1800 | "ComponentName055" | "External Substrategy" | null | null | null | null | null | "App1" | "01.01.01" | 0 | 2025-02-25 | 5
…
"FILTERED_OUT" | "362148589155" | "Data-Customer" | "IX-D0-000000" | 2025-02-25 18:25:20.532 UTC | "Sales" | "Research" | "Action027" | "Action027_T" | null | "StrategyName014" | "Web" | "Inbound" | "StageName007" | "EngagementPolicies" | 1400 | "ComponentName157" | "Proposition Filter" | null | null | null | null | null | "App1" | "01.01.01" | 0 | 2025-02-25 | 200
"FILTERED_OUT" | "362148589155" | "Data-Customer" | "IX-D0-000000" | 2025-02-25 18:25:20.532 UTC | "Sales" | "Research" | "Action029" | "Action029_T" | null | "StrategyName062" | "Web" | "Inbound" | "StageName007" | "EngagementPolicies" | 1400 | "ComponentName078" | "Proposition Filter" | null | null | null | null | null | "App1" | "01.01.01" | 0 | 2025-02-25 | 201
"FILTERED_OUT" | "362148589155" | "Data-Customer" | "IX-D0-000000" | 2025-02-25 18:25:20.532 UTC | "Sales" | "Savings" | "Action012" | "Action012_T" | null | "StrategyName024" | "Web" | "Inbound" | "StageName007" | "EngagementPolicies" | 1400 | "ComponentName109" | "Proposition Filter" | null | null | null | null | null | "App1" | "01.01.01" | 0 | 2025-02-25 | 202
"FILTERED_OUT" | "362148589155" | "Data-Customer" | "IX-D0-000000" | 2025-02-25 18:25:20.532 UTC | "Sales" | "Savings" | "Action017" | "Action017_T" | null | "StrategyName066" | "Web" | "Inbound" | "StageName007" | "EngagementPolicies" | 1400 | "ComponentName058" | "Proposition Filter" | null | null | null | null | null | "App1" | "01.01.01" | 0 | 2025-02-25 | 203
"FILTERED_OUT" | "362148589155" | "Data-Customer" | "IX-D0-000000" | 2025-02-25 18:25:20.532 UTC | "Sales" | "Savings" | "Action018" | "Action018_T" | null | "StrategyName066" | "Web" | "Inbound" | "StageName007" | "EngagementPolicies" | 1400 | "ComponentName058" | "Proposition Filter" | null | null | null | null | null | "App1" | "01.01.01" | 0 | 2025-02-25 | 204

Decision Funnel

Shows which actions are filtered out at each stage and by which component. Useful for answering: where do specific actions get dropped?

Remaining View

[6]:
remaining_funnel, filtered_funnel = decision_data.plot.decision_funnel(
    scope="Issue", additional_filters=None, return_df=False
)
remaining_funnel

Filter View

[7]:
filtered_funnel
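
To make the two funnel views concrete, here is a minimal sketch on toy data using only the standard library. It is not how pdstools builds the plots; the stage order, the record layout, and the `funnel` helper are assumptions made for illustration.

```python
from collections import Counter

# Assumed stage order for this toy example (names borrowed from the sample data).
STAGES = ["EngagementPolicies", "JourneysContactPolicies", "Arbitration"]

# Toy records: (action, stage where it was filtered out, or None if it survived).
records = [
    ("A1", "EngagementPolicies"),
    ("A2", "EngagementPolicies"),
    ("A3", "JourneysContactPolicies"),
    ("A4", "Arbitration"),
    ("A5", None),
]


def funnel(records, stages):
    """Return (filtered per stage, remaining after each stage) as dicts."""
    filtered = Counter(stage for _, stage in records if stage is not None)
    remaining = {}
    left = len(records)
    for stage in stages:
        left -= filtered.get(stage, 0)  # actions dropped at this stage
        remaining[stage] = left         # actions that pass this stage
    return {s: filtered.get(s, 0) for s in stages}, remaining


filtered_view, remaining_view = funnel(records, STAGES)
print(filtered_view)
print(remaining_view)
```

The filter view counts what each stage removes, and the remaining view is the running total of what is left after each stage.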

A custom filter analysis on the raw data shows exactly which components filter actions and how many records each one removes.

[8]:
filter_table = (
    decision_data.decision_data.filter(pl.col("Record Type") == "FILTERED_OUT")
    .group_by(["Stage Order", "Stage Group", "Stage", "Component Name"])
    .agg(pl.len().alias("filter count"))
    .with_columns(
        (
            pl.format(
                "{}%",
                ((pl.col("filter count") / pl.sum("filter count")) * 100).round(1),
            )
        ).alias("percent of all filters")
    )
    .collect()
    .sort("filter count", descending=True)
)
filter_table
[8]:
shape: (110, 6)
Stage Order | Stage Group | Stage | Component Name | filter count | percent of all filters
i32 | cat | str | str | u64 | str
1400 | "EngagementPolicies" | "StageName007" | "ComponentName160" | 88150 | "6.6%"
1400 | "EngagementPolicies" | "StageName007" | "ComponentName140" | 84565 | "6.3%"
1400 | "EngagementPolicies" | "StageName007" | "ComponentName157" | 73999 | "5.5%"
1800 | "JourneysContactPolicies" | "StageName004" | "ComponentName055" | 69984 | "5.2%"
1400 | "EngagementPolicies" | "StageName007" | "ComponentName130" | 68481 | "5.1%"
…
1500 | "EngagementPolicies" | "StageName006" | "ComponentName115" | 113 | "0.0%"
1500 | "EngagementPolicies" | "StageName006" | "ComponentName119" | 91 | "0.0%"
1500 | "EngagementPolicies" | "StageName006" | "ComponentName076" | 77 | "0.0%"
1600 | "EngagementPolicies" | "StageName003" | "ComponentName154" | 24 | "0.0%"
1500 | "EngagementPolicies" | "StageName006" | "ComponentName063" | 12 | "0.0%"

Action Distribution

Distribution of actions at the Arbitration stage. Helps identify action groups that rarely survive to Arbitration.

[9]:
stage = "Arbitration"
scope_options = ["Issue", "Group", "Action"]
distribution_data = decision_data.getDistributionData(stage, scope_options)
fig = decision_data.plot.distribution_as_treemap(
    df=distribution_data, stage=stage, scope_options=scope_options
)
fig.show()

Global Sensitivity

Shows the impact of each arbitration factor (Propensity, Value, Context Weight, Levers) on the final decision. Each bar represents how many decisions would change if that factor were removed. Ideally, Propensity should have the strongest influence.

[10]:
decision_data.plot.sensitivity(win_rank=1)
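
The idea behind the sensitivity bars can be sketched on toy data. The sketch assumes priority is the product of the four factors and counts, per factor, how many decisions get a different winner when that factor is left out. The helper names and the data are hypothetical; this is not the pdstools implementation.

```python
FACTORS = ("Propensity", "Value", "Context Weight", "Levers")


def priority(action, skip=None):
    """Product of all arbitration factors, optionally leaving one out."""
    result = 1.0
    for factor in FACTORS:
        if factor != skip:
            result *= action[factor]
    return result


def winner(decision, skip=None):
    """Name of the action with the highest priority in one decision."""
    return max(decision, key=lambda a: priority(a, skip))["Action"]


def global_sensitivity(decisions):
    """Per factor: number of decisions whose winner changes without it."""
    return {
        f: sum(winner(d, skip=f) != winner(d) for d in decisions)
        for f in FACTORS
    }


# Two toy decisions, each with two competing actions.
decisions = [
    [
        {"Action": "A", "Propensity": 0.10, "Value": 1.0, "Context Weight": 1.0, "Levers": 1.0},
        {"Action": "B", "Propensity": 0.02, "Value": 2.0, "Context Weight": 1.0, "Levers": 1.0},
    ],
    [
        {"Action": "A", "Propensity": 0.05, "Value": 1.0, "Context Weight": 1.0, "Levers": 1.0},
        {"Action": "B", "Propensity": 0.04, "Value": 1.0, "Context Weight": 1.0, "Levers": 2.0},
    ],
]

sensitivity = global_sensitivity(decisions)
print(sensitivity)
```

In this toy data, removing Propensity flips the first decision and removing Levers flips the second, while Value and Context Weight change nothing.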

Wins and Losses in Arbitration

Distribution of wins and losses by Issue. The level parameter can be set to "Group" or "Action" for different granularity. Actions are classified as winning or losing based on win_rank.

[11]:
decision_data.plot.global_winloss_distribution(level="Issue", win_rank=1)
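
As a rough illustration of the win/loss classification, the sketch below labels each toy row as a win when its rank is at or above the `win_rank` cut-off and aggregates by the chosen level. The row layout and the `winloss_by_level` helper are assumptions for this example only.

```python
from collections import Counter


def winloss_by_level(rows, level="Issue", win_rank=1):
    """Count wins (Rank <= win_rank) and losses (Rank > win_rank) per level value."""
    wins, losses = Counter(), Counter()
    for row in rows:
        target = wins if row["Rank"] <= win_rank else losses
        target[row[level]] += 1
    return wins, losses


# Toy arbitration rows: one row per action per decision.
rows = [
    {"Issue": "Sales", "Group": "Savings", "Rank": 1},
    {"Issue": "Growth", "Group": "Loans", "Rank": 2},
    {"Issue": "Sales", "Group": "Mortgage", "Rank": 3},
    {"Issue": "Growth", "Group": "Loans", "Rank": 1},
]

wins_by_issue, losses_by_issue = winloss_by_level(rows, level="Issue", win_rank=1)
wins_by_group, losses_by_group = winloss_by_level(rows, level="Group", win_rank=1)
print(wins_by_issue, losses_by_issue)
```

Raising `win_rank` in this sketch moves rows from the loss counter to the win counter, which is one way to study the top N actions instead of only the single winner.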

Optionality Analysis

Shows how many actions are available per customer at the Arbitration stage. Limited optionality reduces the ability to personalize. The bars show decision counts per number of available actions; the line shows average propensity of the top-ranked action. Average propensity should increase with more available actions.

[12]:
decision_data.plot.propensity_vs_optionality(stage="Arbitration")
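
The quantities behind this plot can be sketched as follows, assuming each decision is represented by the propensities of its available actions in rank order, so the first entry belongs to the top-ranked action. The data and the helper name are hypothetical.

```python
from collections import defaultdict


def propensity_vs_optionality(decisions):
    """Map number of available actions -> (decision count, mean top-ranked propensity)."""
    buckets = defaultdict(list)
    for ranked_propensities in decisions.values():
        if ranked_propensities:  # skip decisions without any available action
            buckets[len(ranked_propensities)].append(ranked_propensities[0])
    return {
        n: (len(tops), sum(tops) / len(tops))
        for n, tops in sorted(buckets.items())
    }


# Toy data: interaction id -> propensities of the available actions, in rank order.
decisions = {
    "IX-1": [0.02],
    "IX-2": [0.05, 0.01],
    "IX-3": [0.07, 0.03],
    "IX-4": [0.10, 0.04, 0.02],
}

summary = propensity_vs_optionality(decisions)
for n_actions, (count, avg_top) in summary.items():
    print(n_actions, count, round(avg_top, 3))
```

The decision counts correspond to the bars and the mean top-ranked propensity to the line.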

Win/Loss Analysis

Win Analysis

Select an action and see how often it wins and which actions it defeats.

[13]:
win_rank = 1
selected_action = (
    decision_data.unfiltered_raw_decision_data.filter(pl.col("Rank") == 1)
    .group_by("Action")
    .len()
    .sort("len", descending=True)
    .collect()
    .get_column("Action")
    .to_list()[1]  # index 1: the action with the second-highest Rank-1 count
)
filter_statement = pl.col("Action") == selected_action

interactions_where_comparison_group_wins = (
    decision_data.get_winning_or_losing_interactions(
        win_rank=win_rank,
        group_filter=filter_statement,
        win=True,
    )
)

print(
    f"selected action '{selected_action}' wins(Rank{win_rank}) in {interactions_where_comparison_group_wins.collect().height} interactions."
)
selected action 'Action004' wins(Rank1) in 4 interactions.

Actions that lose to the selected action in arbitration.

[14]:
# Losing actions in interactions where the selected action wins.
groupby_cols = ["Issue", "Group", "Action"]
winning_from = decision_data.winning_from(
    interactions=interactions_where_comparison_group_wins,
    win_rank=win_rank,
    groupby_cols=groupby_cols,
    top_k=20,
)

decision_data.plot.distribution_as_treemap(
    df=winning_from, stage="Arbitration", scope_options=groupby_cols
)

Loss Analysis

Actions that beat the selected action in arbitration.

[15]:
interactions_where_comparison_group_loses = (
    decision_data.get_winning_or_losing_interactions(
        win_rank=win_rank,
        group_filter=filter_statement,
        win=False,
    )
)

print(
    f"selected action '{selected_action}' loses in {interactions_where_comparison_group_loses.collect().height} interactions."
)
# Winning actions in interactions where the selected action loses.
losing_to = decision_data.losing_to(
    interactions=interactions_where_comparison_group_loses,
    win_rank=win_rank,
    groupby_cols=groupby_cols,
    top_k=20,
)

decision_data.plot.distribution_as_treemap(
    df=losing_to, stage="Arbitration", scope_options=groupby_cols
)
selected action 'Action004' loses in 329 interactions.
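
The win and loss analyses above follow the same pattern, sketched here on toy data: split the interactions by whether the selected action ranks within `win_rank`, then count the competitors on the other side. The helper names are hypothetical and deliberately differ from the pdstools methods; the sketch only illustrates the logic.

```python
from collections import Counter


def split_interactions(interactions, selected, win_rank=1):
    """Return (ids where `selected` wins, ids where it is present but loses)."""
    wins, losses = [], []
    for ix_id, ranked_actions in interactions.items():
        if selected not in ranked_actions:
            continue
        rank = ranked_actions.index(selected) + 1  # ranks start at 1
        (wins if rank <= win_rank else losses).append(ix_id)
    return wins, losses


def beaten_actions(interactions, win_ids, win_rank=1):
    """Actions ranked below the cut-off in interactions the selected action wins."""
    return Counter(a for ix in win_ids for a in interactions[ix][win_rank:])


def beating_actions(interactions, loss_ids, win_rank=1):
    """Actions ranked within the cut-off in interactions the selected action loses."""
    return Counter(a for ix in loss_ids for a in interactions[ix][:win_rank])


# Toy data: interaction id -> actions at arbitration, best rank first.
interactions = {
    "IX-1": ["A", "B", "C"],
    "IX-2": ["B", "A"],
    "IX-3": ["A", "C"],
    "IX-4": ["C", "B"],
}

win_ids, loss_ids = split_interactions(interactions, selected="A")
beaten = beaten_actions(interactions, win_ids)
beating = beating_actions(interactions, loss_ids)
print(win_ids, loss_ids, beaten, beating)
```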

Sensitivity for Selected Action

Change in win count when each prioritization factor is individually removed. Unlike the global sensitivity above, negative values are possible: a negative value means removing that factor would increase wins for the selected action (i.e., that factor is hurting it).

[16]:
decision_data.plot.sensitivity(
    reference_group=pl.col("Action") == selected_action
)
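
A small sketch shows how a negative value can arise. It assumes priority is the product of the four factors and uses the sign convention described above: baseline wins minus wins with the factor removed. The data and the `wins` helper are hypothetical.

```python
from math import prod

FACTORS = ("Propensity", "Value", "Context Weight", "Levers")


def wins(decisions, selected, skip=None):
    """Count decisions where `selected` has the highest priority, ignoring `skip`."""
    def prio(action):
        return prod(action[f] for f in FACTORS if f != skip)

    return sum(max(d, key=prio)["Action"] == selected for d in decisions)


decisions = [
    [
        {"Action": "A", "Propensity": 0.10, "Value": 1.0, "Context Weight": 1.0, "Levers": 1.0},
        {"Action": "B", "Propensity": 0.02, "Value": 2.0, "Context Weight": 1.0, "Levers": 1.0},
    ],
    [
        {"Action": "A", "Propensity": 0.05, "Value": 1.0, "Context Weight": 1.0, "Levers": 1.0},
        {"Action": "B", "Propensity": 0.04, "Value": 1.0, "Context Weight": 1.0, "Levers": 2.0},
    ],
]

baseline = wins(decisions, "A")
# Positive: the factor helps action A. Negative: removing it would add wins for A.
contribution = {f: baseline - wins(decisions, "A", skip=f) for f in FACTORS}
print(baseline, contribution)
```

Here Levers comes out negative for action A because a competitor's lever of 2.0 costs A the second decision.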

Prioritization Factor Distributions

Box plots comparing the arbitration factor distributions of the selected action vs competitors in the same interactions.

[17]:
fig, warning_message = decision_data.plot.prio_factor_boxplots(
    reference=pl.col("Action") == selected_action,
)
if warning_message:
    print(warning_message)
else:
    fig.show()

Rank Distribution

Distribution of the prioritization rank for the selected action. Rank 1 is the top position, so a distribution concentrated at large rank numbers indicates the action rarely wins.

[18]:
decision_data.plot.rank_boxplot(
    reference=pl.col("Action") == selected_action,
)

Arbitration Component Distribution

Distribution of prioritization components (Propensity, Value, Context Weight, Levers). Since prioritization uses multiplication, components with wide value ranges can dominate. The cell below returns a violin plot and an ECDF plot for the chosen component, along with a statistics DataFrame (stats_df).

[19]:
from pdstools.decision_analyzer.plots import plot_priority_component_distribution

component = "Value"
granularity = "Issue"
value_data = decision_data.priority_component_distribution(
    component=component,
    granularity=granularity,
)

violin_fig, ecdf_fig, stats_df = plot_priority_component_distribution(
    value_data=value_data, component=component, granularity=granularity
)
[20]:
violin_fig
[21]:
ecdf_fig
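
One way to reason about which component can dominate: ranking by a product is the same as ranking by a sum of logarithms, so the spread of each factor on a log scale is a rough indicator of its influence. The sketch below computes that spread for toy values; the `log_range` helper and the numbers are assumptions for illustration, not pdstools output.

```python
import math


def log_range(values):
    """Spread of the positive values on a log10 scale (0.0 means constant)."""
    positive = [v for v in values if v > 0]
    return math.log10(max(positive)) - math.log10(min(positive))


# Toy factor values across a handful of actions.
factors = {
    "Propensity": [0.01, 0.02, 0.05],
    "Value": [1.0, 10.0, 100.0],
    "Levers": [1.0, 1.0, 1.0],
}

spreads = {name: log_range(values) for name, values in factors.items()}
dominant = max(spreads, key=spreads.get)
print(dominant, {name: round(s, 2) for name, s in spreads.items()})
```

In this toy data Value spans two orders of magnitude while Propensity spans less than one, so Value would largely decide the ranking.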

Experimenting with Levers

Levers can be adjusted to increase or decrease win counts for action groups. Steps:

  1. Select a group of actions

  2. Check current win counts and winner distribution

  3. Apply a new lever value and observe the change

Note: Boosting an action’s volume shows it to more uninterested customers, lowering propensity and click-through rate. This is a zero-sum game: increasing one group’s wins decreases others’. Compare before/after distributions carefully.
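
Before running the cells on the real data, the mechanics of these steps can be sketched on toy data. The sketch assumes priority is the product of propensity, value, and levers, and that the experiment replaces the lever of the selected group with a new value. The helper and the data are hypothetical.

```python
from collections import Counter


def win_counts(decisions, selected_group=None, lever=None):
    """Count wins per group, optionally overriding the lever of one group."""
    counts = Counter()
    for actions in decisions:
        def prio(action):
            levers = action["Levers"]
            if lever is not None and action["Group"] == selected_group:
                levers = lever  # apply the experimental lever value
            return action["Propensity"] * action["Value"] * levers

        counts[max(actions, key=prio)["Group"]] += 1
    return counts


decisions = [
    [
        {"Group": "Savings", "Propensity": 0.02, "Value": 1.0, "Levers": 1.0},
        {"Group": "Mortgage", "Propensity": 0.05, "Value": 1.0, "Levers": 1.0},
    ],
    [
        {"Group": "Savings", "Propensity": 0.01, "Value": 1.0, "Levers": 1.0},
        {"Group": "Loans", "Propensity": 0.08, "Value": 1.0, "Levers": 1.0},
    ],
    [
        {"Group": "Mortgage", "Propensity": 0.04, "Value": 1.0, "Levers": 1.0},
        {"Group": "Loans", "Propensity": 0.03, "Value": 1.0, "Levers": 1.0},
    ],
]

before = win_counts(decisions)
after = win_counts(decisions, selected_group="Savings", lever=5)
print(before, after)
```

The total number of wins is the same before and after, so the win gained by Savings is taken from Mortgage. The win is also gained with a propensity of 0.02 where the displaced action had 0.05, which mirrors the note above.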

[22]:
# Selecting the actions in "Savings" group under "Sales" issue
selected_issue = "Sales"
selected_group = "Savings"
lever_condition = (pl.col("Issue") == selected_issue) & (
    pl.col("Group") == selected_group
)
original_distribution = decision_data.get_win_distribution_data(
    lever_condition,
    all_interactions=decision_data.sample_size,
)
[23]:
# Per action: how often it wins and how often it survives to arbitration (the maximum possible win count)
original_distribution
[23]:
shape: (72, 6)
IssueGroupActionoriginal_win_countn_decisions_survived_to_arbitrationselected_action
strstrstru64u64str
"Sales""Mortgage""Action014"881034"Rest"
"Growth""Loans""Action001"392264"Rest"
"Sales""Payment""Action006"34890"Rest"
"Growth""Loans""Action028"302268"Rest"
"Sales""Research""Action029"30540"Rest"
"Growth""Research""Action007"058"Rest"
"Growth""Payment""Action022"097"Rest"
"Sales""Savings""Action012"03"Selected"
"Sales""CreditCards""Action006"02154"Rest"
"No Winner""No Winner""No Winner"37130"No Winner"
[24]:
# Analyze the current winner distribution at Group granularity.
# Passing a specific action as selected_action would switch to Action level;
# "All" covers every action under the selected issue and group.
scope_config = get_scope_config(
    selected_issue=selected_issue,
    selected_group=selected_group,
    selected_action="All",
)
original_fig, original_plot_data = create_win_distribution_plot(
    data=original_distribution,
    win_count_col="original_win_count",
    scope_config=scope_config,
    title_suffix="In Arbitration",
    y_axis_title="Current Win Count",
)
original_fig
[25]:
# You can hover over the plot above, but you can also see the number of wins from the data.
original_plot_data.filter(lever_condition)
[25]:
shape: (1, 3)
Issue | Group | original_win_count
str | str | u64
"Sales" | "Savings" | 15
[26]:
# Now let's set the lever of the selected actions to 5 and see what the new distribution looks like
lever = 5
distribution = decision_data.get_win_distribution_data(
    lever_condition,
    lever,
    all_interactions=decision_data.sample_size,
)

new_fig, new_plot_data = create_win_distribution_plot(
    data=distribution,
    win_count_col="new_win_count",
    scope_config=scope_config,
    title_suffix="After Lever Adjustment",
    y_axis_title="New Win Count",
)
new_fig
[27]:
new_plot_data.filter(lever_condition)
[27]:
shape: (1, 3)
Issue | Group | new_win_count
str | str | u64
"Sales" | "Savings" | 429