pdstools.decision_analyzer

Submodules

Classes

DecisionAnalyzer

Analyze NBA decision data from Explainability Extract or Decision Analyzer exports.

Package Contents

class DecisionAnalyzer(raw_data: polars.LazyFrame, level='Stage Group', sample_size=DEFAULT_SAMPLE_SIZE, mandatory_expr: polars.Expr | None = None, additional_columns: dict[str, polars.DataType] | None = None)

Analyze NBA decision data from Explainability Extract or Decision Analyzer exports.

This class processes raw decision data to create a comprehensive analysis framework for NBA (Next-Best-Action). It supports two data source formats:

  • Explainability Extract (v1): Simpler format with actions at the arbitration stage. Stages are synthetically derived from ranking.

  • Decision Analyzer / EEV2 (v2): Full pipeline data with real stage information, filter component names, and detailed strategy tracking.

Data can be loaded directly via the constructor, or through the from_explainability_extract and from_decision_analyzer class methods.

Parameters:
  • raw_data (polars.LazyFrame)

  • mandatory_expr (polars.Expr | None)

  • additional_columns (dict[str, polars.DataType] | None)

decision_data

Interaction-level decision data (with global filters applied if any).

Type:

pl.LazyFrame

extract_type

Either "explainability_extract" or "decision_analyzer".

Type:

str

plot

Plot accessor for visualization methods.

Type:

Plot

Examples

>>> from pdstools import DecisionAnalyzer
>>> da = DecisionAnalyzer.from_explainability_extract("data/sample_explainability_extract.parquet")
>>> da.overview_stats
>>> da.plot.sensitivity()
classmethod from_explainability_extract(source: str | os.PathLike, **kwargs) DecisionAnalyzer

Create a DecisionAnalyzer from an Explainability Extract (v1) file.

Parameters:
  • source (str | os.PathLike) – Path to the Explainability Extract parquet file, or a URL.

  • **kwargs – Additional keyword arguments passed to __init__ (e.g. sample_size, mandatory_expr, additional_columns).

Return type:

DecisionAnalyzer

Examples

>>> da = DecisionAnalyzer.from_explainability_extract("data/sample_explainability_extract.parquet")
classmethod from_decision_analyzer(source: str | os.PathLike, **kwargs) DecisionAnalyzer

Create a DecisionAnalyzer from a Decision Analyzer / EEV2 (v2) file.

Parameters:
  • source (str | os.PathLike) – Path to the Decision Analyzer parquet file, or a URL.

  • **kwargs – Additional keyword arguments passed to __init__ (e.g. sample_size, mandatory_expr, additional_columns).

Return type:

DecisionAnalyzer

Examples

>>> da = DecisionAnalyzer.from_decision_analyzer("data/sample_eev2.parquet")
_default_filter_fields = ['Decision Time', 'Channel', 'Direction', 'Issue', 'Group', 'Action', 'Treatment', 'Stage',...
plot
level = 'Stage Group'
sample_size = 10000
_thresholding_cache: dict[tuple[str, tuple[int, Ellipsis]], polars.DataFrame]
_sensitivity_cache: dict[int, polars.LazyFrame]
extract_type = 'decision_analyzer'
validation_error = 'The following default columns are missing: '
unfiltered_raw_decision_data
fields_for_data_filtering
preaggregation_columns
max_win_rank = 5
AvailableNBADStages = ['Arbitration', 'Output']
property available_levels: list[str]

Stage granularity levels available for this dataset.

Returns ["Stage Group", "Stage"] for Decision Analyzer (v2) data when both columns are present, or ["Stage Group"] for Explainability Extract (v1) data where only synthetic stages exist.

Return type:

list[str]

set_level(level: str)

Switch the stage granularity level used for all analyses.

Recomputes the available stages for the new level and invalidates all cached properties so subsequent queries use the new granularity.

Parameters:

level (str) – "Stage Group" or "Stage".

_recompute_available_stages()

Derive AvailableNBADStages from the data for the current level.

At “Stage Group” level, synthetically injects “Arbitration” if it has no data rows (it is used as an anchor point by many analyses).

At “Stage” level, detects Stage Groups that have no individual stages represented in the data and inserts the group name as a placeholder so the full pipeline is visible.

property color_mappings: dict[str, dict[str, str]]

Compute consistent color mappings for all categorical dimensions.

Color assignments are based on all unique values in the full dataset (before sampling), sorted alphabetically. This ensures colors remain consistent throughout the session regardless of filtering.

Returns:

Nested dictionary mapping dimension names to color dictionaries. Example:

{
    "Issue": {"Retention": "#001F5F", "Sales": "#10A5AC"},
    "Group": {"CreditCards": "#001F5F", "Loans": "#10A5AC"},
}

Return type:

dict[str, dict[str, str]]

Notes

Uses @cached_property so computation happens once on first access. Colors are assigned from the Pega colorway using modulo indexing.
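
The modulo assignment described above can be sketched in plain Python. The palette values below are placeholders for illustration, not the actual Pega colorway:

```python
def assign_colors(values, colorway):
    """Assign each unique value (sorted alphabetically) a color,
    cycling through the colorway with modulo indexing."""
    unique_sorted = sorted(set(values))
    return {v: colorway[i % len(colorway)] for i, v in enumerate(unique_sorted)}

# Placeholder palette for illustration only.
palette = ["#001F5F", "#10A5AC", "#F9B000"]
mapping = assign_colors(["Sales", "Retention", "Service", "Risk"], palette)
# Sorted order is Retention, Risk, Sales, Service; the fourth value
# wraps around to the first color.
```

Because assignment is based on all unique values of the full dataset, filtering the data later cannot change any value's color.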

See also

pdstools.utils.color_mapping.create_categorical_color_mappings

Generic utility for creating color mappings in any Streamlit app.

property stages_from_arbitration_down

All stages from Arbitration onward, respecting the current level.

At “Stage Group” level this slices from the literal “Arbitration” entry. At “Stage” level it finds stages whose Stage Order is >= the Arbitration group order (3800) using the stage_to_group_mapping.

property stages_with_propensity

Infer which stages have meaningful propensity scores from the data.

Examines the sample data to determine which stages have non-null, non-default propensity values. Returns stages where propensity-based classification makes sense.

property propensity_validation_warning: str | None

Validate propensity values and return warning message if issues detected.

Checks for:

  • Invalid propensities (> 1.0) – mathematically impossible for probabilities.

  • Unusually high propensities (> 0.1) – uncommon for typical marketing interactions.

Returns None if validation passes or propensity data is not available. Uses sample data for efficiency.

Return type:

str | None
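
The two checks can be sketched in plain Python; only the thresholds come from the description above, the message wording is illustrative:

```python
def propensity_warning(propensities):
    """Return a warning string, or None when values look plausible."""
    values = [p for p in propensities if p is not None]
    if not values:
        return None  # no propensity data available
    if any(p > 1.0 for p in values):
        return "Invalid propensities (> 1.0) found: not valid probabilities."
    if any(p > 0.1 for p in values):
        return "Unusually high propensities (> 0.1) found."
    return None
```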

property arbitration_stage: polars.LazyFrame

Sample rows remaining at or after the Arbitration stage.

Return type:

polars.LazyFrame

property num_sample_interactions: int

Number of unique interactions in the sample. Automatically triggers sampling if not yet calculated.

Return type:

int

_invalidate_cached_properties()

Reset all cached properties of the instance. Needed when global filters change, so downstream queries are recomputed.

apply_global_data_filters(filters: polars.Expr | list[polars.Expr] | None = None) None

Apply global filters to the decision data.

Replaces decision_data with a filtered subset of unfiltered_raw_decision_data and invalidates all cached properties so downstream queries reflect the new filter.

Parameters:

filters (pl.Expr or list of pl.Expr, optional) – Filter expression(s) applied to the raw data. If None, no change is made.

Return type:

None

reset_global_data_filters() None

Remove all global filters, restoring the full dataset.

Return type:

None

property preaggregated_filter_view

Pre-aggregates the full dataset over customers and interactions providing a view of what is filtered at a stage.

This pre-aggregation is similar to what "VBD" does to interaction history. It aggregates over individual customers and interactions, giving summary statistics that are sufficient to drive most of the analyses (but not all). The results are much smaller than the original data and are expected to fit easily in memory, so polars caching is used to avoid recomputation.

This “filter” view keeps the same organization as the decision analyzer data in that it records the actions that get filtered out at stages. From this a “remaining” view is easily derived.

property preaggregated_remaining_view

Pre-aggregates the full dataset over customers and interactions providing a view of remaining offers.

This pre-aggregation builds on the filter view and aggregates over the stages remaining.

property sample

Hash-based deterministic sample of interactions for resource-intensive analyses.

Selects up to sample_size unique interactions using a hash of Interaction ID. All actions within a selected interaction are kept. If fewer interactions exist than sample_size, no sampling is performed.

When the --sample CLI flag is active, this operates on the already-reduced dataset, so two layers of sampling may apply.
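
The hash-based selection can be sketched in plain Python (the real implementation hashes the Interaction ID column inside polars; md5 here is just a stand-in deterministic hash):

```python
import hashlib

def sample_interactions(interaction_ids, sample_size):
    """Deterministically keep up to sample_size unique interaction IDs
    by ordering them on a hash of the ID."""
    unique = set(interaction_ids)
    if len(unique) <= sample_size:
        return unique  # fewer interactions than sample_size: keep all

    def hash_of(i):
        return hashlib.md5(str(i).encode()).hexdigest()

    return set(sorted(unique, key=hash_of)[:sample_size])
```

Because the selection depends only on the IDs, repeated runs over the same data return the same sample.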

property filtered_sample

Sample data with page-level filters applied.

Reads filter expressions from st.session_state.page_channel_expr if available. Falls back to unfiltered sample if no page filters are set or not in Streamlit context.

This property is not cached because it depends on mutable session_state. Page-level code should cache the result locally if needed for performance.

Returns:

Sampled data with page filters applied, or unfiltered sample if no filters.

Return type:

pl.LazyFrame

get_available_fields_for_filtering(categoricalOnly=False) list[str]

Return column names available for data filtering.

Parameters:

categoricalOnly (bool, default False) – If True, return only string/categorical columns.

Return type:

list[str]

cleanup_raw_data(df: polars.LazyFrame)

This method cleans up the raw data read from parquet, S3, or other sources.

This will likely change as the tooling gets closer to the product, to match what comes out of Pega. It does some modest type casting and potentially reverts some of the temporary column names that were added to generate more data.

Parameters:

df (polars.LazyFrame)

get_possible_scope_values() list[str]

Return scope hierarchy columns present in the data (e.g. Issue, Group, Action).

Return type:

list[str]

get_possible_stage_values() list[str]

Return the list of available stage values for the current level.

Return type:

list[str]

property stage_to_group_mapping: dict[str, str]

Map each Stage name to its Stage Group.

Only meaningful when level == "Stage" and both columns exist. Returns an empty dict otherwise (including v1 / explainability data).

Return type:

dict[str, str]

get_distribution_data(stage: str, grouping_levels: str | list[str], additional_filters: polars.Expr | list[polars.Expr] | None = None) polars.LazyFrame

Distribution of decisions by grouping columns at a given stage.

Parameters:
  • stage (str) – Stage to filter on.

  • grouping_levels (str or list of str) – Column(s) to group by (e.g. "Action" or ["Channel", "Action"]).

  • additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters applied before aggregation.

Returns:

Columns from grouping_levels plus Decisions, sorted descending.

Return type:

pl.LazyFrame

get_funnel_data(scope, additional_filters: polars.Expr | list[polars.Expr] | None = None) tuple[polars.LazyFrame, polars.DataFrame, polars.DataFrame]
Parameters:

additional_filters (polars.Expr | list[polars.Expr] | None)

Return type:

tuple[polars.LazyFrame, polars.DataFrame, polars.DataFrame]

get_decisions_without_actions_data(additional_filters: polars.Expr | list[polars.Expr] | None = None) polars.DataFrame

Per-stage count of interactions newly left with no remaining actions.

Returns a DataFrame with columns [self.level, “decisions_without_actions”], sorted in pipeline order. For each stage X, the value is the number of interactions that lose their final remaining action at stage X.

Parameters:

additional_filters (polars.Expr | list[polars.Expr] | None)

Return type:

polars.DataFrame

get_funnel_summary(available_df: polars.LazyFrame, passing_df: polars.DataFrame, additional_filters: polars.Expr | list[polars.Expr] | None = None) polars.DataFrame

Per-stage summary: Available, Passing, Filtered actions and Decisions.

The table matches the funnel chart: it starts with a synthetic “Available Actions” row and excludes the Output stage.

Parameters:
  • available_df (pl.LazyFrame) – First element returned by get_funnel_data (actions entering each stage).

  • passing_df (pl.DataFrame) – Second element returned by get_funnel_data (actions exiting each stage).

  • additional_filters (optional) – Same filters used when calling get_funnel_data.

Returns:

One row per stage in pipeline order with raw counts first, then per-decision averages.

Return type:

pl.DataFrame

get_filter_component_data(top_n: int, additional_filters: polars.Expr | list[polars.Expr] | None = None) polars.DataFrame

Top-N filter components per stage, ranked by filtered-decision count.

Parameters:
  • top_n (int) – Maximum number of components to return per stage.

  • additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters applied before aggregation.

Returns:

Columns include the stage level, Component Name, and Filtered Decisions.

Return type:

pl.DataFrame

get_component_action_impact(top_n: int = 10, scope: str = 'Action', additional_filters: polars.Expr | list[polars.Expr] | None = None) polars.DataFrame

Per-component breakdown of which items are filtered and how many.

For each component, returns the top-N items (at the chosen scope granularity) it filters out. The scope controls whether the breakdown is at Issue, Group, or Action level.

Parameters:
  • top_n (int, default 10) – Maximum number of items to return per component.

  • scope (str, default "Action") – Granularity level: "Issue", "Group", or "Action".

  • additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters to apply before aggregation.

Returns:

Columns include pxComponentName, StageGroup, scope columns, and Filtered Decisions. Sorted by component then descending count.

Return type:

pl.DataFrame

get_component_drilldown(component_name: str, scope: str = 'Action', additional_filters: polars.Expr | list[polars.Expr] | None = None, sort_by: str = 'Filtered Decisions') polars.DataFrame

Deep-dive into a single filter component showing dropped actions and their potential value.

Since scoring columns (Priority, Value, Propensity) are typically null on FILTERED_OUT rows, this method derives the action’s “potential value” by looking up average scores from rows where the same action survives (non-null Priority/Value). This gives the “value of what’s being dropped” perspective.

Parameters:
  • component_name (str) – The pxComponentName to drill into.

  • scope (str, default "Action") – Granularity level: "Issue", "Group", or "Action".

  • additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters to apply before aggregation.

  • sort_by (str, default "Filtered Decisions") – Column to sort results by (descending).

Returns:

Columns include scope columns, Filtered Decisions, avg_Priority, avg_Value, avg_Propensity, Component Type (if available).

Return type:

pl.DataFrame

re_rank(additional_filters: polars.Expr | list[polars.Expr] | None = None, overrides: list[polars.Expr] = []) polars.LazyFrame

Recalculate priority and rank for all PVCL component combinations.

Computes five priority scores: the full priority plus one variant for each selectively dropped component (Propensity, Value, Context Weight, Levers), and ranks actions within each interaction for each variant. This is the foundation for sensitivity analysis.

Parameters:
  • additional_filters (pl.Expr or list of pl.Expr, optional) – Filters applied to the sample before ranking.

  • overrides (list of pl.Expr, optional) – Column override expressions applied before priority calculation (e.g. to simulate lever adjustments).

Returns:

Sample data augmented with prio_* and rank_* columns for each PVCL variant.

Return type:

pl.LazyFrame
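
The variant computation can be sketched in plain Python. The multiplicative priority formula (propensity × value × context weight × lever) and the column naming are assumptions here:

```python
def pvcl_variants(actions):
    """Given action dicts with p, v, cw, l components, add the full
    priority plus one variant per dropped component, then rank the
    actions (1 = best) for every score."""
    comps = ["p", "v", "cw", "l"]
    scored = []
    for a in actions:
        scores = {"prio_full": a["p"] * a["v"] * a["cw"] * a["l"]}
        for drop in comps:
            prod = 1.0
            for c in comps:
                if c != drop:
                    prod *= a[c]  # dropped component acts as neutral 1
            scores[f"prio_no_{drop}"] = prod
        scored.append({**a, **scores})
    prio_keys = [k for k in scored[0] if k.startswith("prio_")]
    for key in prio_keys:
        for rank, a in enumerate(sorted(scored, key=lambda r: -r[key]), 1):
            a["rank_" + key[len("prio_"):]] = rank
    return scored
```

Comparing rank_full against, say, rank_no_v shows how much the Value component changed the outcome, which is the idea behind the sensitivity analysis.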

get_selected_group_rank_boundaries(group_filter: polars.Expr | list[polars.Expr], additional_filters: polars.Expr | list[polars.Expr] | None = None) polars.LazyFrame

Compute selected-group rank boundaries per interaction.

For each interaction where the selected comparison group is present, returns the best (lowest) and worst (highest) rank observed for the selected rows in arbitration-relevant stages.

Parameters:
  • group_filter (polars.Expr | list[polars.Expr])

  • additional_filters (polars.Expr | list[polars.Expr] | None)

Return type:

polars.LazyFrame

get_win_loss_distribution_data(level: str | list[str], win_rank: int | None = None, additional_filters: polars.Expr | list[polars.Expr] | None = None, group_filter: polars.Expr | list[polars.Expr] | None = None, status: Literal['Wins', 'Losses'] | None = None, top_k: int | None = None) polars.LazyFrame

Win/loss distribution at a given scope level.

Operates in two modes depending on whether group_filter is provided:

  • Without group_filter (rank-based): uses pre-aggregated data and a fixed win_rank threshold to split wins/losses.

  • With group_filter (group-based): uses sample data and per-interaction rank boundaries of the selected group to identify actions it beats (Wins) or loses to (Losses).

Parameters:
  • level (str or list of str) – Column(s) to group the distribution by (e.g. "Action").

  • win_rank (int, optional) – Fixed rank threshold (required when group_filter is None).

  • additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters applied before aggregation.

  • group_filter (pl.Expr or list of pl.Expr, optional) – Filter defining the comparison group.

  • status ({"Wins", "Losses"}, optional) – Required when group_filter is provided.

  • top_k (int, optional) – Limit the number of rows returned (group_filter mode only).

Return type:

pl.LazyFrame

_winning_from(interactions: polars.LazyFrame, win_rank: int, groupby_cols: list[str], top_k: int = 20, additional_filters: polars.Expr | list[polars.Expr] | None = None) polars.LazyFrame

Actions beaten by the comparison group in its winning interactions.

Parameters:
  • interactions (polars.LazyFrame) – Interaction IDs where the comparison group wins.

  • win_rank (int) – Rank threshold used to define “winning”.

  • groupby_cols (list[str]) – Columns to group by (e.g. ["Issue", "Group", "Action"]).

  • top_k (int) – Return only the top-k entries.

  • additional_filters (polars.Expr | list[polars.Expr] | None) – Optional extra filters (e.g. channel filter).

Return type:

polars.LazyFrame

_losing_to(interactions: polars.LazyFrame, win_rank: int, groupby_cols: list[str], top_k: int = 20, additional_filters: polars.Expr | list[polars.Expr] | None = None) polars.LazyFrame

Actions that beat the comparison group in its losing interactions.

Parameters:
  • interactions (polars.LazyFrame) – Interaction IDs where the comparison group loses.

  • win_rank (int) – Rank threshold used to define “winning”.

  • groupby_cols (list[str]) – Columns to group by (e.g. ["Issue", "Group", "Action"]).

  • top_k (int) – Return only the top-k entries.

  • additional_filters (polars.Expr | list[polars.Expr] | None) – Optional extra filters (e.g. channel filter).

Return type:

polars.LazyFrame

get_optionality_data(df=None, by_day: bool = False) polars.LazyFrame

Average number of actions per stage, optionally broken down by day.

Computes per-interaction action counts at each stage using aggregate_remaining_per_stage, then aggregates into a histogram.

Parameters:
  • df (pl.LazyFrame, optional) – Input data. Defaults to sample.

  • by_day (bool, default False) – If True, include "day" in the grouping for trend analysis. When False, zero-action rows are injected for stages where some interactions have no remaining actions.

Return type:

pl.LazyFrame

get_optionality_funnel(df=None) polars.LazyFrame

Optionality funnel: interaction counts bucketed by available-action count.

Buckets action counts into 0–6 and 7+, then counts interactions per stage and bucket. Used by the optionality funnel chart.

Parameters:

df (pl.LazyFrame, optional) – Input data. Defaults to sample.

Return type:

pl.LazyFrame
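
The bucketing can be sketched in plain Python (bucket labels taken from the description above):

```python
from collections import Counter

def bucket_label(n_actions):
    """Bucket an available-action count: 0-6 stay as-is, 7 and above
    collapse into a single '7+' bucket."""
    return str(n_actions) if n_actions <= 6 else "7+"

# Count interactions per bucket for one stage.
counts_per_interaction = [0, 2, 2, 5, 9, 14]
funnel = Counter(bucket_label(n) for n in counts_per_interaction)
```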

get_action_variation_data(stage, color_by=None)

Get action variation data, optionally broken down by a categorical dimension.

Parameters:
  • stage – The stage to analyze.

  • color_by (optional) – Categorical column to break down the variation by. Can use "Channel/Direction" to combine the Channel and Direction columns.

get_ab_test_results() polars.DataFrame

A/B test summary: control vs test counts and control percentage per stage.

Returns:

One row per stage with columns for Control, Test counts and Control Percentage.

Return type:

pl.DataFrame

get_thresholding_data(fld: str, quantile_range=range(10, 100, 10)) polars.DataFrame

Quantile-based thresholding analysis at Arbitration.

Computes counts and threshold values at each quantile for the given field (fld). Results are cached per (fld, quantile_range).

Parameters:
  • fld (str) – Column name to compute quantiles for (e.g. "Propensity").

  • quantile_range (range, default range(10, 100, 10)) – Percentile breakpoints to compute.

Returns:

Long-format table with columns Decile, Count, Threshold, and the stage-level column.

Return type:

pl.DataFrame
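
The quantile computation can be sketched with the Python standard library (the actual method computes the quantiles in polars and additionally splits by stage):

```python
import statistics

def threshold_table(values, quantile_range=range(10, 100, 10)):
    """For each percentile breakpoint, report the threshold value and
    how many rows meet or exceed it."""
    percentiles = statistics.quantiles(values, n=100)  # 1st..99th percentiles
    return [
        {
            "Decile": q,
            "Threshold": percentiles[q - 1],
            "Count": sum(v >= percentiles[q - 1] for v in values),
        }
        for q in quantile_range
    ]

# Propensity-like values 0.01 .. 1.00
table = threshold_table([p / 100 for p in range(1, 101)])
```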

priority_component_distribution(component, granularity, stage=None, additional_filters=None)

Data for a single component’s distribution, grouped by granularity.

Parameters:
  • component (str) – Column name of the component to analyze.

  • granularity (str) – Column to group by (e.g. “Issue”, “Group”, “Action”).

  • stage (str, optional) – Filter to actions remaining at this stage. If None, uses all rows with non-null Priority.

  • additional_filters (pl.Expr or list[pl.Expr], optional) – Extra filters applied to the sample (e.g. channel filter).

all_components_distribution(granularity, stage=None, additional_filters=None)

Data for the overview panel: all prioritization components at once.

Parameters:
  • granularity (str) – Column to group by.

  • stage (str, optional) – Filter to actions remaining at this stage.

  • additional_filters (pl.Expr or list[pl.Expr], optional) – Extra filters applied to the sample (e.g. channel filter).

_remaining_at_stage(stage=None, additional_filters=None)

Return sample rows remaining at stage.

Uses the aggregate_remaining_per_stage logic: rows whose stage order is >= the selected stage are “remaining” there. If stage is None, falls back to rows with non-null Priority.

aggregate_remaining_per_stage(df: polars.LazyFrame, group_by_columns: list[str], aggregations: list = []) polars.LazyFrame

Workhorse function that converts the raw Decision Analyzer data (filter view) into per-stage aggregates of what remains, ensuring all stages are represented.

Parameters:
  • df (polars.LazyFrame)

  • group_by_columns (list[str])

  • aggregations (list)

Return type:

polars.LazyFrame

filtered_action_counts(groupby_cols: list, propensityTH: float | None = None, priorityTH: float | None = None, additional_filters: polars.Expr | list[polars.Expr] | None = None) polars.LazyFrame

Return action counts from the sample, optionally classified by propensity/priority thresholds.

Parameters:
  • groupby_cols (list) – Column names to group by.

  • propensityTH (float, optional) – Propensity threshold for classifying offers.

  • priorityTH (float, optional) – Priority threshold for classifying offers.

  • additional_filters (pl.Expr or list[pl.Expr], optional) – Extra filters applied to the sample (e.g. channel filter).

Returns:

Aggregated action counts per group, with quality buckets when both thresholds are provided.

Return type:

pl.LazyFrame

get_offer_quality(df: polars.LazyFrame, group_by: str | list[str]) polars.LazyFrame

Cumulative offer-quality breakdown across stages.

Takes a filtered-action-counts frame (from filtered_action_counts()) and converts it to a remaining-per-stage view, joining in customers that have zero actions so they are counted as well.

Parameters:
  • df (pl.LazyFrame) – Filtered action counts with columns no_of_offers, new_models, poor_propensity_offers, etc.

  • group_by (str or list of str) – Columns to group by (e.g. ["Interaction ID"]).

Returns:

Per-stage quality classification with boolean flag columns (has_no_offers, atleast_one_relevant_action, etc.).

Return type:

pl.LazyFrame

property overview_stats: dict[str, object]

Creates an overview from the full (filtered) dataset.

Aggregate metrics (Decisions, Customers, Actions, Channels, Duration) are computed over decision_data so they reflect the true counts. Only the average-offers-per-stage KPI uses the sample (it requires interaction-level optionality analysis that would be too expensive on the full data).

Return type:

dict[str, object]

get_sensitivity(win_rank=1, group_filter=None, additional_filters=None)

Global or local sensitivity of the prioritization factors.

Parameters:
  • win_rank (int) – Maximum rank to be considered a winner.

  • group_filter (pl.Expr, optional) – Selected offers, only used in local sensitivity analysis. When None (global), results are cached by win_rank.

  • additional_filters (pl.Expr or list[pl.Expr], optional) – Extra filters applied to the sample before re-ranking (e.g. channel filter). When set, caching is bypassed.

Return type:

pl.LazyFrame

get_offer_variability_stats(stage: str) dict[str, float]

Summary statistics for action variation at a stage.

Parameters:

stage (str) – Stage to analyse.

Returns:

n90: number of actions covering 90% of decisions. gini: Gini coefficient of decision concentration.

Return type:

dict
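
Both statistics can be sketched in plain Python; the Gini computation below uses the pairwise mean-absolute-difference definition, which may differ in detail from the actual implementation:

```python
def variability_stats(decision_counts):
    """n90: fewest actions needed to cover 90% of decisions.
    gini: Gini coefficient of how concentrated decisions are."""
    counts = sorted(decision_counts, reverse=True)
    total = sum(counts)
    covered, n90 = 0, 0
    for c in counts:
        covered += c
        n90 += 1
        if covered >= 0.9 * total:
            break
    n = len(counts)
    # Mean absolute difference over all pairs, normalised by 2 * mean.
    mad = sum(abs(a - b) for a in counts for b in counts) / (n * n)
    return {"n90": n90, "gini": mad / (2 * total / n)}
```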

get_winning_or_losing_interactions(group_filter: polars.Expr | list[polars.Expr], win: bool, additional_filters: polars.Expr | list[polars.Expr] | None = None) polars.LazyFrame

Interaction IDs where the comparison group wins or loses.

Parameters:
  • group_filter (pl.Expr or list of pl.Expr) – Filter defining the comparison group.

  • win (bool) – If True, return interactions where the group wins (there are lower-ranked actions outside the group). If False, return interactions where the group loses (there are higher-ranked actions outside the group).

  • additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters (e.g. channel filter).

Returns:

Single-column frame of unique Interaction ID values.

Return type:

pl.LazyFrame

get_win_loss_counts(group_filter: polars.Expr | list[polars.Expr], win_rank: int = 1, additional_filters: polars.Expr | list[polars.Expr] | None = None) dict[str, int]

Count wins and losses for the comparison group at a given rank threshold.

A win is an interaction where at least one member of the comparison group achieves a rank of win_rank or better (lower). A loss is any interaction where the group participates but none rank that high.

Parameters:
  • group_filter (polars.Expr | list[polars.Expr]) – Filter expression(s) defining the comparison group.

  • win_rank (int) – Rank threshold. The group “wins” when its best rank <= win_rank.

  • additional_filters (polars.Expr | list[polars.Expr] | None) – Optional extra filters (e.g. channel/direction).

Return type:

dict with keys "wins", "losses", "total".
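
The win/loss rule can be sketched in plain Python over toy per-interaction rank data (the real method operates on the polars sample):

```python
def win_loss_counts(ranks_by_interaction, in_group, win_rank=1):
    """ranks_by_interaction: {interaction_id: [(action, rank), ...]}.
    An interaction is a win when the comparison group's best (lowest)
    rank is <= win_rank, a loss when the group participates but none
    rank that high."""
    wins = losses = 0
    for actions in ranks_by_interaction.values():
        group_ranks = [r for a, r in actions if in_group(a)]
        if not group_ranks:
            continue  # group not present: neither win nor loss
        if min(group_ranks) <= win_rank:
            wins += 1
        else:
            losses += 1
    return {"wins": wins, "losses": losses, "total": wins + losses}
```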

get_win_loss_distributions(interactions_win: polars.LazyFrame, interactions_loss: polars.LazyFrame, groupby_cols: list[str], top_k: int, additional_filters: polars.Expr | list[polars.Expr] | None = None, group_filter: polars.Expr | list[polars.Expr] | None = None, win_rank: int | None = None) tuple[polars.LazyFrame, polars.LazyFrame]

Distribution of actions the comparison group wins from and loses to.

Computes two aggregated distributions in a single pass over the stage-filtered data:

  • winning_from: actions ranked below the comparison group (i.e. actions it beats).

  • losing_to: actions ranked above the comparison group (i.e. actions that beat it).

Either group_filter or win_rank must be provided to define the rank boundary. When group_filter is given the boundary is the per-interaction best/worst rank of the selected group. When only win_rank is given a fixed rank threshold is used instead.

Parameters:
  • interactions_win (polars.LazyFrame) – Interaction IDs where the comparison group wins (from get_winning_or_losing_interactions(win=True)).

  • interactions_loss (polars.LazyFrame) – Interaction IDs where the comparison group loses (from get_winning_or_losing_interactions(win=False)).

  • groupby_cols (list[str]) – Columns to group by (e.g. ["Action"]).

  • top_k (int) – Return only the top-k entries per distribution.

  • additional_filters (polars.Expr | list[polars.Expr] | None) – Optional extra filters (e.g. channel/direction).

  • group_filter (polars.Expr | list[polars.Expr] | None) – Filter expression(s) defining the comparison group.

  • win_rank (int | None) – Fixed rank threshold. Required when group_filter is None.

Returns:

tuple of (winning_from, losing_to), both pl.LazyFrame with columns from groupby_cols plus Decisions.

Return type:

tuple[polars.LazyFrame, polars.LazyFrame]

get_win_distribution_data(lever_condition: polars.Expr, lever_value: float | None = None, all_interactions: int | None = None) polars.DataFrame

Calculate win distribution data for business lever analysis.

This method generates distribution data showing how actions perform in arbitration decisions, both in baseline conditions and optionally with lever adjustments applied.

Parameters:
  • lever_condition (pl.Expr) – Polars expression defining which actions to apply the lever to. Example: pl.col("Action") == "SpecificAction" or (pl.col("Issue") == "Service") & (pl.col("Group") == "Cards").

  • lever_value (float, optional) – The lever multiplier value to apply to selected actions. If None, returns baseline distribution only. If provided, returns both original and lever-adjusted win counts.

  • all_interactions (int, optional) – Total number of interactions to calculate “no winner” count. If provided, enables calculation of interactions without any winner. If None, “no winner” data is not calculated.

Returns:

DataFrame containing win distribution with columns:

  • pyIssue, pyGroup, pyName – Action identifiers.

  • original_win_count – Number of rank-1 wins in the baseline scenario.

  • new_win_count – Number of rank-1 wins after lever adjustment (only if lever_value is provided).

  • n_decisions_survived_to_arbitration – Number of arbitration decisions the action participated in.

  • selected_action – "Selected" for actions matching lever_condition, "Rest" for others.

  • no_winner_count – Number of interactions without any winner (only if all_interactions is provided).

Return type:

pl.DataFrame

Notes

  • Only includes actions that survive to arbitration stage

  • Win counts represent rank-1 (first place) finishes in arbitration decisions

  • This is a zero-sum analysis: boosting selected actions suppresses others

  • Results are sorted by win count (new_win_count if available, else original_win_count)

  • When all_interactions is provided, “no winner” represents interactions without any rank-1 winner

Examples

Get baseline distribution for a specific action:

>>> lever_cond = pl.col("Action") == "MyAction"
>>> baseline = decision_analyzer.get_win_distribution_data(lever_cond)

Get distribution with a 2x lever applied to service actions:

>>> lever_cond = pl.col("Issue") == "Service"
>>> with_lever = decision_analyzer.get_win_distribution_data(lever_cond, 2.0)

Get distribution with the no-winner count:

>>> total_interactions = 10000
>>> with_no_winner = decision_analyzer.get_win_distribution_data(lever_cond, 2.0, total_interactions)

get_trend_data(stage: str = 'AvailableActions', scope: Literal['Group', 'Issue', 'Action'] | None = 'Group', additional_filters: polars.Expr | list[polars.Expr] | None = None) polars.DataFrame

Daily trend of unique decisions from a given stage onward.

Parameters:
  • stage (str, default "AvailableActions") – Starting stage; all stages from this point onward are included.

  • scope ({"Group", "Issue", "Action"} or None, default "Group") – Optional grouping dimension. If None, returns totals by day.

  • additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters applied to the sample.

Returns:

Columns: day, optionally scope, and Decisions.

Return type:

pl.DataFrame

find_lever_value(lever_condition: polars.Expr, target_win_percentage: float, win_rank: int = 1, low: float = 0, high: float = 100, precision: float = 0.01, ranking_stages: list[str] | None = None) float

Binary search algorithm to find lever value needed to achieve a desired win percentage.

Parameters:
  • lever_condition (pl.Expr) – Polars expression that defines which actions should receive the lever

  • target_win_percentage (float) – The desired win percentage (0-100)

  • win_rank (int, default 1) – Consider actions winning if they rank <= this value

  • low (float, default 0) – Lower bound for lever search range

  • high (float, default 100) – Upper bound for lever search range

  • precision (float, default 0.01) – Search precision - smaller values give more accurate results

  • ranking_stages (list[str], optional) – List of stages to include in analysis. Defaults to [“Arbitration”]

Returns:

The lever value needed to achieve the target win percentage

Return type:

float

Raises:

ValueError – If the target win percentage cannot be achieved within the search range
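
The binary search can be sketched in plain Python. Here win_pct_at is a stand-in for the expensive re-rank-and-count step, and is assumed to be monotonically non-decreasing in the lever value:

```python
def search_lever_value(win_pct_at, target, low=0.0, high=100.0, precision=0.01):
    """Binary search for the smallest lever value whose resulting
    win percentage reaches the target."""
    if win_pct_at(high) < target:
        raise ValueError("target win percentage not achievable in search range")
    while high - low > precision:
        mid = (low + high) / 2
        if win_pct_at(mid) >= target:
            high = mid  # mid already reaches the target: tighten from above
        else:
            low = mid
    return high
```

Each iteration halves the search interval, so the number of win-percentage evaluations is logarithmic in (high - low) / precision.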