pdstools.decision_analyzer
Submodules
Classes
DecisionAnalyzer: Analyze NBA decision data from Explainability Extract or Decision Analyzer exports.
Package Contents
- class DecisionAnalyzer(raw_data: polars.LazyFrame, level='Stage Group', sample_size=DEFAULT_SAMPLE_SIZE, mandatory_expr: polars.Expr | None = None, additional_columns: dict[str, polars.DataType] | None = None)
Analyze NBA decision data from Explainability Extract or Decision Analyzer exports.
This class processes raw decision data to create a comprehensive analysis framework for NBA (Next-Best-Action). It supports two data source formats:
Explainability Extract (v1): Simpler format with actions at the arbitration stage. Stages are synthetically derived from ranking.
Decision Analyzer / EEV2 (v2): Full pipeline data with real stage information, filter component names, and detailed strategy tracking.
Data can be loaded via class methods or directly:
from_explainability_extract(): Load from an Explainability Extract file.
from_decision_analyzer(): Load from a Decision Analyzer (EEV2) file.
Direct __init__: Auto-detects format from the data schema.
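- Parameters:
raw_data (polars.LazyFrame)
mandatory_expr (polars.Expr | None)
additional_columns (dict[str, polars.DataType] | None)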
- decision_data
Interaction-level decision data (with global filters applied if any).
- Type:
pl.LazyFrame
Examples
>>> from pdstools import DecisionAnalyzer
>>> da = DecisionAnalyzer.from_explainability_extract("data/sample_explainability_extract.parquet")
>>> da.get_overview_stats
>>> da.plot.sensitivity()
- classmethod from_explainability_extract(source: str | os.PathLike, **kwargs) → DecisionAnalyzer
Create a DecisionAnalyzer from an Explainability Extract (v1) file.
- Parameters:
source (str | os.PathLike) – Path to the Explainability Extract parquet file, or a URL.
**kwargs – Additional keyword arguments passed to __init__ (e.g. sample_size, mandatory_expr, additional_columns).
- Return type:
DecisionAnalyzer
Examples
>>> da = DecisionAnalyzer.from_explainability_extract("data/sample_explainability_extract.parquet")
- classmethod from_decision_analyzer(source: str | os.PathLike, **kwargs) → DecisionAnalyzer
Create a DecisionAnalyzer from a Decision Analyzer / EEV2 (v2) file.
- Parameters:
source (str | os.PathLike) – Path to the Decision Analyzer parquet file, or a URL.
**kwargs – Additional keyword arguments passed to __init__ (e.g. sample_size, mandatory_expr, additional_columns).
- Return type:
DecisionAnalyzer
Examples
>>> da = DecisionAnalyzer.from_decision_analyzer("data/sample_eev2.parquet")
- _default_filter_fields = ['Decision Time', 'Channel', 'Direction', 'Issue', 'Group', 'Action', 'Treatment', 'Stage',...
- plot
- level = 'Stage Group'
- sample_size = 50000
- extract_type = 'decision_analyzer'
- validation_error = 'The following default columns are missing: '
- unfiltered_raw_decision_data
- fields_for_data_filtering
- preaggregation_columns
- max_win_rank = 5
- AvailableNBADStages = ['Arbitration', 'Output']
- property available_levels: list[str]
Stage granularity levels available for this dataset.
Returns ["Stage Group", "Stage"] for Decision Analyzer (v2) data when both columns are present, or ["Stage Group"] for Explainability Extract (v1) data where only synthetic stages exist.
- set_level(level: str)
Switch the stage granularity level used for all analyses.
Recomputes the available stages for the new level and invalidates all cached properties so subsequent queries use the new granularity.
- Parameters:
level (str) – "Stage Group" or "Stage".
- _recompute_available_stages()
Derive AvailableNBADStages from the data for the current level.
- property stages_from_arbitration_down
All stages from Arbitration onward, respecting the current level.
At “Stage Group” level this slices from the literal “Arbitration” entry. At “Stage” level it finds stages whose Stage Order is >= the Arbitration group order (3800) using the stage_to_group_mapping.
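The two selection rules above can be sketched in plain Python. This is an illustration only, not the library's implementation: the function arguments, the dictionary of stage orders, and the stage names used in the usage note are hypothetical, and only the order value 3800 comes from the description above.

```python
ARBITRATION_GROUP_ORDER = 3800  # order quoted in the description above


def stages_from_arbitration_down(level, stage_groups, stage_orders):
    """Return the stages from Arbitration onward for the given level.

    stage_groups: ordered list of Stage Group names (hypothetical input).
    stage_orders: dict mapping Stage name -> Stage Order (hypothetical input).
    """
    if level == "Stage Group":
        # Slice from the literal "Arbitration" entry to the end of the list.
        start = stage_groups.index("Arbitration")
        return stage_groups[start:]
    # "Stage" level: keep stages whose order is at or past the Arbitration group order.
    ordered = sorted(stage_orders.items(), key=lambda item: item[1])
    return [name for name, order in ordered if order >= ARBITRATION_GROUP_ORDER]
```

For example, with the hypothetical groups `["Eligibility", "Arbitration", "Output"]` the "Stage Group" branch returns `["Arbitration", "Output"]`.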
- property stages_with_propensity
Infer which stages have meaningful propensity scores from the data.
Examines the sample data to determine which stages have non-null, non-default propensity values. Returns stages where propensity-based classification makes sense.
- property propensity_validation_warning: str | None
Validate propensity values and return warning message if issues detected.
Checks for:
1. Invalid propensities (> 1.0), which are mathematically impossible for probabilities.
2. Unusually high propensities (> 0.1), which are uncommon for typical marketing interactions.
Returns None if validation passes or propensity data is not available. Uses sample data for efficiency.
- Return type:
str | None
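As a rough illustration of the two checks, the sketch below works on a plain list of propensity values. It is not the property's actual code: the function name, its arguments, and the wording of the messages are assumptions, while the two thresholds (1.0 and 0.1) come from the description above.

```python
from typing import Optional, Sequence


def propensity_warning(
    propensities: Sequence[Optional[float]],
    invalid_above: float = 1.0,
    high_above: float = 0.1,
) -> Optional[str]:
    """Return a warning string, or None when nothing looks suspicious."""
    values = [p for p in propensities if p is not None]
    if not values:
        # No propensity data available: nothing to validate.
        return None
    n_invalid = sum(p > invalid_above for p in values)
    n_high = sum(high_above < p <= invalid_above for p in values)
    messages = []
    if n_invalid:
        messages.append(f"{n_invalid} propensity value(s) exceed {invalid_above}")
    if n_high:
        messages.append(f"{n_high} propensity value(s) exceed {high_above}")
    return "; ".join(messages) if messages else None
```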
- property arbitration_stage
- property num_sample_interactions: int
Number of unique interactions in the sample. Automatically triggers sampling if not yet calculated.
- Return type:
int
- _invalidate_cached_properties()
Resets the cached properties of the class. Needed when global filters are applied or reset.
- applyGlobalDataFilters(filters: polars.Expr | list[polars.Expr] | None = None)
Apply a global set of filters.
- Parameters:
filters (polars.Expr | list[polars.Expr] | None)
- resetGlobalDataFilters()
- property getPreaggregatedFilterView
Pre-aggregates the full dataset over customers and interactions providing a view of what is filtered at a stage.
This pre-aggregation is similar to what “VBD” does to interaction history. It aggregates over individual customers and interactions, giving summary statistics that are sufficient to drive most (but not all) of the analyses. The results of this pre-aggregation are much smaller than the original data and are expected to fit easily in memory. We therefore use polars caching to cache them efficiently.
This “filter” view keeps the same organization as the decision analyzer data in that it records the actions that get filtered out at stages. From this a “remaining” view is easily derived.
- property getPreaggregatedRemainingView
Pre-aggregates the full dataset over customers and interactions providing a view of remaining offers.
This pre-aggregation builds on the filter view and aggregates over the stages remaining.
- property sample
Hash-based deterministic sample of interactions for resource-intensive analyses.
Selects up to sample_size unique interactions using a hash of Interaction ID. All actions within a selected interaction are kept. If fewer interactions exist than sample_size, no sampling is performed.
When the --sample CLI flag is active, this operates on the already-reduced dataset, so two layers of sampling may apply.
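The idea of a hash-based deterministic sample can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the property's implementation: the function name, the row layout (a list of dicts), and the choice of SHA-256 with a "keep the smallest hashes" rule are all hypothetical.

```python
import hashlib


def sample_interactions(rows, sample_size):
    """Keep all rows of up to sample_size interactions, chosen by hash of the ID."""
    ids = {row["Interaction ID"] for row in rows}
    if len(ids) <= sample_size:
        # Fewer interactions than requested: no sampling is performed.
        return list(rows)

    def stable_hash(interaction_id):
        # hashlib is stable across runs, unlike the built-in hash() for strings.
        digest = hashlib.sha256(str(interaction_id).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    # Deterministic selection: the sample_size interactions with the smallest hash.
    keep = set(sorted(ids, key=stable_hash)[:sample_size])
    # Every action (row) of a selected interaction is kept.
    return [row for row in rows if row["Interaction ID"] in keep]
```

Because selection depends only on the Interaction ID, repeated runs over the same data return the same interactions.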
- getAvailableFieldsForFiltering(categoricalOnly=False)
- cleanup_raw_data(df: polars.LazyFrame)
This method cleans up the raw data read from parquet, S3, or another source.
This likely needs to change as and when we get closer to product, to match what comes out of Pega. It does some modest type casting and may change back some of the temporary column names that have been added to generate more data.
- Parameters:
df (polars.LazyFrame)
- getPossibleScopeValues()
- getPossibleStageValues()
- property stage_to_group_mapping: dict[str, str]
Map each Stage name to its Stage Group.
Only meaningful when level == "Stage" and both columns exist. Returns an empty dict otherwise (including v1 / explainability data).
- getDistributionData(stage: str, grouping_levels: str | list[str], additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.LazyFrame
- getFunnelData(scope, additional_filters: polars.Expr | list[polars.Expr] | None = None) → tuple[polars.LazyFrame, polars.DataFrame]
- getFilterComponentData(top_n, additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.DataFrame
- Parameters:
additional_filters (polars.Expr | list[polars.Expr] | None)
- Return type:
polars.DataFrame
- getComponentActionImpact(top_n: int = 10, scope: str = 'Action', additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.DataFrame
Per-component breakdown of which items are filtered and how many.
For each component, returns the top-N items (at the chosen scope granularity) it filters out. The scope controls whether the breakdown is at Issue, Group, or Action level.
- Parameters:
top_n (int)
scope (str)
additional_filters (polars.Expr | list[polars.Expr] | None)
- Returns:
Columns include pxComponentName, StageGroup, scope columns, and Filtered Decisions. Sorted by component then descending count.
- Return type:
pl.DataFrame
- getComponentDrilldown(component_name: str, additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.DataFrame
Deep-dive into a single filter component showing dropped actions and their potential value.
Since scoring columns (Priority, Value, Propensity) are typically null on FILTERED_OUT rows, this method derives the action’s “potential value” by looking up average scores from rows where the same action survives (non-null Priority/Value). This gives the “value of what’s being dropped” perspective.
- Parameters:
component_name (str)
additional_filters (polars.Expr | list[polars.Expr] | None)
- Returns:
Columns: pyIssue, pyGroup, pyName, Filtered Decisions, avg_Priority, avg_Value, avg_Propensity, pxComponentType (if available). Sorted by Filtered Decisions descending.
- Return type:
pl.DataFrame
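The "potential value" lookup described for getComponentDrilldown can be illustrated with a small pure-Python sketch. It is a simplification under assumptions, not the method's code: the function name and the row keys (`Action`, `Component`, `Priority`) are hypothetical, and only Priority is averaged here, whereas the method also reports Value and Propensity.

```python
from collections import defaultdict
from statistics import mean


def drilldown(rows, component_name):
    """Count actions dropped by one component and attach their average surviving Priority.

    rows: list of dicts with keys "Action", "Component" (None when the row was
    not filtered) and "Priority" (None on filtered-out rows).
    """
    # Scores are only available on rows where the action survived.
    surviving_priority = defaultdict(list)
    for row in rows:
        if row["Priority"] is not None:
            surviving_priority[row["Action"]].append(row["Priority"])

    # Count how often the chosen component filtered each action.
    dropped = defaultdict(int)
    for row in rows:
        if row["Component"] == component_name:
            dropped[row["Action"]] += 1

    result = [
        {
            "Action": action,
            "Filtered Decisions": count,
            # None when the action never survives anywhere in the data.
            "avg_Priority": mean(surviving_priority[action])
            if action in surviving_priority
            else None,
        }
        for action, count in dropped.items()
    ]
    return sorted(result, key=lambda item: item["Filtered Decisions"], reverse=True)
```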
- reRank(additional_filters: polars.Expr | list[polars.Expr] | None = None, overrides: list[polars.Expr] = []) → polars.LazyFrame
Calculates priority and rank for all PVCL combinations.
- get_win_loss_distribution_data(level, win_rank)
- get_optionality_data(df)
Computes the average number of actions per stage, without trend analysis. This has to go back to the interaction-level data; the pre-aggregations cannot be used here.
- get_optionality_data_with_trend(df=None)
Computes the average number of actions per stage, with trend analysis. This has to go back to the interaction-level data; the pre-aggregations cannot be used here.
- get_optionality_funnel(df=None)
- getActionVariationData(stage)
- getABTestResults()
- getThresholdingData(fld, quantile_range=range(10, 100, 10))
- priority_component_distribution(component, granularity, stage=None)
Data for a single component’s distribution, grouped by granularity.
- all_components_distribution(granularity, stage=None)
Data for the overview panel: all prioritization components at once.
- _remaining_at_stage(stage=None)
Return sample rows remaining at stage.
Uses the aggregate_remaining_per_stage logic: rows whose stage order is >= the selected stage are “remaining” there. If stage is None, falls back to rows with non-null Priority.
- aggregate_remaining_per_stage(df: polars.LazyFrame, group_by_columns: list[str], aggregations: list = []) → polars.LazyFrame
Workhorse function to convert the raw Decision Analyzer data (filter view) to the aggregates remaining per stage, ensuring all stages are represented.
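The conversion from a filter view to a remaining view amounts to a cumulative sum taken from the last stage backwards. The sketch below shows that idea on plain counts; it is an assumption-laden illustration rather than the method itself, and the function name, the dict-based input, and the stage names in the usage note are hypothetical.

```python
def remaining_per_stage(filtered_counts, stages):
    """Turn per-stage 'recorded at this stage' counts into 'remaining at this stage' counts.

    filtered_counts: dict mapping stage -> number of actions recorded at that stage.
    stages: all stages in pipeline order; stages missing from the counts still
    appear in the result, so every stage is represented.
    """
    remaining = {}
    running_total = 0
    # Walk backwards: whatever is recorded at this stage or a later one
    # was still present when this stage started.
    for stage in reversed(stages):
        running_total += filtered_counts.get(stage, 0)
        remaining[stage] = running_total
    # Return in pipeline order.
    return {stage: remaining[stage] for stage in stages}
```

With hypothetical stages `["Eligibility", "Applicability", "Arbitration", "Output"]` and counts `{"Eligibility": 5, "Arbitration": 3, "Output": 2}`, the remaining counts are 10, 5, 5, and 2.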
- filtered_action_counts(groupby_cols: list, propensityTH: float | None = None, priorityTH: float | None = None) → polars.LazyFrame
Return action counts from the sample, optionally classified by propensity/priority thresholds.
- Parameters:
groupby_cols (list)
propensityTH (float | None)
priorityTH (float | None)
- Returns:
Aggregated action counts per group, with quality buckets when both thresholds are provided.
- Return type:
pl.LazyFrame
- get_offer_quality(df, group_by)
Given a dataframe with filtered action counts at stages, flips it to the usual Value Finder (VF) view by doing a rolling sum over stages.
- Parameters:
df (pl.LazyFrame) – Decision Analyzer style filtered action counts dataframe.
group_by (list) – The list of column names to group by ([self.level, "Interaction ID"]).
- Returns:
Value Finder style, available action counts per group_by category
- Return type:
pl.LazyFrame
- property get_overview_stats
Creates an overview from sampled data.
- get_sensitivity(win_rank=1, filters=None)
Global or local sensitivity of the prioritization factors.
- Parameters:
win_rank (int) – Maximum rank to be considered a winner.
filters (pl.Expr, optional) – Selected offers, only used in local sensitivity analysis. When None (global), results are cached by win_rank.
- Return type:
pl.LazyFrame
- get_offer_variability_stats(stage)
- winning_from(interactions, win_rank, groupby_cols, top_k)
- losing_to(interactions, win_rank, groupby_cols, top_k)
- get_win_distribution_data(lever_condition: polars.Expr, lever_value: float | None = None, all_interactions: int | None = None) → polars.DataFrame
Calculate win distribution data for business lever analysis.
This method generates distribution data showing how actions perform in arbitration decisions, both in baseline conditions and optionally with lever adjustments applied.
- Parameters:
lever_condition (pl.Expr) – Polars expression defining which actions to apply the lever to. Example: pl.col("Action") == "SpecificAction" or (pl.col("Issue") == "Service") & (pl.col("Group") == "Cards")
lever_value (float, optional) – The lever multiplier value to apply to selected actions. If None, returns baseline distribution only. If provided, returns both original and lever-adjusted win counts.
all_interactions (int, optional) – Total number of interactions to calculate “no winner” count. If provided, enables calculation of interactions without any winner. If None, “no winner” data is not calculated.
- Returns:
DataFrame containing win distribution with columns:
pyIssue, pyGroup, pyName: Action identifiers
original_win_count: Number of rank-1 wins in baseline scenario
new_win_count: Number of rank-1 wins after lever adjustment (only if lever_value provided)
n_decisions_survived_to_arbitration: Number of arbitration decisions the action participated in
selected_action: “Selected” for actions matching lever_condition, “Rest” for others
no_winner_count: Number of interactions without any winner (only if all_interactions provided)
- Return type:
pl.DataFrame
Notes
Only includes actions that survive to arbitration stage
Win counts represent rank-1 (first place) finishes in arbitration decisions
This is a zero-sum analysis: boosting selected actions suppresses others
Results are sorted by win count (new_win_count if available, else original_win_count)
When all_interactions is provided, “no winner” represents interactions without any rank-1 winner
Examples
Get baseline distribution for a specific action:
>>> lever_cond = pl.col("Action") == "MyAction"
>>> baseline = decision_analyzer.get_win_distribution_data(lever_cond)
Get distribution with 2x lever applied to service actions:
>>> lever_cond = pl.col("Issue") == "Service"
>>> with_lever = decision_analyzer.get_win_distribution_data(lever_cond, 2.0)
Get distribution with no winner count:
>>> total_interactions = 10000
>>> with_no_winner = decision_analyzer.get_win_distribution_data(lever_cond, 2.0, total_interactions)
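The zero-sum behaviour mentioned in the notes can be seen in a toy model. The sketch below is not the method's implementation: it assumes the lever is a simple multiplier on an action's priority, and the function name, the input layout (one dict of action to priority per interaction), and the action names are hypothetical.

```python
def win_counts(interactions, selected_actions, lever_value=1.0):
    """Count rank-1 wins per action after multiplying selected priorities by the lever.

    interactions: list of dicts mapping action name -> priority, one dict per
    interaction, containing only actions that reached arbitration.
    """
    counts = {}
    for priorities in interactions:
        if not priorities:
            # No action survived to arbitration: this interaction has no winner.
            continue
        adjusted = {
            action: priority * (lever_value if action in selected_actions else 1.0)
            for action, priority in priorities.items()
        }
        winner = max(adjusted, key=adjusted.get)
        counts[winner] = counts.get(winner, 0) + 1
    return counts


def no_winner_count(counts, all_interactions):
    """Interactions without a rank-1 winner, given the total interaction count."""
    return all_interactions - sum(counts.values())
```

Each interaction has exactly one winner, so every win gained by a boosted action is a win lost by another action.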
- get_trend_data(stage: str = 'AvailableActions', scope: Literal['Group', 'Issue', 'Action'] | None = 'Group') → polars.DataFrame
- Parameters:
stage (str)
scope (Literal['Group', 'Issue', 'Action'] | None)
- Return type:
polars.DataFrame
- find_lever_value(lever_condition: polars.Expr, target_win_percentage: float, win_rank: int = 1, low: float = 0, high: float = 100, precision: float = 0.01, ranking_stages: list[str] | None = None) → float
Binary search algorithm to find lever value needed to achieve a desired win percentage.
- Parameters:
lever_condition (pl.Expr) – Polars expression that defines which actions should receive the lever
target_win_percentage (float) – The desired win percentage (0-100)
win_rank (int, default 1) – Consider actions winning if they rank <= this value
low (float, default 0) – Lower bound for lever search range
high (float, default 100) – Upper bound for lever search range
precision (float, default 0.01) – Search precision - smaller values give more accurate results
ranking_stages (list[str], optional) – List of stages to include in analysis. Defaults to [“Arbitration”]
- Returns:
The lever value needed to achieve the target win percentage
- Return type:
float
- Raises:
ValueError – If the target win percentage cannot be achieved within the search range
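A generic version of such a binary search is sketched below. It is not the method's code: it assumes the win percentage never decreases as the lever grows, and the function name and the `win_percentage` callable (standing in for a re-ranking of the data at a given lever) are hypothetical. The `low`, `high`, and `precision` arguments mirror the parameters documented above.

```python
def find_lever(win_percentage, target, low=0.0, high=100.0, precision=0.01):
    """Binary-search the lever at which win_percentage(lever) reaches target.

    win_percentage: callable mapping a lever value to a win percentage (0-100),
    assumed to be non-decreasing in the lever.
    """
    if not (win_percentage(low) <= target <= win_percentage(high)):
        raise ValueError("Target win percentage is not achievable within the search range")
    while high - low > precision:
        mid = (low + high) / 2
        if win_percentage(mid) < target:
            low = mid   # lever too small, search the upper half
        else:
            high = mid  # target reached, try a smaller lever
    return (low + high) / 2


def toy_win_percentage(lever, ratios=(0.5, 1.5, 2.5, 4.0)):
    """Toy model: the selected action wins an interaction when the lever
    exceeds the ratio of the best competitor's priority to its own."""
    return 100.0 * sum(lever > ratio for ratio in ratios) / len(ratios)
```

In the toy model, a 50% target needs a lever just above 1.5, and the search converges to that value within the requested precision.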