pdstools.decision_analyzer

Submodules

Classes

DecisionAnalyzer

Analyze NBA decision data from Explainability Extract or Decision Analyzer exports.

Package Contents

class DecisionAnalyzer(raw_data: polars.LazyFrame, *, level: str = 'Stage Group', sample_size: int = DEFAULT_SAMPLE_SIZE, mandatory_expr: polars.Expr | None = None, additional_columns: dict[str, polars.DataType] | None = None, num_samples: int = 1)

Analyze NBA decision data from Explainability Extract or Decision Analyzer exports.

This class processes raw decision data to create a comprehensive analysis framework for NBA (Next-Best-Action). It supports two data source formats:

  • Explainability Extract (v1): Simpler format with actions at the arbitration stage. Stages are synthetically derived from ranking.

  • Decision Analyzer / EEV2 (v2): Full pipeline data with real stage information, filter component names, and detailed strategy tracking.

Data can be loaded via class methods or directly:

Parameters:
  • raw_data (polars.LazyFrame)

  • level (str)

  • sample_size (int)

  • mandatory_expr (polars.Expr | None)

  • additional_columns (dict[str, polars.DataType] | None)

  • num_samples (int)

decision_data

Interaction-level decision data (with global filters applied if any).

Type:

pl.LazyFrame

extract_type

Either "explainability_extract" or "decision_analyzer".

Type:

str

plot

Plot accessor for visualization methods.

Type:

Plot

aggregates

Accessor for aggregation queries (funnel, distribution, optionality, action variation, …).

Type:

Aggregates

scoring

Accessor for re-ranking, sensitivity, win/loss and lever analysis.

Type:

Scoring

Examples

>>> from pdstools import DecisionAnalyzer
>>> da = DecisionAnalyzer.from_explainability_extract("data/sample_explainability_extract.parquet")
>>> da.overview_stats
>>> da.plot.sensitivity()
>>> da.aggregates.get_funnel_data(scope="Action")
>>> da.scoring.get_sensitivity()
_num_sample_interactions: int
classmethod from_explainability_extract(source: str | os.PathLike, *, level: str = 'Stage Group', sample_size: int = DEFAULT_SAMPLE_SIZE, mandatory_expr: polars.Expr | None = None, additional_columns: dict[str, polars.DataType] | None = None, num_samples: int = 1) DecisionAnalyzer

Create a DecisionAnalyzer from an Explainability Extract (v1) file.

Parameters:
  • source (str | os.PathLike) – Path to the Explainability Extract parquet file, or a URL.

  • level (str) – See __init__() for details.

  • sample_size (int) – See __init__() for details.

  • mandatory_expr (polars.Expr | None) – See __init__() for details.

  • additional_columns (dict[str, polars.DataType] | None) – See __init__() for details.

  • num_samples (int) – See __init__() for details.

Return type:

DecisionAnalyzer

Examples

>>> da = DecisionAnalyzer.from_explainability_extract("data/sample_explainability_extract.parquet")
classmethod from_decision_analyzer(source: str | os.PathLike, *, level: str = 'Stage Group', sample_size: int = DEFAULT_SAMPLE_SIZE, mandatory_expr: polars.Expr | None = None, additional_columns: dict[str, polars.DataType] | None = None, num_samples: int = 1) DecisionAnalyzer

Create a DecisionAnalyzer from a Decision Analyzer / EEV2 (v2) file.

Parameters:
  • source (str | os.PathLike) – Path to the Decision Analyzer parquet file, or a URL.

  • level (str) – See __init__() for details.

  • sample_size (int) – See __init__() for details.

  • mandatory_expr (polars.Expr | None) – See __init__() for details.

  • additional_columns (dict[str, polars.DataType] | None) – See __init__() for details.

  • num_samples (int) – See __init__() for details.

Return type:

DecisionAnalyzer

Examples

>>> da = DecisionAnalyzer.from_decision_analyzer("data/sample_eev2.parquet")
_default_filter_fields = ['Decision Time', 'Channel', 'Direction', 'Issue', 'Group', 'Action', 'Treatment', 'Stage',...
plot
aggregates
scoring
level = 'Stage Group'
sample_size = 10000
_num_samples = 1
_thresholding_cache: dict[tuple[str, tuple[int, Ellipsis]], polars.DataFrame]
_sensitivity_cache: dict[int, polars.LazyFrame]
extract_type = 'decision_analyzer'
validation_error = 'The following default columns are missing: '
decision_data
fields_for_data_filtering
preaggregation_columns
max_win_rank = 5
AvailableNBADStages = ['Arbitration', 'Output']
property available_levels: list[str]

Stage granularity levels available for this dataset.

Returns ["Stage Group", "Stage"] for Decision Analyzer (v2) data when both columns are present, or ["Stage Group"] for Explainability Extract (v1) data where only synthetic stages exist.

Return type:

list[str]

set_level(level: str)

Switch the stage granularity level used for all analyses.

Recomputes the available stages for the new level and invalidates all cached properties so subsequent queries use the new granularity.

Parameters:

level (str) – "Stage Group" or "Stage".

_recompute_available_stages()

Derive AvailableNBADStages from the data for the current level.

At "Stage Group" level, synthetically injects "Arbitration" if it has no data rows (it is used as an anchor point by many analyses).

At "Stage" level, detects Stage Groups that have no individual stages represented in the data and inserts the group name as a placeholder so the full pipeline is visible.

property mandatory_actions: set[str]

Set of action names flagged as mandatory in the current data.

Mandatory actions bypass normal arbitration and always rank in the top slot. Auto-detected from Priority (see MANDATORY_PRIORITY_THRESHOLD) unless an explicit mandatory_expr was supplied at construction.

Returns:

Distinct Action values where is_mandatory is truthy. Empty when no mandatory rows or no Action / is_mandatory column is available.

Return type:

set[str]

property color_mappings: dict[str, dict[str, str]]

Compute consistent color mappings for all categorical dimensions.

Color assignments are based on all unique values in the full dataset (before sampling), sorted alphabetically. This ensures colors remain consistent throughout the session regardless of filtering.

Returns:

Nested dictionary mapping dimension names to color dictionaries. Example:

{"Issue": {"Retention": "#001F5F", "Sales": "#10A5AC"}, "Group": {"CreditCards": "#001F5F", "Loans": "#10A5AC"}}

Return type:

dict[str, dict[str, str]]

Notes

Uses @cached_property so computation happens once on first access. Colors are assigned from the Pega colorway using modulo indexing.
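The modulo-indexed assignment can be sketched in plain Python; the palette below is a placeholder, not the actual Pega colorway:

```python
# Sketch of modulo-indexed color assignment over alphabetically sorted
# unique values. COLORWAY is a placeholder palette, not the Pega colorway.
COLORWAY = ["#001F5F", "#10A5AC", "#F0AB00"]

def assign_colors(values):
    """Map each sorted unique value to a palette color, cycling via modulo."""
    return {v: COLORWAY[i % len(COLORWAY)] for i, v in enumerate(sorted(set(values)))}

print(assign_colors(["Sales", "Retention", "Service", "Risk"]))
```

Because assignment depends only on the sorted set of values, the same value always gets the same color within a session, which is what keeps colors stable under filtering.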

See also

pdstools.utils.color_mapping.create_categorical_color_mappings

Generic utility for creating color mappings in any Streamlit app.

property stages_from_arbitration_down

All stages from Arbitration onward, respecting the current level.

At "Stage Group" level this slices from the literal "Arbitration" entry. At "Stage" level it finds stages whose Stage Order is >= the Arbitration group order (3800) using the stage_to_group_mapping.

property stages_with_propensity

Infer which stages have meaningful propensity scores from the data.

Examines the sample data to determine which stages have non-null, non-default propensity values. Returns stages where propensity-based classification makes sense.

property propensity_validation_warning: str | None

Validate propensity values and return warning message if issues detected.

Checks for:

  1. Invalid propensities (> 1.0), which are mathematically impossible for probabilities.

  2. Unusually high propensities (> 0.1), which are uncommon for typical marketing interactions.

Returns None if validation passes or propensity data is not available. Uses sample data for efficiency.

Return type:

str | None

property arbitration_stage: polars.LazyFrame

Sample rows remaining at or after the Arbitration stage.

Return type:

polars.LazyFrame

property num_sample_interactions: int

Number of unique interactions in the sample. Automatically triggers sampling if not yet calculated.

Return type:

int

_invalidate_cached_properties()

Reset all cached properties of the class. Needed so global filters take effect on subsequent queries.

property preaggregated_filter_view

Pre-aggregates the full dataset over customers and interactions providing a view of what is filtered at a stage.

This pre-aggregation is similar to what "VBD" does to interaction history: it aggregates over individual customers and interactions, producing summary statistics sufficient to drive most (but not all) of the analyses. The results are much smaller than the original data and are expected to fit easily in memory, so we use polars caching for efficiency.

This "filter" view keeps the same organization as the decision analyzer data in that it records the actions that get filtered out at each stage. From this, a "remaining" view is easily derived.

property preaggregated_remaining_view

Pre-aggregates the full dataset over customers and interactions providing a view of remaining offers.

This pre-aggregation builds on the filter view and aggregates over the stages remaining.

property sample

Hash-based deterministic sample of interactions for resource-intensive analyses.

Selects up to sample_size unique interactions using a hash of Interaction ID. All actions within a selected interaction are kept. If fewer interactions exist than sample_size, no sampling is performed.

When the --sample CLI flag is active, this operates on the already-reduced dataset, so two layers of sampling may apply.

filtered(filters: list[polars.Expr] | polars.Expr | None = None) polars.LazyFrame

Return self.sample with the given filter expressions applied.

Parameters:

filters (list[pl.Expr] | pl.Expr | None, default None) – Filter expressions to AND together. None or an empty list returns the sample unchanged. Apps should construct this list from their own state (for Streamlit pages, see pdstools.app.decision_analyzer.da_streamlit_utils.collect_page_filters()); the library deliberately does not read UI state.

Returns:

The (possibly filtered) sample.

Return type:

pl.LazyFrame

get_available_fields_for_filtering(*, categorical_only: bool = False) list[str]

Return column names available for data filtering.

Parameters:

categorical_only (bool, default False) – If True, return only string/categorical columns.

Return type:

list[str]

_cleanup_raw_data(df: polars.LazyFrame) polars.LazyFrame

Clean up the raw data read from parquet, S3, or other sources.

This will likely need to change as the product matures, to match what comes out of Pega. It performs some modest type casting and may revert temporary column names that were added to generate more data.

Parameters:

df (polars.LazyFrame)

Return type:

polars.LazyFrame

get_possible_scope_values() list[str]

Return scope hierarchy columns present in the data (e.g. Issue, Group, Action).

Return type:

list[str]

get_possible_stage_values() list[str]

Return the list of available stage values for the current level.

Return type:

list[str]

property stage_to_group_mapping: dict[str, str]

Map each Stage name to its Stage Group.

Only meaningful when level == "Stage" and both columns exist. Returns an empty dict otherwise (including v1 / explainability data).

Return type:

dict[str, str]

property overview_stats: dict[str, object]

Creates an overview from the full (filtered) dataset.

Aggregate metrics (Decisions, Customers, Actions, Channels, Duration) are computed over decision_data so they reflect the true counts. Only the average-offers-per-stage KPI uses the sample (it requires interaction-level optionality analysis that would be too expensive on the full data).

Return type:

dict[str, object]