pdstools.decision_analyzer¶
Submodules¶
- pdstools.decision_analyzer.DecisionAnalyzer
- pdstools.decision_analyzer._aggregates
- pdstools.decision_analyzer._scoring
- pdstools.decision_analyzer.column_schema
- pdstools.decision_analyzer.data_read_utils
- pdstools.decision_analyzer.plots
- pdstools.decision_analyzer.stage_grouping
- pdstools.decision_analyzer.utils
Classes¶
- DecisionAnalyzer: Analyze NBA decision data from Explainability Extract or Decision Analyzer exports.
Package Contents¶
- class DecisionAnalyzer(raw_data: polars.LazyFrame, *, level: str = 'Stage Group', sample_size: int = DEFAULT_SAMPLE_SIZE, mandatory_expr: polars.Expr | None = None, additional_columns: dict[str, polars.DataType] | None = None, num_samples: int = 1)¶
Analyze NBA decision data from Explainability Extract or Decision Analyzer exports.
This class processes raw decision data to create a comprehensive analysis framework for NBA (Next-Best-Action). It supports two data source formats:
Explainability Extract (v1): Simpler format with actions at the arbitration stage. Stages are synthetically derived from ranking.
Decision Analyzer / EEV2 (v2): Full pipeline data with real stage information, filter component names, and detailed strategy tracking.
Data can be loaded via class methods or directly:
- from_explainability_extract(): Load from an Explainability Extract file.
- from_decision_analyzer(): Load from a Decision Analyzer (EEV2) file.
- Direct __init__: Auto-detects format from the data schema.
- Attributes:
- decision_data¶
Interaction-level decision data (with global filters applied if any).
- Type:
pl.LazyFrame
- aggregates¶
Accessor for aggregation queries (funnel, distribution, optionality, action variation, …).
Examples
>>> from pdstools import DecisionAnalyzer
>>> da = DecisionAnalyzer.from_explainability_extract("data/sample_explainability_extract.parquet")
>>> da.overview_stats
>>> da.plot.sensitivity()
>>> da.aggregates.get_funnel_data(scope="Action")
>>> da.scoring.get_sensitivity()
- classmethod from_explainability_extract(source: str | os.PathLike, *, level: str = 'Stage Group', sample_size: int = DEFAULT_SAMPLE_SIZE, mandatory_expr: polars.Expr | None = None, additional_columns: dict[str, polars.DataType] | None = None, num_samples: int = 1) → DecisionAnalyzer¶
Create a DecisionAnalyzer from an Explainability Extract (v1) file.
- Parameters:
source (str | os.PathLike) – Path to the Explainability Extract parquet file, or a URL.
level (str) – See __init__() for details.
sample_size (int) – See __init__() for details.
mandatory_expr (polars.Expr | None) – See __init__() for details.
additional_columns (dict[str, polars.DataType] | None) – See __init__() for details.
num_samples (int) – See __init__() for details.
- Return type:
DecisionAnalyzer
Examples
>>> da = DecisionAnalyzer.from_explainability_extract("data/sample_explainability_extract.parquet")
- classmethod from_decision_analyzer(source: str | os.PathLike, *, level: str = 'Stage Group', sample_size: int = DEFAULT_SAMPLE_SIZE, mandatory_expr: polars.Expr | None = None, additional_columns: dict[str, polars.DataType] | None = None, num_samples: int = 1) → DecisionAnalyzer¶
Create a DecisionAnalyzer from a Decision Analyzer / EEV2 (v2) file.
- Parameters:
source (str | os.PathLike) – Path to the Decision Analyzer parquet file, or a URL.
level (str) – See __init__() for details.
sample_size (int) – See __init__() for details.
mandatory_expr (polars.Expr | None) – See __init__() for details.
additional_columns (dict[str, polars.DataType] | None) – See __init__() for details.
num_samples (int) – See __init__() for details.
- Return type:
DecisionAnalyzer
Examples
>>> da = DecisionAnalyzer.from_decision_analyzer("data/sample_eev2.parquet")
- _default_filter_fields = ['Decision Time', 'Channel', 'Direction', 'Issue', 'Group', 'Action', 'Treatment', 'Stage',...¶
- plot¶
- aggregates¶
- scoring¶
- level = 'Stage Group'¶
- sample_size = 10000¶
- _num_samples = 1¶
- extract_type = 'decision_analyzer'¶
- validation_error = 'The following default columns are missing: '¶
- decision_data¶
- fields_for_data_filtering¶
- preaggregation_columns¶
- max_win_rank = 5¶
- AvailableNBADStages = ['Arbitration', 'Output']¶
- property available_levels: list[str]¶
Stage granularity levels available for this dataset.
Returns
["Stage Group", "Stage"] for Decision Analyzer (v2) data when both columns are present, or ["Stage Group"] for Explainability Extract (v1) data where only synthetic stages exist.
- set_level(level: str)¶
Switch the stage granularity level used for all analyses.
Recomputes the available stages for the new level and invalidates all cached properties so subsequent queries use the new granularity.
- Parameters:
level (str) –
"Stage Group" or "Stage".
- _recompute_available_stages()¶
Derive AvailableNBADStages from the data for the current level.
At “Stage Group” level, synthetically injects “Arbitration” if it has no data rows (it is used as an anchor point by many analyses).
At “Stage” level, detects Stage Groups that have no individual stages represented in the data and inserts the group name as a placeholder so the full pipeline is visible.
- property mandatory_actions: set[str]¶
Set of action names flagged as mandatory in the current data.
Mandatory actions bypass normal arbitration and always rank in the top slot. Auto-detected from Priority (see MANDATORY_PRIORITY_THRESHOLD) unless an explicit mandatory_expr was supplied at construction.
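The auto-detection rule can be sketched in plain Python. This is a hypothetical stand-in: the real implementation works on polars expressions, and the threshold used below is a placeholder, not the actual value of MANDATORY_PRIORITY_THRESHOLD.

```python
def detect_mandatory_actions(rows, threshold):
    """Return the names of actions whose Priority exceeds `threshold`."""
    # Mandatory actions carry an artificially high Priority, so a
    # simple threshold comparison is enough to pick them out.
    return {r["Action"] for r in rows if r.get("Priority", 0) > threshold}

rows = [
    {"Action": "RegulatoryNotice", "Priority": 1e6},  # flagged mandatory
    {"Action": "CreditCardOffer", "Priority": 0.42},  # normal arbitration
]
```

With these illustrative rows, `detect_mandatory_actions(rows, threshold=1000)` returns `{"RegulatoryNotice"}`.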
- property color_mappings: dict[str, dict[str, str]]¶
Compute consistent color mappings for all categorical dimensions.
Color assignments are based on all unique values in the full dataset (before sampling), sorted alphabetically. This ensures colors remain consistent throughout the session regardless of filtering.
- Returns:
Nested dictionary mapping dimension names to color dictionaries. Example:
{"Issue": {"Retention": "#001F5F", "Sales": "#10A5AC"}, "Group": {"CreditCards": "#001F5F", "Loans": "#10A5AC"}}
- Return type:
dict[str, dict[str, str]]
Notes
Uses @cached_property so computation happens once on first access. Colors are assigned from the Pega colorway using modulo indexing.
See also
pdstools.utils.color_mapping.create_categorical_color_mappings: Generic utility for creating color mappings in any Streamlit app.
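The modulo-indexing scheme described above can be sketched as follows; the colorway here is an illustrative placeholder, not the actual Pega palette.

```python
# Placeholder palette standing in for the Pega colorway.
COLORWAY = ["#001F5F", "#10A5AC", "#F9B000"]

def color_mapping(values):
    """Assign a stable color to each unique value of one dimension."""
    # Sort the unique values alphabetically (over the *full* dataset),
    # then cycle through the colorway via modulo indexing.
    ordered = sorted(set(values))
    return {v: COLORWAY[i % len(COLORWAY)] for i, v in enumerate(ordered)}
```

Because the assignment depends only on the sorted set of values, filtering the data later does not change any value's color.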
- property stages_from_arbitration_down¶
All stages from Arbitration onward, respecting the current level.
At “Stage Group” level this slices from the literal “Arbitration” entry. At “Stage” level it finds stages whose Stage Order is >= the Arbitration group order (3800) using the stage_to_group_mapping.
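At “Stage Group” level the operation amounts to slicing an ordered stage list from the literal “Arbitration” entry; a minimal sketch (stage names are illustrative):

```python
def stages_from(stages, anchor="Arbitration"):
    """Slice the ordered stage list from `anchor` onward."""
    try:
        return stages[stages.index(anchor):]
    except ValueError:
        return stages  # anchor not present: fall back to the full list

pipeline = ["Eligibility", "Applicability", "Suitability", "Arbitration", "Output"]
```

Here `stages_from(pipeline)` yields `["Arbitration", "Output"]`. The “Stage” branch instead compares each stage's Stage Order against the Arbitration group order.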
- property stages_with_propensity¶
Infer which stages have meaningful propensity scores from the data.
Examines the sample data to determine which stages have non-null, non-default propensity values. Returns stages where propensity-based classification makes sense.
- property propensity_validation_warning: str | None¶
Validate propensity values and return warning message if issues detected.
Checks for:
1. Invalid propensities (> 1.0) - mathematically impossible for probabilities.
2. Unusually high propensities (> 0.1) - uncommon for typical marketing interactions.
Returns None if validation passes or propensity data is not available. Uses sample data for efficiency.
- Return type:
str | None
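The two checks can be sketched as below; message wording is illustrative, not the library's actual warning text.

```python
def propensity_warning(propensities):
    """Return a warning string, or None if validation passes."""
    values = [p for p in propensities if p is not None]
    if not values:
        return None  # no propensity data available
    if any(p > 1.0 for p in values):
        return "Invalid propensities > 1.0 detected"
    if any(p > 0.1 for p in values):
        return "Unusually high propensities > 0.1 detected"
    return None
```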
- property arbitration_stage: polars.LazyFrame¶
Sample rows remaining at or after the Arbitration stage.
- Return type:
polars.LazyFrame
- property num_sample_interactions: int¶
Number of unique interactions in the sample. Automatically triggers sampling if not yet calculated.
- Return type:
int
- _invalidate_cached_properties()¶
Resets the cached properties of the class so they are recomputed. Needed when global filters change.
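Assuming the class caches results with functools.cached_property (which stores values in the instance `__dict__`), the invalidation pattern can be sketched as:

```python
from functools import cached_property

class Analyzer:
    """Toy illustration of the cached-property invalidation pattern."""

    def __init__(self, data):
        self.data = data

    @cached_property
    def total(self):
        # Stand-in for an expensive computation, cached on first access.
        return sum(self.data)

    def _invalidate_cached_properties(self):
        # cached_property stores its value under the property name in
        # __dict__; popping it forces recomputation on the next access.
        for name in ("total",):
            self.__dict__.pop(name, None)
```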
- property preaggregated_filter_view¶
Pre-aggregates the full dataset over customers and interactions providing a view of what is filtered at a stage.
This pre-aggregation is similar to what “VBD” does to interaction history. It aggregates over individual customers and interactions, giving summary statistics that are sufficient to drive most of the analyses (but not all). The results of this pre-aggregation are much smaller than the original data and are expected to fit easily in memory, so we use polars caching to cache them efficiently.
This “filter” view keeps the same organization as the decision analyzer data in that it records the actions that get filtered out at stages. From this a “remaining” view is easily derived.
- property preaggregated_remaining_view¶
Pre-aggregates the full dataset over customers and interactions providing a view of remaining offers.
This pre-aggregation builds on the filter view and aggregates over the stages remaining.
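How a remaining view follows from a filter view can be sketched with plain Python; stage names and counts are illustrative, and the real implementation operates on polars aggregates.

```python
def remaining_per_stage(start_count, filtered_per_stage):
    """Derive actions remaining after each stage from per-stage filter counts."""
    remaining, left = {}, start_count
    for stage, filtered_out in filtered_per_stage:
        left -= filtered_out  # cumulative subtraction down the funnel
        remaining[stage] = left
    return remaining
```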
- property sample¶
Hash-based deterministic sample of interactions for resource-intensive analyses.
Selects up to sample_size unique interactions using a hash of Interaction ID. All actions within a selected interaction are kept. If fewer interactions exist than sample_size, no sampling is performed.
When the --sample CLI flag is active, this operates on the already-reduced dataset, so two layers of sampling may apply.
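The hash-based selection can be sketched in plain Python; the real implementation uses polars expressions, and zlib.crc32 here merely stands in for whatever hash the library actually applies.

```python
import zlib

def sample_interactions(rows, sample_size):
    """Deterministically keep up to `sample_size` whole interactions."""
    ids = sorted({r["InteractionID"] for r in rows})
    if len(ids) <= sample_size:
        return rows  # fewer interactions than sample_size: no sampling
    # Rank interactions by a stable hash of their ID, keep the first
    # sample_size, and retain *all* rows of each kept interaction.
    ranked = sorted(ids, key=lambda i: zlib.crc32(str(i).encode()))
    keep = set(ranked[:sample_size])
    return [r for r in rows if r["InteractionID"] in keep]
```

Because the ranking depends only on the IDs, repeated calls over the same data select the same interactions.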
- filtered(filters: list[polars.Expr] | polars.Expr | None = None) polars.LazyFrame¶
Return self.sample with the given filter expressions applied.
- Parameters:
filters (list[pl.Expr] | pl.Expr | None, default None) – Filter expressions to AND together. None or an empty list returns the sample unchanged. Apps should construct this list from their own state (for Streamlit pages, see pdstools.app.decision_analyzer.da_streamlit_utils.collect_page_filters()); the library deliberately does not read UI state.
- Returns:
The (possibly filtered) sample.
- Return type:
pl.LazyFrame
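The contract (AND all filters together; None or an empty list is a no-op) can be sketched with plain Python predicates standing in for polars expressions:

```python
def filtered(sample, filters=None):
    """Apply zero or more predicates, ANDed together, to `sample`."""
    if filters is None or filters == []:
        return sample  # no filters: return the sample unchanged
    if callable(filters):
        filters = [filters]  # accept a single predicate as well
    return [row for row in sample if all(f(row) for f in filters)]

sample = [
    {"Channel": "Web", "Issue": "Sales"},
    {"Channel": "Email", "Issue": "Sales"},
]
```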
- get_available_fields_for_filtering(*, categorical_only: bool = False) list[str]¶
Return column names available for data filtering.
- _cleanup_raw_data(df: polars.LazyFrame) polars.LazyFrame¶
This method cleans up the raw data read from parquet, S3, or other sources.
This will likely need to change as we get closer to product, to match what comes out of Pega. It does some modest type casting and may revert some of the temporary column names that were added to generate more data.
- Parameters:
df (polars.LazyFrame)
- Return type:
polars.LazyFrame
- get_possible_scope_values() list[str]¶
Return scope hierarchy columns present in the data (e.g. Issue, Group, Action).
- get_possible_stage_values() list[str]¶
Return the list of available stage values for the current level.
- property stage_to_group_mapping: dict[str, str]¶
Map each Stage name to its Stage Group.
Only meaningful when
level == "Stage" and both columns exist. Returns an empty dict otherwise (including v1 / explainability data).
- property overview_stats: dict[str, object]¶
Creates an overview from the full (filtered) dataset.
Aggregate metrics (Decisions, Customers, Actions, Channels, Duration) are computed over
decision_data so they reflect the true counts. Only the average-offers-per-stage KPI uses the sample (it requires interaction-level optionality analysis that would be too expensive on the full data).
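The aggregate metrics are essentially distinct counts over the full decision data; a minimal sketch (column names are illustrative):

```python
def overview_stats(rows):
    """Distinct counts over the full (filtered) decision data."""
    return {
        "Decisions": len({r["InteractionID"] for r in rows}),
        "Customers": len({r["CustomerID"] for r in rows}),
        "Actions": len({r["Action"] for r in rows}),
        "Channels": len({r["Channel"] for r in rows}),
    }
```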