pdstools.decision_analyzer.decision_data¶

Classes¶

DecisionAnalyzer

Container data class for the raw decision data. Only one instance of this

Module Contents¶

class DecisionAnalyzer(raw_data: polars.LazyFrame, level='StageGroup', sample_size=50000, mandatory_expr: polars.Expr | None = None, additional_columns: Dict[str, polars.DataType] | None = None)¶

Container data class for the raw decision data. Only one instance of this should exist and will be associated with the streamlit app state.

It will keep a pointer to the raw interaction level data (as a lazy frame) but also has VBD-style aggregation(s) to speed things up.

Parameters:

raw_data (polars.LazyFrame)
mandatory_expr (Optional[polars.Expr])
additional_columns (Optional[Dict[str, polars.DataType]])

unfiltered_raw_decision_data: polars.LazyFrame = None¶

decision_data: polars.LazyFrame = None¶

preaggregated_decision_data_filterview: polars.LazyFrame = None¶

preaggregated_decision_data_remainingview: polars.LazyFrame = None¶

fields_for_data_filtering = ['pxDecisionTime', 'pyConfigurationName', 'pyChannel', 'pyDirection', 'pyIssue', 'pyGroup',...¶

plot¶

level = 'StageGroup'¶

sample_size = 50000¶

extract_type = 'decision_analyzer'¶

validation_error = 'The following default columns are missing: '¶

preaggregation_columns¶

max_win_rank = 5¶

AvailableNBADStages = ['Arbitration', 'Output']¶

property stages_from_arbitration_down¶: All stages in the filter view starting at Arbitration. This initially will just be [Arbitration, Final] but as we get more stages in there may be more here.

property arbitration_stage¶

property num_sample_interactions¶: Number of unique interactions in the sample. Automatically triggers sampling if not yet calculated.

_invalidate_cached_properties()¶: Resets the properties of the class. Needed for global filters.

applyGlobalDataFilters(filters: polars.Expr | List[polars.Expr] | None = None)¶

Apply a global set of filters

Parameters:: filters (Optional[Union[polars.Expr, List[polars.Expr]]])

resetGlobalDataFilters()¶

property getPreaggregatedFilterView¶

Pre-aggregates the full dataset over customers and interactions providing a view of what is filtered at a stage.

This pre-aggregation is pretty similar to what “VBD” does to interaction history. It aggregates over individual customers and interactions giving summary statistics that are sufficient to drive most of the analyses (but not all). The results of this pre-aggregation are much smaller than the original data and is expected to easily fit in memory. We therefore use polars caching to efficiently cache this.

This “filter” view keeps the same organization as the decision analyzer data in that it records the actions that get filtered out at stages. From this a “remaining” view is easily derived.

property getPreaggregatedRemainingView¶

Pre-aggregates the full dataset over customers and interactions providing a view of remaining offers.

This pre-aggregation builds on the filter view and aggregates over the stages remaining.

property sample¶: Create a sample of the data by taking the first 50,000 interactions. If there are fewer than 50,000 total interactions, no sampling is performed.

getAvailableFieldsForFiltering(categoricalOnly=False)¶

cleanup_raw_data(df: polars.LazyFrame)¶

This method cleans up the raw data we read from parquet/S3/whatever.

This likely needs to change as and when we get closer to product, to match what comes out of Pega. It does some modest type casting and potentially changing back some of the temporary column names that have been added to generate more data.

Parameters:: df (polars.LazyFrame)

getPossibleScopeValues()¶

getPossibleStageValues()¶

getDistributionData(stage: str, grouping_levels: str | List[str], additional_filters: polars.Expr | List[polars.Expr] | None = None) → polars.LazyFrame¶

Parameters:

stage (str)
grouping_levels (Union[str, List[str]])
additional_filters (Optional[Union[polars.Expr, List[polars.Expr]]])

Return type:

polars.LazyFrame

getFunnelData(scope, additional_filters: polars.Expr | List[polars.Expr] | None = None) → polars.LazyFrame¶

Parameters:: additional_filters (Optional[Union[polars.Expr, List[polars.Expr]]])
Return type:: polars.LazyFrame

getFilterComponentData(top_n, additional_filters: polars.Expr | List[polars.Expr] | None = None) → polars.DataFrame¶

Parameters:: additional_filters (Optional[Union[polars.Expr, List[polars.Expr]]])
Return type:: polars.DataFrame

reRank(additional_filters: polars.Expr | List[polars.Expr] | None = None, overrides: List[polars.Expr] = []) → polars.LazyFrame¶

Calculates prio and rank for all PVCL combinations

Parameters:

additional_filters (Optional[Union[polars.Expr, List[polars.Expr]]])
overrides (List[polars.Expr])

Return type:

polars.LazyFrame

get_win_loss_distribution_data(level, win_rank)¶

get_optionality_data(df)¶: Finding the average number of actions per stage without trend analysis. We have to go back to the interaction level data, no way to use pre-aggregations unfortunately.

get_optionality_data_with_trend(df=None)¶: Finding the average number of actions per stage with trend analysis. We have to go back to the interaction level data, no way to use pre-aggregations unfortunately.

get_optionality_funnel(df=None)¶

getActionVariationData(stage)¶

getABTestResults()¶

getThresholdingData(fld, quantile_range=range(10, 100, 10))¶

priority_component_distribution(component, granularity)¶

aggregate_remaining_per_stage(df: polars.LazyFrame, group_by_columns: List[str], aggregations: List = []) → polars.LazyFrame¶

Workhorse function to convert the raw Decision Analyzer data (filter view) to the aggregates remaining per stage, ensuring all stages are represented.

Parameters:

df (polars.LazyFrame)
group_by_columns (List[str])
aggregations (List)

Return type:

polars.LazyFrame

get_offer_quality(df, group_by)¶

Given a dataframe with filtered action counts at stages. Flips it to usual VF view by doing a rolling sum over stages.

Parameters:

df (pl.LazyFrame) – Decision Analyzer style filtered action counts dataframe.
groupby_cols (list) – The list of column names to group by([self.level, “pxInteractionID”]).

Returns:

Value Finder style, available action counts per group_by category

Return type:

pl.LazyFrame

property get_overview_stats¶: Creates an overview from sampled data

get_sensitivity(win_rank=1, filters=None)¶

Global Sensitivity: Number of decisions where original rank-1 changes. Local Sensitivity: Number of times the selected offer(s) are in the rank-1 when dropping one of the prioritization factors.

Parameters:

win_rank (Int) – Maximum rank to be considered a winner.
filters (List[pl.Expr]) – Selected offers, only used in local sensitivity analysis.

Return type:

pl.LazyFrame

get_offer_variability_stats(stage)¶

get_winning_or_losing_interactions(win_rank, group_filter, win: bool)¶

Parameters:: win (bool)

winning_from(interactions, win_rank, groupby_cols, top_k)¶

losing_to(interactions, win_rank, groupby_cols, top_k)¶

get_win_distribution_data(lever_condition: polars.Expr, lever_value: float | None = None, all_interactions: int | None = None) → polars.DataFrame¶

Calculate win distribution data for business lever analysis.

This method generates distribution data showing how actions perform in arbitration decisions, both in baseline conditions and optionally with lever adjustments applied.

Parameters:

lever_condition (pl.Expr) –
Polars expression defining which actions to apply the lever to. Example: pl.col(“pyName”) == “SpecificAction” or

(pl.col(“pyIssue”) == “Service”) & (pl.col(“pyGroup”) == “Cards”)
lever_value (float, optional) – The lever multiplier value to apply to selected actions. If None, returns baseline distribution only. If provided, returns both original and lever-adjusted win counts.
all_interactions (int, optional) – Total number of interactions to calculate “no winner” count. If provided, enables calculation of interactions without any winner. If None, “no winner” data is not calculated.

Returns:

DataFrame containing win distribution with columns: - pyIssue, pyGroup, pyName: Action identifiers - original_win_count: Number of rank-1 wins in baseline scenario - new_win_count: Number of rank-1 wins after lever adjustment (only if lever_value provided) - n_decisions_survived_to_arbitration: Number of arbitration decisions the action participated in - selected_action: “Selected” for actions matching lever_condition, “Rest” for others - no_winner_count: Number of interactions without any winner (only if all_interactions provided)

Return type:

pl.DataFrame

Notes

Only includes actions that survive to arbitration stage
Win counts represent rank-1 (first place) finishes in arbitration decisions
This is a zero-sum analysis: boosting selected actions suppresses others
Results are sorted by win count (new_win_count if available, else original_win_count)
When all_interactions is provided, “no winner” represents interactions without any rank-1 winner

Examples

Get baseline distribution for a specific action: >>> lever_cond = pl.col(“pyName”) == “MyAction” >>> baseline = decision_analyzer.get_win_distribution_data(lever_cond)

Get distribution with 2x lever applied to service actions: >>> lever_cond = pl.col(“pyIssue”) == “Service” >>> with_lever = decision_analyzer.get_win_distribution_data(lever_cond, 2.0)

Get distribution with no winner count: >>> total_interactions = 10000 >>> with_no_winner = decision_analyzer.get_win_distribution_data(lever_cond, 2.0, total_interactions)

get_trend_data(stage: str = 'AvailableActions', scope: Literal['pyGroup', 'pyIssue', 'pyName'] | None = 'pyGroup') → polars.DataFrame¶

Parameters:

stage (str)
scope (Union[Literal['pyGroup', 'pyIssue', 'pyName'], None])

Return type:

polars.DataFrame

find_lever_value(lever_condition: polars.Expr, target_win_percentage: float, win_rank: int = 1, low: float = 0, high: float = 100, precision: float = 0.01, ranking_stages: List[str] = None) → float¶

Binary search algorithm to find lever value needed to achieve a desired win percentage.

Parameters:

lever_condition (pl.Expr) – Polars expression that defines which actions should receive the lever
target_win_percentage (float) – The desired win percentage (0-100)
win_rank (int, default 1) – Consider actions winning if they rank <= this value
low (float, default 0) – Lower bound for lever search range
high (float, default 100) – Upper bound for lever search range
precision (float, default 0.01) – Search precision - smaller values give more accurate results
ranking_stages (List[str], optional) – List of stages to include in analysis. Defaults to [“Arbitration”]

Returns:

The lever value needed to achieve the target win percentage

Return type:

float

Raises:

ValueError – If the target win percentage cannot be achieved within the search range