pdstools.decision_analyzer¶
Submodules¶
Classes¶
DecisionAnalyzer | Container data class for the raw decision data.
Package Contents¶
- class DecisionAnalyzer(raw_data: polars.LazyFrame)¶
Container data class for the raw decision data. Only one instance should exist, and it is associated with the Streamlit app state.
It keeps a reference to the raw interaction-level data (as a lazy frame) and also maintains VBD-style aggregations to speed up the analyses.
- Parameters:
raw_data (polars.LazyFrame)
- unfiltered_raw_decision_data: polars.LazyFrame = None¶
- decision_data: polars.LazyFrame = None¶
- preaggregated_decision_data_filterview: polars.LazyFrame = None¶
- preaggregated_decision_data_remainingview: polars.LazyFrame = None¶
- fields_for_data_filtering = ['pxDecisionTime', 'pyConfigurationName', 'pyChannel', 'pyDirection', 'pyIssue', 'pyGroup',...¶
- plot¶
- extract_type = 'decision_analyzer'¶
- preaggregation_columns¶
- max_win_rank = 5¶
- AvailableNBADStages = ['Arbitration']¶
- property stages_from_arbitration_down¶
All stages in the filter view, starting at Arbitration. Initially this is just [Arbitration, Final], but the list may grow as more stages become available.
- property arbitration_stage¶
- _invalidate_cached_properties()¶
Resets the cached properties of the class.
- applyGlobalDataFilters(filters: polars.Expr | List[polars.Expr] | None = None)¶
Applies a global set of filters.
- Parameters:
filters (Optional[Union[polars.Expr, List[polars.Expr]]])
- resetGlobalDataFilters()¶
- property getPreaggregatedFilterView¶
Pre-aggregates the full dataset over customers and interactions providing a view of what is filtered at a stage.
This pre-aggregation is similar to what “VBD” does to interaction history: it aggregates over individual customers and interactions, producing summary statistics that are sufficient to drive most (but not all) of the analyses. The result is much smaller than the original data and is expected to fit easily in memory, so polars caching is used to cache it efficiently.
This “filter” view keeps the same organization as the decision analyzer data in that it records the actions that get filtered out at stages. From this a “remaining” view is easily derived.
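The idea of this pre-aggregation can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation (which works on polars lazy frames): the row layout, the field names `interaction`, `stage` and `action`, and the helper `preaggregate_filter_view` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical interaction-level rows: one row per action, recording the
# stage at which that action was recorded for a given interaction.
rows = [
    {"interaction": "i1", "stage": "Arbitration", "action": "A"},
    {"interaction": "i1", "stage": "Final", "action": "B"},
    {"interaction": "i2", "stage": "Arbitration", "action": "A"},
    {"interaction": "i2", "stage": "Arbitration", "action": "C"},
    {"interaction": "i3", "stage": "Final", "action": "B"},
]


def preaggregate_filter_view(rows):
    """Collapse interaction-level rows into one summary per (stage, action)."""
    counts = defaultdict(int)
    interactions = defaultdict(set)
    for row in rows:
        key = (row["stage"], row["action"])
        counts[key] += 1
        interactions[key].add(row["interaction"])
    # The summary no longer grows with the number of interactions, only
    # with the number of distinct (stage, action) combinations.
    return {
        key: {"count": counts[key], "interactions": len(interactions[key])}
        for key in counts
    }


filter_view = preaggregate_filter_view(rows)
```

Because the summary is keyed by stage and action rather than by interaction, it stays small even when the raw data is large, which is why such a view can be cached in memory.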
- property getPreaggregatedRemainingView¶
Pre-aggregates the full dataset over customers and interactions providing a view of remaining offers.
This pre-aggregation builds on the filter view and aggregates over the stages remaining.
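One way to picture the step from the filter view to the remaining view is a reverse cumulative sum over the stages. The sketch below assumes that an action recorded at a given stage was still available at every earlier stage, and that a stage's own count is included in its remaining total; both are assumptions for illustration, and `remaining_per_stage` is a hypothetical helper, not a pdstools function.

```python
# Stage order is an assumption for this sketch.
STAGES = ["Arbitration", "Final"]

# Hypothetical filter view: number of actions recorded at each stage.
filtered_at_stage = {"Arbitration": 7, "Final": 3}


def remaining_per_stage(filtered_at_stage, stages):
    """Sum the counts of each stage and all later stages."""
    remaining = {}
    running_total = 0
    # Walk the stages back to front so each stage accumulates the counts
    # of everything recorded at that stage or later.
    for stage in reversed(stages):
        running_total += filtered_at_stage.get(stage, 0)
        remaining[stage] = running_total
    # Return the result in the original stage order.
    return {stage: remaining[stage] for stage in stages}


remaining_view = remaining_per_stage(filtered_at_stage, STAGES)
```

With these numbers, ten actions reach Arbitration and three of them reach Final.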
- property sample¶
- getAvailableFieldsForFiltering(categoricalOnly=False)¶
- cleanup_raw_data(df: polars.LazyFrame)¶
This method cleans up the raw data we read from parquet/S3/whatever.
This will likely need to change as the tool gets closer to product, to match what comes out of Pega. It does some modest type casting and may rename back some of the temporary column names that were added to generate more data.
- Parameters:
df (polars.LazyFrame)
- getPossibleScopeValues()¶
- getPossibleStageValues()¶
- getDistributionData(stage: str, grouping_levels: List[str], trend=False, additional_filters: polars.Expr | List[polars.Expr] | None = None) polars.LazyFrame ¶
- getFunnelData(level, additional_filters: polars.Expr | List[polars.Expr] | None = None) polars.LazyFrame ¶
- Parameters:
additional_filters (Optional[Union[polars.Expr, List[polars.Expr]]])
- Return type:
polars.LazyFrame
- getFilterComponentData(top_n, additional_filters: polars.Expr | List[polars.Expr] | None = None) polars.DataFrame ¶
- Parameters:
additional_filters (Optional[Union[polars.Expr, List[polars.Expr]]])
- Return type:
polars.DataFrame
- reRank(additional_filters: polars.Expr | List[polars.Expr] | None = None, overrides: List[polars.Expr] = []) polars.LazyFrame ¶
Calculates priority and rank for all PVCL combinations.
- Parameters:
additional_filters (Optional[Union[polars.Expr, List[polars.Expr]]])
overrides (List[polars.Expr])
- Return type:
polars.LazyFrame
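As a rough illustration of re-ranking, the sketch below assumes that priority is the product of propensity, value, context weight and levers (the P, V, C and L factors), and that rank is assigned per interaction in descending priority order. The field names, the `rerank` helper, and the use of a plain dict of constant overrides are hypothetical; the actual method takes polars expressions and returns a lazy frame.

```python
from collections import defaultdict
from math import prod

# Assumed priority factors; the names are illustrative only.
FACTORS = ("propensity", "value", "context_weight", "levers")

actions = [
    {"interaction": "i1", "action": "A", "propensity": 0.1,
     "value": 50.0, "context_weight": 1.0, "levers": 1.0},
    {"interaction": "i1", "action": "B", "propensity": 0.2,
     "value": 10.0, "context_weight": 1.0, "levers": 1.0},
]


def rerank(actions, overrides=None):
    """Recompute priority and per-interaction rank, optionally overriding factors."""
    overrides = overrides or {}
    by_interaction = defaultdict(list)
    for row in actions:
        # An override replaces the factor for every action, which shows
        # how the ranking would look if that factor played no role.
        priority = prod(overrides.get(f, row[f]) for f in FACTORS)
        by_interaction[row["interaction"]].append({**row, "priority": priority})
    ranked = []
    for rows in by_interaction.values():
        rows.sort(key=lambda r: r["priority"], reverse=True)
        for rank, r in enumerate(rows, start=1):
            ranked.append({**r, "rank": rank})
    return ranked


baseline = {r["action"]: r["rank"] for r in rerank(actions)}
without_value = {r["action"]: r["rank"] for r in rerank(actions, {"value": 1.0})}
```

In this toy data, action A wins on the full priority, but B wins once value is neutralised, which is the kind of comparison an override is meant to support.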
- get_win_loss_distribution_data(level, win_rank)¶
- property get_optionality_data¶
Finds the average number of actions per stage, without trend analysis. This requires going back to the interaction-level data, because the pre-aggregations cannot be used here.
- property get_optionality_data_with_trend¶
Finds the average number of actions per stage, with trend analysis. This requires going back to the interaction-level data, because the pre-aggregations cannot be used here.
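The optionality calculation can be pictured as counting, per interaction, how many actions are still available at each stage, and then averaging those counts. The sketch below is a plain-Python illustration under assumptions: the tuple layout, the stage order, the rule that an action recorded at a stage was available at all earlier stages, and the helper `average_actions_per_stage` are all hypothetical.

```python
from collections import Counter

# Stage order is an assumption for this sketch.
STAGES = ["Arbitration", "Final"]

# Hypothetical interaction-level rows: (interaction, stage, action).
rows = [
    ("i1", "Arbitration", "A"),
    ("i1", "Final", "B"),
    ("i2", "Arbitration", "A"),
    ("i2", "Arbitration", "C"),
    ("i3", "Final", "B"),
    ("i3", "Final", "D"),
]


def average_actions_per_stage(rows, stages):
    """Average number of available actions per interaction at each stage."""
    all_interactions = {interaction for interaction, _, _ in rows}
    result = {}
    for i, stage in enumerate(stages):
        reaching = set(stages[i:])  # this stage and all later ones
        per_interaction = Counter(
            interaction for interaction, s, _ in rows if s in reaching
        )
        # Interactions with no actions left count as zero in the average.
        total = sum(per_interaction[k] for k in all_interactions)
        result[stage] = total / len(all_interactions)
    return result


optionality = average_actions_per_stage(rows, STAGES)
```

The per-interaction counts in `per_interaction` are the part that a summary over all interactions no longer contains.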
- getActionVariationData(stage)¶
- getABTestResults()¶
- getThresholdingData(fld, quantile_range=range(10, 100, 10))¶
- getValueDistributionData()¶
- aggregate_remaining_per_stage(df: polars.LazyFrame, group_by_columns: List[str], aggregations: List = []) polars.LazyFrame ¶
Workhorse function that converts the raw Decision Analyzer data (filter view) to the aggregates remaining per stage. It is used by many of the other methods.
- Parameters:
df (polars.LazyFrame)
group_by_columns (List[str])
aggregations (List)
- Return type:
polars.LazyFrame
- get_offer_quality(df, group_by)¶
Given a dataframe with filtered action counts at stages, flips it to the usual VF (Value Finder) view by doing a rolling sum over stages.
- Parameters:
df (pl.LazyFrame) – Decision Analyzer style filtered action counts dataframe.
group_by (list) – The list of column names to group by ([“pxEngagementStage”, “pxInteractionID”]).
- Returns:
Value Finder style, available action counts per group_by category.
- Return type:
pl.LazyFrame
- property get_overview_stats¶
Creates an overview from sampled data.
- get_sensitivity(win_rank=1, filters=None)¶
- get_offer_variability_stats(stage)¶
- winning_from(interactions, win_rank, groupby_cols, top_k)¶
- losing_to(interactions, win_rank, groupby_cols, top_k)¶