pdstools.decision_analyzer
==========================

.. py:module:: pdstools.decision_analyzer


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/pdstools/decision_analyzer/DecisionAnalyzer/index
   /autoapi/pdstools/decision_analyzer/data_read_utils/index
   /autoapi/pdstools/decision_analyzer/plots/index
   /autoapi/pdstools/decision_analyzer/table_definition/index
   /autoapi/pdstools/decision_analyzer/utils/index


Classes
-------

.. autoapisummary::

   pdstools.decision_analyzer.DecisionAnalyzer


Package Contents
----------------

.. py:class:: DecisionAnalyzer(raw_data: polars.LazyFrame, level='StageGroup', sample_size=DEFAULT_SAMPLE_SIZE, mandatory_expr: Optional[polars.Expr] = None, additional_columns: Optional[Dict[str, polars.DataType]] = None)

   Analyze NBA decision data from Explainability Extract or Decision Analyzer exports.

   This class processes raw decision data into a comprehensive analysis
   framework for NBA (Next-Best-Action). It supports two data source formats:

   - **Explainability Extract (v1)**: Simpler format with actions at the
     arbitration stage. Stages are synthetically derived from ranking.
   - **Decision Analyzer / EEV2 (v2)**: Full pipeline data with real stage
     information, filter component names, and detailed strategy tracking.

   Data can be loaded via class methods or directly:

   - :meth:`from_explainability_extract`: Load from an Explainability Extract file.
   - :meth:`from_decision_analyzer`: Load from a Decision Analyzer (EEV2) file.
   - Direct ``__init__``: Auto-detects the format from the data schema.

   .. attribute:: decision_data

      Interaction-level decision data (with global filters applied, if any).

      :type: pl.LazyFrame

   .. attribute:: extract_type

      Either ``"explainability_extract"`` or ``"decision_analyzer"``.

      :type: str

   .. attribute:: plot

      Plot accessor for visualization methods.

      :type: Plot

   .. rubric:: Examples

   >>> from pdstools import DecisionAnalyzer
   >>> da = DecisionAnalyzer.from_explainability_extract("data/sample_explainability_extract.parquet")
   >>> da.get_overview_stats
   >>> da.plot.sensitivity()

   .. py:method:: from_explainability_extract(source: Union[str, os.PathLike], **kwargs) -> DecisionAnalyzer
      :classmethod:

      Create a DecisionAnalyzer from an Explainability Extract (v1) file.

      :param source: Path to the Explainability Extract parquet file, or a URL.
      :type source: Union[str, os.PathLike]
      :param \*\*kwargs: Additional keyword arguments passed to ``__init__``
                         (e.g. ``sample_size``, ``mandatory_expr``, ``additional_columns``).
      :rtype: DecisionAnalyzer

      .. rubric:: Examples

      >>> da = DecisionAnalyzer.from_explainability_extract("data/sample_explainability_extract.parquet")

   .. py:method:: from_decision_analyzer(source: Union[str, os.PathLike], **kwargs) -> DecisionAnalyzer
      :classmethod:

      Create a DecisionAnalyzer from a Decision Analyzer / EEV2 (v2) file.

      :param source: Path to the Decision Analyzer parquet file, or a URL.
      :type source: Union[str, os.PathLike]
      :param \*\*kwargs: Additional keyword arguments passed to ``__init__``
                         (e.g. ``sample_size``, ``mandatory_expr``, ``additional_columns``).
      :rtype: DecisionAnalyzer

      .. rubric:: Examples

      >>> da = DecisionAnalyzer.from_decision_analyzer("data/sample_eev2.parquet")

   .. py:attribute:: fields_for_data_filtering
      :value: ['Decision Time', 'Channel', 'Direction', 'Issue', 'Group', 'Action', 'Treatment', 'Stage',...

   .. py:attribute:: plot

   .. py:attribute:: level
      :value: 'StageGroup'

   .. py:attribute:: sample_size
      :value: 50000

   .. py:attribute:: extract_type
      :value: 'decision_analyzer'

   .. py:attribute:: validation_error
      :value: 'The following default columns are missing: '

   .. py:attribute:: unfiltered_raw_decision_data

   .. py:attribute:: preaggregation_columns

   .. py:attribute:: max_win_rank
      :value: 5

   .. py:attribute:: AvailableNBADStages
      :value: ['Arbitration', 'Output']

   .. py:property:: stages_from_arbitration_down

      All stages in the filter view, starting at Arbitration. Initially this is
      just ``[Arbitration, Final]``, but more stages may appear as more data
      comes in.

   .. py:property:: arbitration_stage

   .. py:property:: num_sample_interactions

      Number of unique interactions in the sample. Automatically triggers
      sampling if not yet calculated.

   .. py:method:: _invalidate_cached_properties()

      Reset the cached properties of the class. Needed for global filters.

   .. py:method:: applyGlobalDataFilters(filters: Optional[Union[polars.Expr, List[polars.Expr]]] = None)

      Apply a global set of filters.

   .. py:method:: resetGlobalDataFilters()

   .. py:property:: getPreaggregatedFilterView

      Pre-aggregates the full dataset over customers and interactions,
      providing a view of what is filtered at each stage.

      This pre-aggregation is similar to what "VBD" does to interaction
      history. It aggregates over individual customers and interactions,
      giving summary statistics that are sufficient to drive most of the
      analyses (but not all). The result is much smaller than the original
      data and is expected to fit easily in memory, so we use polars caching
      to cache it efficiently.

      This "filter" view keeps the same organization as the Decision Analyzer
      data in that it records the actions that get filtered out at each stage.
      From this, a "remaining" view is easily derived.

   .. py:property:: getPreaggregatedRemainingView

      Pre-aggregates the full dataset over customers and interactions,
      providing a view of remaining offers. This pre-aggregation builds on the
      filter view and aggregates over the remaining stages.

   .. py:property:: sample

      Hash-based deterministic sample of interactions for resource-intensive
      analyses.

      Selects up to ``sample_size`` unique interactions using a hash of
      ``pxInteractionID``. All actions within a selected interaction are kept.
      If fewer interactions exist than ``sample_size``, no sampling is
      performed.

   .. py:method:: getAvailableFieldsForFiltering(categoricalOnly=False)

   .. py:method:: cleanup_raw_data(df: polars.LazyFrame)

      Clean up the raw data read from parquet/S3/etc. This will likely need to
      change as we get closer to product, to match what comes out of Pega. It
      does some modest type casting and potentially reverts some of the
      temporary column names that were added to generate more data.

   .. py:method:: getPossibleScopeValues()

   .. py:method:: getPossibleStageValues()

   .. py:method:: getDistributionData(stage: str, grouping_levels: Union[str, List[str]], additional_filters: Optional[Union[polars.Expr, List[polars.Expr]]] = None) -> polars.LazyFrame

   .. py:method:: getFunnelData(scope, additional_filters: Optional[Union[polars.Expr, List[polars.Expr]]] = None) -> polars.LazyFrame

   .. py:method:: getFilterComponentData(top_n, additional_filters: Optional[Union[polars.Expr, List[polars.Expr]]] = None) -> polars.DataFrame

   .. py:method:: getComponentActionImpact(top_n: int = 10, scope: str = 'Action', additional_filters: Optional[Union[polars.Expr, List[polars.Expr]]] = None) -> polars.DataFrame

      Per-component breakdown of which items are filtered, and how many.

      For each component, returns the top-N items (at the chosen scope
      granularity) it filters out. The scope controls whether the breakdown is
      at Issue, Group, or Action level.

      :param top_n: Maximum number of items to return per component.
      :type top_n: int, default 10
      :param scope: Granularity level: ``"Issue"``, ``"Group"``, or ``"Action"``.
      :type scope: str, default "Action"
      :param additional_filters: Extra filters to apply before aggregation.
      :type additional_filters: pl.Expr or list of pl.Expr, optional
      :returns: Columns include pxComponentName, StageGroup, scope columns, and
                Filtered Decisions. Sorted by component, then by descending count.
      :rtype: pl.DataFrame

   .. py:method:: getComponentDrilldown(component_name: str, additional_filters: Optional[Union[polars.Expr, List[polars.Expr]]] = None) -> polars.DataFrame

      Deep-dive into a single filter component, showing dropped actions and
      their potential value.

      Since scoring columns (Priority, Value, Propensity) are typically null on
      FILTERED_OUT rows, this method derives an action's "potential value" by
      looking up average scores from rows where the same action survives
      (non-null Priority/Value). This gives the "value of what's being dropped"
      perspective.

      :param component_name: The pxComponentName to drill into.
      :type component_name: str
      :param additional_filters: Extra filters to apply before aggregation.
      :type additional_filters: pl.Expr or list of pl.Expr, optional
      :returns: Columns: pyIssue, pyGroup, pyName, Filtered Decisions,
                avg_Priority, avg_Value, avg_Propensity, pxComponentType (if
                available). Sorted by Filtered Decisions, descending.
      :rtype: pl.DataFrame

   .. py:method:: reRank(additional_filters: Optional[Union[polars.Expr, List[polars.Expr]]] = None, overrides: List[polars.Expr] = []) -> polars.LazyFrame

      Calculate priority and rank for all PVCL (Propensity, Value, Context
      weight, Lever) combinations.

   .. py:method:: get_win_loss_distribution_data(level, win_rank)

   .. py:method:: get_optionality_data(df)

      Find the average number of actions per stage, without trend analysis. We
      have to go back to the interaction-level data; unfortunately there is no
      way to use the pre-aggregations.

   .. py:method:: get_optionality_data_with_trend(df=None)

      Find the average number of actions per stage, with trend analysis. We
      have to go back to the interaction-level data; unfortunately there is no
      way to use the pre-aggregations.

   .. py:method:: get_optionality_funnel(df=None)

   .. py:method:: getActionVariationData(stage)

   .. py:method:: getABTestResults()

   .. py:method:: getThresholdingData(fld, quantile_range=range(10, 100, 10))

   .. py:method:: priority_component_distribution(component, granularity)

   .. py:method:: aggregate_remaining_per_stage(df: polars.LazyFrame, group_by_columns: List[str], aggregations: List = []) -> polars.LazyFrame

      Workhorse function to convert the raw Decision Analyzer data (filter
      view) to the aggregates remaining per stage, ensuring all stages are
      represented.

   .. py:method:: filtered_action_counts(groupby_cols: list, propensityTH: Optional[float] = None, priorityTH: Optional[float] = None) -> polars.LazyFrame

      Return action counts from the sample, optionally classified by
      propensity/priority thresholds.

      :param groupby_cols: Column names to group by.
      :type groupby_cols: list
      :param propensityTH: Propensity threshold for classifying offers.
      :type propensityTH: float, optional
      :param priorityTH: Priority threshold for classifying offers.
      :type priorityTH: float, optional
      :returns: Aggregated action counts per group, with quality buckets when
                both thresholds are provided.
      :rtype: pl.LazyFrame

   .. py:method:: get_offer_quality(df, group_by)

      Given a dataframe with filtered action counts at stages, flip it to the
      usual Value Finder view by doing a rolling sum over stages.

      :param df: Decision Analyzer style filtered action counts dataframe.
      :type df: pl.LazyFrame
      :param group_by: The list of column names to group by
                       (``[self.level, "Interaction ID"]``).
      :type group_by: list
      :returns: Value Finder style available action counts per group_by category.
      :rtype: pl.LazyFrame

   .. py:property:: get_overview_stats

      Creates an overview from the sampled data.

   .. py:method:: get_sensitivity(win_rank=1, filters=None)

      Global sensitivity: the number of decisions where the original rank-1
      action changes when dropping one of the prioritization factors. Local
      sensitivity: the number of times the selected offer(s) are at rank 1
      when dropping one of the prioritization factors.

      :param win_rank: Maximum rank to be considered a winner.
      :type win_rank: int
      :param filters: Selected offers; only used in local sensitivity analysis.
      :type filters: List[pl.Expr]
      :rtype: pl.LazyFrame

   .. py:method:: get_offer_variability_stats(stage)

   .. py:method:: get_winning_or_losing_interactions(win_rank, group_filter, win: bool)

   .. py:method:: winning_from(interactions, win_rank, groupby_cols, top_k)

   .. py:method:: losing_to(interactions, win_rank, groupby_cols, top_k)

   .. py:method:: get_win_distribution_data(lever_condition: polars.Expr, lever_value: Optional[float] = None, all_interactions: Optional[int] = None) -> polars.DataFrame

      Calculate win distribution data for business lever analysis.

      This method generates distribution data showing how actions perform in
      arbitration decisions, both in baseline conditions and, optionally, with
      lever adjustments applied.

      :param lever_condition: Polars expression defining which actions to
                              apply the lever to. Example:
                              ``pl.col("Action") == "SpecificAction"`` or
                              ``(pl.col("Issue") == "Service") & (pl.col("Group") == "Cards")``
      :type lever_condition: pl.Expr
      :param lever_value: The lever multiplier to apply to the selected
                          actions. If None, returns the baseline distribution
                          only. If provided, returns both original and
                          lever-adjusted win counts.
      :type lever_value: float, optional
      :param all_interactions: Total number of interactions, used to calculate
                               the "no winner" count. If provided, enables the
                               calculation of interactions without any winner.
                               If None, "no winner" data is not calculated.
      :type all_interactions: int, optional
      :returns: DataFrame containing the win distribution, with columns:

                - pyIssue, pyGroup, pyName: action identifiers
                - original_win_count: number of rank-1 wins in the baseline
                  scenario
                - new_win_count: number of rank-1 wins after the lever
                  adjustment (only if lever_value is provided)
                - n_decisions_survived_to_arbitration: number of arbitration
                  decisions the action participated in
                - selected_action: "Selected" for actions matching
                  lever_condition, "Rest" for others
                - no_winner_count: number of interactions without any winner
                  (only if all_interactions is provided)
      :rtype: pl.DataFrame

      .. rubric:: Notes

      - Only includes actions that survive to the arbitration stage.
      - Win counts represent rank-1 (first-place) finishes in arbitration
        decisions.
      - This is a zero-sum analysis: boosting the selected actions suppresses
        others.
      - Results are sorted by win count (new_win_count if available, else
        original_win_count).
      - When all_interactions is provided, "no winner" represents interactions
        without any rank-1 winner.

      .. rubric:: Examples

      Get the baseline distribution for a specific action:

      >>> lever_cond = pl.col("Action") == "MyAction"
      >>> baseline = decision_analyzer.get_win_distribution_data(lever_cond)

      Get the distribution with a 2x lever applied to service actions:

      >>> lever_cond = pl.col("Issue") == "Service"
      >>> with_lever = decision_analyzer.get_win_distribution_data(lever_cond, 2.0)

      Get the distribution including the "no winner" count:

      >>> total_interactions = 10000
      >>> with_no_winner = decision_analyzer.get_win_distribution_data(lever_cond, 2.0, total_interactions)

   .. py:method:: get_trend_data(stage: str = 'AvailableActions', scope: Union[Literal['Group', 'Issue', 'Action'], None] = 'Group') -> polars.DataFrame

   .. py:method:: find_lever_value(lever_condition: polars.Expr, target_win_percentage: float, win_rank: int = 1, low: float = 0, high: float = 100, precision: float = 0.01, ranking_stages: List[str] = None) -> float

      Binary search to find the lever value needed to achieve a desired win
      percentage.
      :param lever_condition: Polars expression that defines which actions
                              should receive the lever.
      :type lever_condition: pl.Expr
      :param target_win_percentage: The desired win percentage (0-100).
      :type target_win_percentage: float
      :param win_rank: Consider actions winning if they rank <= this value.
      :type win_rank: int, default 1
      :param low: Lower bound of the lever search range.
      :type low: float, default 0
      :param high: Upper bound of the lever search range.
      :type high: float, default 100
      :param precision: Search precision; smaller values give more accurate
                        results.
      :type precision: float, default 0.01
      :param ranking_stages: List of stages to include in the analysis.
                             Defaults to ``["Arbitration"]``.
      :type ranking_stages: List[str], optional
      :returns: The lever value needed to achieve the target win percentage.
      :rtype: float
      :raises ValueError: If the target win percentage cannot be achieved
                          within the search range.
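The bisection idea behind ``find_lever_value`` can be sketched independently of polars. The helper below is a hypothetical, self-contained illustration, not the library implementation: the real method re-ranks the sampled arbitration data at each probe, whereas here ``win_percentage_at`` stands in for that evaluation, and the toy ``curve`` is invented for demonstration.

```python
from typing import Callable


def find_lever_by_bisection(
    win_percentage_at: Callable[[float], float],
    target: float,
    low: float = 0.0,
    high: float = 100.0,
    precision: float = 0.01,
) -> float:
    """Bisect [low, high] for the lever value whose win percentage hits target.

    Assumes win_percentage_at is monotonically non-decreasing in the lever:
    a larger multiplier can only boost the selected actions' priority.
    """
    if win_percentage_at(high) < target:
        raise ValueError(
            "Target win percentage cannot be achieved within the search range"
        )
    while high - low > precision:
        mid = (low + high) / 2
        if win_percentage_at(mid) < target:
            low = mid  # not winning enough yet: search the upper half
        else:
            high = mid  # target met: try a smaller lever
    return (low + high) / 2


# Toy monotone win curve: wins saturate as the lever grows; crosses 50% at 10.
def curve(lever: float) -> float:
    return 100 * lever / (lever + 10)


lever = find_lever_by_bisection(curve, target=50.0)
```

The monotonicity assumption is what makes binary search valid here; if boosting an action's lever could ever lower its win share, a plain scan over candidate values would be needed instead.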