pdstools.decision_analyzer._aggregates

Aggregation namespace for DecisionAnalyzer.

Methods in this module are exposed via da.aggregates.<method>. They encapsulate the data-shaping queries that turn the pre-aggregated views into the frames consumed by the plot layer and the Streamlit pages.

Classes

Aggregates

Aggregation queries over the pre-aggregated views.

Module Contents

class Aggregates(da: pdstools.decision_analyzer.DecisionAnalyzer.DecisionAnalyzer)

Aggregation queries over the pre-aggregated views.

Accessed via DecisionAnalyzer.aggregates.

Parameters:

da (pdstools.decision_analyzer.DecisionAnalyzer.DecisionAnalyzer)

da
aggregate_remaining_per_stage(df: polars.LazyFrame, group_by_columns: list[str], aggregations: list[polars.Expr] | None = None) → polars.LazyFrame

Workhorse function that converts the raw Decision Analyzer data (filter view) into remaining-per-stage aggregates, ensuring that every stage is represented.

Parameters:
  • df (polars.LazyFrame)

  • group_by_columns (list[str])

  • aggregations (list[polars.Expr] | None)

Return type:

polars.LazyFrame
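To make the "remaining per stage" idea concrete, here is a minimal sketch in plain Python rather than Polars. It assumes that "remaining at a stage" means a reverse cumulative sum over the ordered stages, and that a stage's own rows are included in its total; both are assumptions, not confirmed behaviour of the library. The stage names and the helper `remaining_per_stage` are hypothetical.

```python
def remaining_per_stage(dropped_at, stage_order):
    """Reverse cumulative sum over an ordered list of stages.

    dropped_at maps stage -> number of actions whose last stage is that stage.
    Stages missing from dropped_at count as zero, so every stage in
    stage_order appears in the result.
    """
    remaining = {}
    running = 0
    for stage in reversed(stage_order):
        running += dropped_at.get(stage, 0)
        remaining[stage] = running
    # Re-emit in pipeline order.
    return {stage: remaining[stage] for stage in stage_order}


# Hypothetical stage names, for illustration only.
stages = ["Eligibility", "Applicability", "Suitability", "Arbitration"]
counts = remaining_per_stage({"Eligibility": 5, "Arbitration": 2}, stages)
```

Stages with no rows of their own ("Applicability" and "Suitability" here) still get an entry, which is the property the method description emphasises.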

get_distribution_data(stage: str, grouping_levels: str | list[str], additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.LazyFrame

Distribution of decisions by grouping columns at a given stage.

Parameters:
  • stage (str) – Stage to filter on.

  • grouping_levels (str or list of str) – Column(s) to group by (e.g. "Action" or ["Channel", "Action"]).

  • additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters applied before aggregation.

Returns:

Columns from grouping_levels plus Decisions, sorted descending.

Return type:

pl.LazyFrame

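get_funnel_data(scope: str, additional_filters: polars.Expr | list[polars.Expr] | None = None) → tuple[polars.LazyFrame, polars.DataFrame, polars.DataFrame]

Funnel data for the given scope. As described under get_funnel_summary, the first element of the returned tuple holds the actions entering each stage and the second holds the actions exiting each stage.

Parameters: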
  • scope (str)

  • additional_filters (polars.Expr | list[polars.Expr] | None)

Return type:

tuple[polars.LazyFrame, polars.DataFrame, polars.DataFrame]

get_decisions_without_actions_data(additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.DataFrame

Per-stage count of interactions newly left with no remaining actions.

Returns a DataFrame with columns [self.da.level, "decisions_without_actions"], sorted in pipeline order. For each stage X, the value is the number of interactions that lose their final remaining action at stage X.

Parameters:

additional_filters (polars.Expr | list[polars.Expr] | None)

Return type:

polars.DataFrame
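The counting rule above can be sketched in plain Python. This is an illustration under assumptions, not the library's implementation: the input shape (one `(interaction_id, stage_dropped)` pair per action, with `None` for actions that survive the whole pipeline), the stage names, and the helper name are all hypothetical, and whether the real method emits rows for stages with a zero count is not stated in the source.

```python
from collections import Counter


def decisions_without_actions(rows, stage_order):
    """Count interactions that lose their final remaining action at each stage.

    rows: iterable of (interaction_id, stage_dropped), one pair per action.
    stage_dropped is None when the action survives the whole pipeline.
    """
    rank = {stage: i for i, stage in enumerate(stage_order)}
    last_drop = {}       # interaction -> latest stage at which one of its actions drops
    has_survivor = set() # interactions that keep at least one action
    for interaction_id, stage in rows:
        if stage is None:
            has_survivor.add(interaction_id)
            continue
        prev = last_drop.get(interaction_id)
        if prev is None or rank[stage] > rank[prev]:
            last_drop[interaction_id] = stage
    tally = Counter(
        stage for iid, stage in last_drop.items() if iid not in has_survivor
    )
    # One entry per stage, in pipeline order (zero rows kept in this sketch).
    return [(stage, tally.get(stage, 0)) for stage in stage_order]


stage_order = ["Eligibility", "Applicability", "Suitability"]  # hypothetical
rows = [
    ("i1", "Eligibility"), ("i1", "Suitability"),  # i1 loses its last action at Suitability
    ("i2", "Eligibility"), ("i2", None),           # i2 keeps one action
    ("i3", "Eligibility"),                         # i3 loses its only action at Eligibility
]
result = decisions_without_actions(rows, stage_order)
```

Interaction `i2` never appears in the tally because one of its actions survives to the end.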

get_funnel_summary(available_df: polars.LazyFrame, passing_df: polars.DataFrame, additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.DataFrame

Per-stage summary: Available, Passing, Filtered actions and Decisions.

The table matches the funnel chart: it starts with a synthetic “Available Actions” row and excludes the Output stage.

Parameters:
  • available_df (pl.LazyFrame) – First element returned by get_funnel_data (actions entering each stage).

  • passing_df (pl.DataFrame) – Second element returned by get_funnel_data (actions exiting each stage).

  • additional_filters (optional) – Same filters used when calling get_funnel_data.

Returns:

One row per stage in pipeline order with raw counts first, then per-decision averages.

Return type:

pl.DataFrame
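Because get_funnel_summary takes the first two elements of get_funnel_data and expects the same filters, the two calls are naturally chained. The wrapper below is a hypothetical convenience function, not part of the library; `da` is assumed to be an already constructed DecisionAnalyzer, and the default scope of "Action" is an assumption borrowed from the other methods on this page.

```python
def funnel_summary_for(da, scope="Action", filters=None):
    """Chain get_funnel_data and get_funnel_summary with identical filters.

    `da` is assumed to expose the `aggregates` namespace documented here.
    """
    available_df, passing_df, _ = da.aggregates.get_funnel_data(
        scope, additional_filters=filters
    )
    return da.aggregates.get_funnel_summary(
        available_df, passing_df, additional_filters=filters
    )
```

Passing the same `filters` object to both calls keeps the summary table consistent with the funnel chart built from the same data.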

get_optionality_data(df: polars.LazyFrame | None = None, by_day: bool = False) → polars.LazyFrame

Average number of actions per stage, optionally broken down by day.

Computes per-interaction action counts at each stage using aggregate_remaining_per_stage, then aggregates into a histogram.

Parameters:
  • df (pl.LazyFrame, optional) – Input data. Defaults to sample.

  • by_day (bool, default False) – If True, include "day" in the grouping for trend analysis. When False, zero-action rows are injected for stages where some interactions have no remaining actions.

Return type:

pl.LazyFrame

get_optionality_funnel(df: polars.LazyFrame | None = None) → polars.LazyFrame

Optionality funnel: interaction counts bucketed by available-action count.

Buckets action counts into 0–6 and 7+, then counts interactions per stage and bucket. Used by the optionality funnel chart.

Parameters:

df (pl.LazyFrame, optional) – Input data. Defaults to sample.

Return type:

pl.LazyFrame
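The bucketing step can be illustrated with a small plain-Python sketch. The bucket labels ("0" to "6" and "7+"), the input shape, and the helper names are assumptions made for illustration; the library's actual column names and label format are not given in the source.

```python
from collections import Counter


def bucket_label(n_actions):
    """Map an action count to one of the buckets 0..6 or '7+'."""
    return str(n_actions) if n_actions <= 6 else "7+"


def optionality_funnel(action_counts):
    """Count interactions per (stage, bucket).

    action_counts: iterable of (stage, n_actions), one pair per
    interaction and stage.
    """
    return Counter((stage, bucket_label(n)) for stage, n in action_counts)


funnel = optionality_funnel(
    [("Eligibility", 9), ("Eligibility", 7), ("Eligibility", 3), ("Arbitration", 0)]
)
```

Interactions with 7 and 9 actions land in the same "7+" bucket, while the zero-action interaction keeps its own bucket.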

get_action_variation_data(stage: str, color_by: str | None = None) → polars.LazyFrame

Get action variation data, optionally broken down by a categorical dimension.

Parameters:
  • stage (str) – The stage to analyze.

  • color_by (str | None) – Optional categorical column to break down the variation by. Can use "Channel/Direction" to combine the Channel and Direction columns.

Return type:

polars.LazyFrame

get_offer_variability_stats(stage: str) → dict[str, float]

Summary statistics for action variation at a stage.

Parameters:

stage (str) – Stage to analyse.

Returns:

Dictionary with two keys: n90, the number of actions covering 90% of decisions, and gini, the Gini coefficient of decision concentration.

Return type:

dict
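For readers unfamiliar with the two statistics, the sketch below computes them from a plain list of per-action decision counts. It uses the standard textbook definitions (smallest number of top actions reaching 90% of the total, and the Gini coefficient of the sorted counts); whether the library uses exactly these formulas is an assumption, and both helper names are hypothetical.

```python
def n90(counts, coverage=0.9):
    """Smallest number of actions whose decisions reach `coverage` of the total."""
    total = sum(counts)
    running = 0
    for n, c in enumerate(sorted(counts, reverse=True), start=1):
        running += c
        if running >= coverage * total:
            return n
    return 0  # empty input


def gini(counts):
    """Gini coefficient: 0 for a perfectly even spread, approaching 1 when
    nearly all decisions go to a single action."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


decisions_per_action = [60, 25, 10, 5]
stats = {"n90": n90(decisions_per_action), "gini": gini(decisions_per_action)}
```

With these counts, three actions are needed to cover 90% of decisions, and a perfectly even split such as `[10, 10, 10, 10]` gives a Gini coefficient of zero.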

get_offer_quality(df: polars.LazyFrame, group_by: str | list[str]) → polars.LazyFrame

Cumulative offer-quality breakdown across stages.

Takes a filtered-action-counts frame (from filtered_action_counts()) and converts it to a remaining-per-stage view, joining in customers that have zero actions so they are counted as well.

Parameters:
  • df (pl.LazyFrame) – Filtered action counts with columns no_of_offers, new_models, poor_propensity_offers, etc.

  • group_by (str or list of str) – Columns to group by (e.g. ["Interaction ID"]).

Returns:

Per-stage quality classification with boolean flag columns (has_no_offers, atleast_one_relevant_action, etc.).

Return type:

pl.LazyFrame

filtered_action_counts(groupby_cols: list[str], propensity_th: float | None = None, priority_th: float | None = None, additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.LazyFrame

Return action counts from the sample, optionally classified by propensity/priority thresholds.

Parameters:
  • groupby_cols (list of str) – Column names to group by.

  • propensity_th (float, optional) – Propensity threshold for classifying offers.

  • priority_th (float, optional) – Priority threshold for classifying offers.

  • additional_filters (pl.Expr or list[pl.Expr], optional) – Extra filters applied to the sample (e.g. channel filter).

Returns:

Aggregated action counts per group, with quality buckets when both thresholds are provided.

Return type:

pl.LazyFrame

get_trend_data(stage: str = 'AvailableActions', scope: Literal['Group', 'Issue', 'Action'] | None = 'Group', additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.DataFrame

Daily trend of unique decisions from a given stage onward.

Parameters:
  • stage (str, default "AvailableActions") – Starting stage; all stages from this point onward are included.

  • scope ({"Group", "Issue", "Action"} or None, default "Group") – Optional grouping dimension. If None, returns totals by day.

  • additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters applied to the sample.

Returns:

Columns: day, optionally scope, and Decisions.

Return type:

pl.DataFrame

get_filter_component_data(top_n: int, additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.DataFrame

Top-N filter components per stage, ranked by filtered-decision count.

Parameters:
  • top_n (int) – Maximum number of components to return per stage.

  • additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters applied before aggregation.

Returns:

Columns include the stage level, Component Name, and Filtered Decisions.

Return type:

pl.DataFrame
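The "top-N per stage" ranking can be pictured with the plain-Python sketch below. The input shape, stage names, and helper name are hypothetical, and the real method works on Polars frames rather than tuples.

```python
from collections import defaultdict


def top_components(rows, top_n):
    """Keep the top_n components per stage, ranked by filtered-decision count.

    rows: iterable of (stage, component_name, filtered_decisions).
    """
    per_stage = defaultdict(list)
    for stage, component, n in rows:
        per_stage[stage].append((component, n))
    return {
        stage: sorted(items, key=lambda item: item[1], reverse=True)[:top_n]
        for stage, items in per_stage.items()
    }


rows = [
    ("Eligibility", "A", 5),
    ("Eligibility", "B", 9),
    ("Eligibility", "C", 1),
    ("Suitability", "D", 4),
]
top2 = top_components(rows, top_n=2)
```

The limit applies within each stage, so a stage with fewer than `top_n` components simply returns all of them.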

get_component_action_impact(top_n: int = 10, scope: str = 'Action', additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.DataFrame

Per-component breakdown of which items are filtered and how many.

For each component, returns the top-N items (at the chosen scope granularity) it filters out. The scope controls whether the breakdown is at Issue, Group, or Action level.

Parameters:
  • top_n (int, default 10) – Maximum number of items to return per component.

  • scope (str, default "Action") – Granularity level: "Issue", "Group", or "Action".

  • additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters to apply before aggregation.

Returns:

Columns include Component Name, StageGroup, scope columns, and Filtered Decisions. Sorted by component then descending count.

Return type:

pl.DataFrame

get_component_drilldown(component_name: str, scope: str = 'Action', additional_filters: polars.Expr | list[polars.Expr] | None = None, sort_by: str = 'Filtered Decisions') → polars.DataFrame

Deep-dive into a single filter component showing dropped actions and their potential value.

Since scoring columns (Priority, Value, Propensity) are typically null on FILTERED_OUT rows, this method derives the action’s “potential value” by looking up average scores from rows where the same action survives (non-null Priority/Value). This gives the “value of what’s being dropped” perspective.

Parameters:
  • component_name (str) – The Component Name to drill into.

  • scope (str, default "Action") – Granularity level: "Issue", "Group", or "Action".

  • additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters to apply before aggregation.

  • sort_by (str, default "Filtered Decisions") – Column to sort results by (descending).

Returns:

Columns include scope columns, Filtered Decisions, avg_Priority, avg_Value, avg_Propensity, Component Type (if available).

Return type:

pl.DataFrame
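The "potential value" lookup described above can be sketched as follows: scores are averaged over rows where the action survives, then attached to the count of rows where the same action is dropped. The row layout (dicts with a boolean `filtered` flag) and the helper name are hypothetical; only the Priority score is shown, and the same pattern would apply to Value and Propensity.

```python
from collections import defaultdict
from statistics import mean


def potential_value(rows):
    """Attach average surviving-row Priority to each dropped action.

    rows: iterable of dicts with keys 'Action', 'filtered' (bool) and
    'Priority' (float, or None on filtered-out rows).
    """
    surviving = defaultdict(list)  # action -> priorities seen on surviving rows
    dropped = defaultdict(int)     # action -> number of filtered-out rows
    for row in rows:
        if row["filtered"]:
            dropped[row["Action"]] += 1
        elif row["Priority"] is not None:
            surviving[row["Action"]].append(row["Priority"])
    return {
        action: {
            "Filtered Decisions": n,
            # None when the action never survives, so no score is available.
            "avg_Priority": mean(surviving[action]) if action in surviving else None,
        }
        for action, n in dropped.items()
    }


rows = [
    {"Action": "A", "filtered": True, "Priority": None},
    {"Action": "A", "filtered": True, "Priority": None},
    {"Action": "A", "filtered": False, "Priority": 1.0},
    {"Action": "A", "filtered": False, "Priority": 3.0},
    {"Action": "B", "filtered": True, "Priority": None},
]
drilldown = potential_value(rows)
```

Action "B" is dropped but never survives anywhere, so no potential value can be derived for it in this sketch.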

get_ab_test_results() → polars.DataFrame

A/B test summary: control vs test counts and control percentage per stage.

Returns:

One row per stage with columns for Control, Test counts and Control Percentage. Rows preserve the canonical AvailableNBADStages order (stages absent from the data are omitted).

Return type:

pl.DataFrame
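A minimal sketch of the shape of this result, in plain Python. It assumes that Control Percentage is `100 * Control / (Control + Test)`, which the source does not spell out; the "Stage" key, the stage names, and the helper name are likewise hypothetical.

```python
def ab_summary(counts, stage_order):
    """Build one row per stage present in the data, in canonical stage order.

    counts: dict mapping stage -> (control_count, test_count).
    """
    rows = []
    for stage in stage_order:
        if stage not in counts:
            continue  # stages absent from the data are omitted
        control, test = counts[stage]
        total = control + test
        pct = 100.0 * control / total if total else None  # assumed formula
        rows.append(
            {"Stage": stage, "Control": control, "Test": test, "Control Percentage": pct}
        )
    return rows


canonical = ["Eligibility", "Applicability", "Arbitration"]  # hypothetical order
summary = ab_summary({"Arbitration": (10, 30), "Eligibility": (50, 50)}, canonical)
```

The output follows the canonical order regardless of the order in which stages appear in the input, and "Applicability" is skipped because it has no data.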