pdstools.decision_analyzer._aggregates

Aggregation namespace for DecisionAnalyzer.

Methods in this module are exposed via da.aggregates.<method>. They encapsulate the data-shaping queries that turn the pre-aggregated views into the frames consumed by the plot layer and the Streamlit pages.

Classes

Aggregates

Aggregation queries over the pre-aggregated views.

Module Contents

class Aggregates(da: pdstools.decision_analyzer.DecisionAnalyzer.DecisionAnalyzer)

Aggregation queries over the pre-aggregated views.

Accessed via DecisionAnalyzer.aggregates.

Parameters: da (pdstools.decision_analyzer.DecisionAnalyzer.DecisionAnalyzer)

da

aggregate_remaining_per_stage(df: polars.LazyFrame, group_by_columns: list[str], aggregations: list[polars.Expr] | None = None) → polars.LazyFrame

Workhorse function that converts the raw Decision Analyzer data (the filter view) into per-stage aggregates of what remains at each stage, ensuring that every stage is represented.

Parameters:

df (polars.LazyFrame)
group_by_columns (list[str])
aggregations (list[polars.Expr] | None)

Return type:

polars.LazyFrame
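To make "remaining per stage, with all stages represented" concrete, here is a plain-Python sketch of the idea; it is not the pdstools polars implementation. It assumes each action is summarized by the last stage it reached, so it counts as remaining at every stage up to and including that one. The stage names and the function name are illustrative.

```python
from collections import Counter

# Illustrative pipeline order (assumption); real stage names come from the data.
STAGES = ["Eligibility", "Applicability", "Suitability", "Arbitration", "Output"]


def remaining_per_stage(last_stage_reached: list[str], stages: list[str]) -> dict[str, int]:
    """Count how many actions are still remaining at each stage (hypothetical helper)."""
    position = {stage: i for i, stage in enumerate(stages)}
    reached = Counter(position[stage] for stage in last_stage_reached)
    totals = {}
    running = 0
    # Walk backwards so each stage accumulates every action that got at least
    # that far; stages that no action reaches still get an explicit 0 entry.
    for i in range(len(stages) - 1, -1, -1):
        running += reached.get(i, 0)
        totals[stages[i]] = running
    # Re-emit in pipeline order.
    return {stage: totals[stage] for stage in stages}


print(remaining_per_stage(["Eligibility", "Arbitration", "Output", "Output"], STAGES))
# {'Eligibility': 4, 'Applicability': 3, 'Suitability': 3, 'Arbitration': 3, 'Output': 2}
```

The explicit zero entries are the point of "ensuring all stages are represented": downstream charts can rely on every stage appearing even when nothing survives to it.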

get_distribution_data(stage: str, grouping_levels: str | list[str], additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.LazyFrame

Distribution of decisions by grouping columns at a given stage.

Parameters:

stage (str) – Stage to filter on.
grouping_levels (str or list of str) – Column(s) to group by (e.g. "Action" or ["Channel", "Action"]).
additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters applied before aggregation.

Returns:

Columns from grouping_levels plus Decisions, sorted by Decisions in descending order.

Return type:

pl.LazyFrame
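The shape of this result can be sketched in plain Python. This is an illustration only, not the library's query: it assumes rows carry a "Stage" column and that a decision is identified by "Interaction ID", and `distribution` is a hypothetical name.

```python
from __future__ import annotations

from collections import defaultdict


def distribution(rows: list[dict], stage: str, grouping_levels: str | list[str]) -> list[tuple]:
    """Count unique decisions per grouping key at one stage, largest first."""
    # Accept a single column name, as get_distribution_data does.
    if isinstance(grouping_levels, str):
        grouping_levels = [grouping_levels]
    decisions: dict[tuple, set] = defaultdict(set)
    for row in rows:
        # Assumed column names: "Stage" and "Interaction ID".
        if row["Stage"] == stage:
            key = tuple(row[col] for col in grouping_levels)
            decisions[key].add(row["Interaction ID"])
    result = [(*key, len(ids)) for key, ids in decisions.items()]
    # Sort on the trailing Decisions count, descending.
    result.sort(key=lambda r: r[-1], reverse=True)
    return result
```

With the real API, the equivalent call would be along the lines of `da.aggregates.get_distribution_data(stage, ["Channel", "Action"])`, which returns a LazyFrame rather than a list.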

get_funnel_data(scope: str, additional_filters: polars.Expr | list[polars.Expr] | None = None) → tuple[polars.LazyFrame, polars.DataFrame, polars.DataFrame]

Parameters:

scope (str)
additional_filters (polars.Expr | list[polars.Expr] | None)

Return type:

tuple[polars.LazyFrame, polars.DataFrame, polars.DataFrame]

get_decisions_without_actions_data(additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.DataFrame

Per-stage count of interactions newly left with no remaining actions.

Returns a DataFrame with columns [self.da.level, "decisions_without_actions"], sorted in pipeline order. For each stage X, the value is the number of interactions that lose their final remaining action at stage X.

Parameters: additional_filters (polars.Expr | list[polars.Expr] | None)
Return type: polars.DataFrame
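The per-stage rule ("the stage where the final remaining action is lost") can be sketched in plain Python. This is not the pdstools implementation; it assumes a hypothetical input mapping each interaction to its remaining-action count after each stage, in pipeline order.

```python
def decisions_without_actions(remaining: dict[str, list[int]], stages: list[str]) -> dict[str, int]:
    """Per stage, count interactions whose last remaining action is lost there.

    Hypothetical input: `remaining` maps an interaction ID to its action count
    after each stage, listed in the same pipeline order as `stages`.
    """
    counts = {stage: 0 for stage in stages}  # every stage appears, even with 0
    for per_stage in remaining.values():
        for stage, n_left in zip(stages, per_stage):
            if n_left == 0:
                # First stage with nothing left: attribute the loss here only,
                # so later stages do not count the same interaction again.
                counts[stage] += 1
                break
    return counts
```

Counting only the first empty stage is what makes these counts "newly left with no remaining actions" rather than a cumulative total.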

get_funnel_summary(available_df: polars.LazyFrame, passing_df: polars.DataFrame, additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.DataFrame

Per-stage summary of Available, Passing, and Filtered action counts, plus Decisions.

The table matches the funnel chart: it starts with a synthetic “Available Actions” row and excludes the Output stage.

Parameters:

available_df (pl.LazyFrame) – First element returned by get_funnel_data (actions entering each stage).
passing_df (pl.DataFrame) – Second element returned by get_funnel_data (actions exiting each stage).
additional_filters (optional) – Same filters used when calling get_funnel_data.

Returns:

One row per stage in pipeline order with raw counts first, then per-decision averages.

Return type:

pl.DataFrame
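The column layout (raw counts first, then per-decision averages) can be sketched in plain Python. This sketch assumes Filtered = Available − Passing and that each average divides by the stage's decision count; the column names are illustrative, and the synthetic "Available Actions" row is not modelled.

```python
def funnel_summary(available: dict[str, int], passing: dict[str, int],
                   decisions: dict[str, int], stages: list[str]) -> list[dict]:
    """Hypothetical per-stage summary rows in pipeline order."""
    rows = []
    for stage in stages:
        if stage == "Output":
            continue  # the funnel chart leaves out the Output stage
        n_available, n_passing, n_decisions = available[stage], passing[stage], decisions[stage]
        # Assumption: filtered actions are those that entered but did not pass.
        n_filtered = n_available - n_passing
        row = {
            "Stage": stage,
            # Raw counts first...
            "Available": n_available,
            "Passing": n_passing,
            "Filtered": n_filtered,
            "Decisions": n_decisions,
        }
        # ...then the same counts averaged per decision (assumed denominator).
        for name, value in (("Available", n_available), ("Passing", n_passing), ("Filtered", n_filtered)):
            row[f"Avg {name}"] = value / n_decisions if n_decisions else 0.0
        rows.append(row)
    return rows
```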

get_optionality_data(df: polars.LazyFrame | None = None, by_day: bool = False) → polars.LazyFrame

Average number of actions per stage, optionally broken down by day.

Computes per-interaction action counts at each stage using aggregate_remaining_per_stage, then aggregates them into a histogram.

Parameters:

df (pl.LazyFrame, optional) – Input data. Defaults to the sample.
by_day (bool, default False) – If True, include "day" in the grouping for trend analysis. When False, zero-action rows are injected for stages where some interactions have no remaining actions.

Return type:

pl.LazyFrame
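The zero-action injection described for by_day=False can be sketched in plain Python. This is an illustration, not the library's polars code: the hypothetical input holds action counts only for (interaction, stage) pairs that still have actions, so missing pairs are filled in as zeros before the histogram and average are computed. The by_day breakdown is not modelled.

```python
from collections import Counter


def optionality_histogram(action_counts: dict[tuple[str, str], int],
                          interactions: set[str], stages: list[str]) -> dict[str, Counter]:
    """Histogram of remaining actions per interaction, for every stage.

    Hypothetical input: `action_counts` maps (interaction ID, stage) to the
    number of actions left; pairs with nothing left are absent, as in data
    that only holds surviving actions.
    """
    histogram = {stage: Counter() for stage in stages}
    for stage in stages:
        for interaction in interactions:
            # Absent pairs are injected as explicit zero-action rows.
            histogram[stage][action_counts.get((interaction, stage), 0)] += 1
    return histogram


def average_actions(hist: Counter) -> float:
    """Mean actions per interaction implied by one stage's histogram."""
    total = sum(hist.values())
    return sum(n * count for n, count in hist.items()) / total if total else 0.0
```

Without the injected zeros, the average at later stages would be inflated, because interactions with no options left would silently drop out of the denominator.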

get_optionality_funnel(df: polars.LazyFrame | None = None) → polars.LazyFrame

Optionality funnel: interaction counts bucketed by available-action count.

Buckets action counts into 0–6 and 7+, then counts interactions per stage and bucket. Used by the optionality funnel chart.

Parameters: df (pl.LazyFrame, optional) – Input data. Defaults to the sample.
Return type: pl.LazyFrame
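The 0–6 / 7+ bucketing can be sketched in plain Python as follows. The string labels "0" to "6" are an assumption (only "7+" appears above), and the function names are hypothetical; the library performs this in polars.

```python
from collections import Counter


def bucket(n_actions: int) -> str:
    """Map an action count to its funnel bucket: "0" to "6", or "7+"."""
    # Assumed labels; counts of 7 or more are pooled into one bucket.
    return str(n_actions) if n_actions < 7 else "7+"


def optionality_funnel(counts_per_stage: dict[str, list[int]]) -> dict[str, Counter]:
    """Count interactions per (stage, bucket) from per-interaction action counts."""
    return {
        stage: Counter(bucket(n) for n in counts)
        for stage, counts in counts_per_stage.items()
    }
```

Pooling the long tail keeps the chart to eight bars per stage regardless of how many actions a few interactions have.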

get_action_variation_data(stage: str, color_by: str | None = None) → polars.LazyFrame

Get action variation data, optionally broken down by a categorical dimension.

Parameters:

stage (str) – The stage to analyze.
color_by (str | None) – Optional categorical column to break down the variation by. Use "Channel/Direction" to combine the Channel and Direction columns.

Return type:

polars.LazyFrame

get_offer_variability_stats(stage: str) → dict[str, float]

Summary statistics for action variation at a stage.

Parameters: stage (str) – Stage to analyze.
Returns: A dict with n90, the number of actions that together cover 90% of decisions, and gini, the Gini coefficient of decision concentration.
Return type: dict
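Both statistics can be computed from per-action decision counts. The plain-Python sketch below uses the standard definitions: n90 takes the most-chosen actions first until 90% of decisions are covered, and gini uses the usual formula over values sorted ascending. The library may use a different Gini variant or tie handling, so treat this as an illustration, not its implementation.

```python
def n90(decision_counts: list[int]) -> int:
    """Smallest number of actions whose decisions add up to at least 90% of the total."""
    total = sum(decision_counts)
    if total == 0:
        return 0
    covered = 0
    # Take the most-chosen actions first so the count is as small as possible.
    for k, count in enumerate(sorted(decision_counts, reverse=True), start=1):
        covered += count
        if covered >= 0.9 * total:
            return k
    return len(decision_counts)  # not reached for non-negative counts


def gini(decision_counts: list[int]) -> float:
    """Gini coefficient: 0 for an even spread, up to (n - 1) / n when one action gets everything."""
    values = sorted(decision_counts)
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i counted from 1.
    weighted = sum(i * v for i, v in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

For example, counts of 50, 30, 15 and 5 decisions give n90 = 3, since the top two actions cover only 80%.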

get_offer_quality(df: polars.LazyFrame, group_by: str | list[str]) → polars.LazyFrame

Cumulative offer-quality breakdown across stages.

Takes a filtered-action-counts frame (from filtered_action_counts()) and converts it to a remaining-per-stage view, joining in customers that have zero actions so they are counted as well.

Parameters:

df (pl.LazyFrame) – Filtered action counts with columns no_of_offers, new_models, poor_propensity_offers, etc.
group_by (str or list of str) – Columns to group by (e.g. ["Interaction ID"]).

Returns:

Per-stage quality classification with boolean flag columns (has_no_offers, atleast_one_relevant_action, etc.).

Return type:

pl.LazyFrame
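The step of "joining in customers that have zero actions" can be sketched in plain Python for a single stage. This is not the library's code: the inputs are hypothetical per-interaction counts, and how an action is judged "relevant" is not specified above, so a precomputed relevant-offer count is assumed.

```python
def offer_quality(no_of_offers: dict[str, int], relevant_offers: dict[str, int],
                  interaction_ids: list[str]) -> list[dict]:
    """Flag each interaction at one stage, including those with zero actions.

    Hypothetical inputs: per-interaction counts of offers and of "relevant"
    offers; the relevance rule itself is an assumption and not modelled here.
    """
    rows = []
    for iid in interaction_ids:
        # Interactions missing from the counts had no actions: fill with 0
        # instead of dropping them (like a left join followed by a null fill).
        offers = no_of_offers.get(iid, 0)
        relevant = relevant_offers.get(iid, 0)
        rows.append({
            "Interaction ID": iid,
            "no_of_offers": offers,
            "has_no_offers": offers == 0,
            "atleast_one_relevant_action": relevant > 0,
        })
    return rows
```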

filtered_action_counts(groupby_cols: list[str], propensity_th: float | None = None, priority_th: float | None = None, additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.LazyFrame

Return action counts from the sample, optionally classified by propensity/priority thresholds.

Parameters:

groupby_cols (list of str) – Column names to group by.
propensity_th (float, optional) – Propensity threshold for classifying offers.
priority_th (float, optional) – Priority threshold for classifying offers.
additional_filters (pl.Expr or list[pl.Expr], optional) – Extra filters applied to the sample (e.g. channel filter).

Returns:

Aggregated action counts per group, with quality buckets when both thresholds are provided.

Return type:

pl.LazyFrame
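The "quality buckets only when both thresholds are provided" behaviour can be sketched in plain Python. The bucket rule (strictly below the threshold counts as poor) and the poor_priority_offers column name are assumptions; only no_of_offers and poor_propensity_offers appear in this page.

```python
from __future__ import annotations

from collections import defaultdict


def filtered_counts(rows: list[dict], groupby_cols: list[str],
                    propensity_th: float | None = None,
                    priority_th: float | None = None) -> dict[tuple, dict[str, int]]:
    """Action counts per group, with threshold buckets when both thresholds are set."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in groupby_cols)].append(row)
    out = {}
    for key, offers in groups.items():
        counts = {"no_of_offers": len(offers)}
        if propensity_th is not None and priority_th is not None:
            # Hypothetical bucket rule: "poor" means strictly below the threshold.
            counts["poor_propensity_offers"] = sum(o["Propensity"] < propensity_th for o in offers)
            # Hypothetical column name, for illustration only.
            counts["poor_priority_offers"] = sum(o["Priority"] < priority_th for o in offers)
        out[key] = counts
    return out
```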

get_trend_data(stage: str = 'AvailableActions', scope: Literal['Group', 'Issue', 'Action'] | None = 'Group', additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.DataFrame

Daily trend of unique decisions from a given stage onward.

Parameters:

stage (str, default "AvailableActions") – Starting stage; all stages from this point onward are included.
scope ({"Group", "Issue", "Action"} or None, default "Group") – Optional grouping dimension. If None, returns totals by day.
additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters applied to the sample.

Returns:

Columns: day, the scope column (unless scope is None), and Decisions.

Return type:

pl.DataFrame
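"From a given stage onward" and "unique decisions" can be illustrated with a plain-Python sketch. It assumes rows carry "day", "Stage" and "Interaction ID" columns (the latter identifying a decision) and an ordered stage list; `trend` is a hypothetical name, not the library's query.

```python
from __future__ import annotations

from collections import defaultdict


def trend(rows: list[dict], stages: list[str], start_stage: str = "AvailableActions",
          scope: str | None = "Group") -> list[tuple]:
    """Daily count of unique decisions at start_stage or any later stage."""
    # Every stage from start_stage onward, in pipeline order (assumed list).
    included = set(stages[stages.index(start_stage):])
    decisions: dict[tuple, set] = defaultdict(set)
    for row in rows:
        if row["Stage"] in included:
            key = (row["day"],) if scope is None else (row["day"], row[scope])
            # A set, so a decision seen at several stages is counted once.
            decisions[key].add(row["Interaction ID"])
    return sorted((*key, len(ids)) for key, ids in decisions.items())
```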

get_filter_component_data(top_n: int, additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.DataFrame

Top-N filter components per stage, ranked by filtered-decision count.

Parameters:

top_n (int) – Maximum number of components to return per stage.
additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters applied before aggregation.

Returns:

Columns include the stage level, Component Name, and Filtered Decisions.

Return type:

pl.DataFrame
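Ranking the top-N components within each stage can be sketched in plain Python. It assumes each input row is one filtered action carrying "Stage", "Component Name" and "Interaction ID", and it counts unique decisions per component; this is an illustration, not the library's implementation.

```python
from collections import defaultdict


def top_components(filtered_rows: list[dict], top_n: int) -> dict[str, list[tuple[str, int]]]:
    """Per stage, the top_n components ranked by unique decisions they filtered."""
    # stage -> component -> set of interaction IDs (assumed column names).
    per_stage = defaultdict(lambda: defaultdict(set))
    for row in filtered_rows:
        per_stage[row["Stage"]][row["Component Name"]].add(row["Interaction ID"])
    result = {}
    for stage, components in per_stage.items():
        ranked = sorted(
            ((name, len(ids)) for name, ids in components.items()),
            key=lambda item: item[1],
            reverse=True,
        )
        result[stage] = ranked[:top_n]  # keep at most top_n per stage
    return result
```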

get_component_action_impact(top_n: int = 10, scope: str = 'Action', additional_filters: polars.Expr | list[polars.Expr] | None = None) → polars.DataFrame

Per-component breakdown of which items are filtered and how many.

For each component, returns the top-N items (at the chosen scope granularity) it filters out. The scope controls whether the breakdown is at Issue, Group, or Action level.

Parameters:

top_n (int, default 10) – Maximum number of items to return per component.
scope (str, default "Action") – Granularity level: "Issue", "Group", or "Action".
additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters to apply before aggregation.

Returns:

Columns include Component Name, StageGroup, scope columns, and Filtered Decisions. Sorted by component, then by descending count.

Return type:

pl.DataFrame

get_component_drilldown(component_name: str, scope: str = 'Action', additional_filters: polars.Expr | list[polars.Expr] | None = None, sort_by: str = 'Filtered Decisions') → polars.DataFrame

Deep-dive into a single filter component showing dropped actions and their potential value.

Because scoring columns (Priority, Value, Propensity) are typically null on FILTERED_OUT rows, this method derives each action’s “potential value” from the average scores of rows where the same action survives (non-null Priority/Value). This shows the value of what is being dropped.

Parameters:

component_name (str) – The Component Name to drill into.
scope (str, default "Action") – Granularity level: "Issue", "Group", or "Action".
additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters to apply before aggregation.
sort_by (str, default "Filtered Decisions") – Column to sort results by (descending).

Returns:

Columns include scope columns, Filtered Decisions, avg_Priority, avg_Value, avg_Propensity, Component Type (if available).

Return type:

pl.DataFrame
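The "potential value" lookup can be sketched in plain Python for Priority alone (Value and Propensity would follow the same pattern). The column names and the way surviving rows are recognised (non-null Priority) are assumptions for illustration, and `drilldown` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def drilldown(rows: list[dict], component_name: str) -> list[dict]:
    """Actions dropped by one component, with a looked-up potential Priority."""
    # Scores exist only where the action survived, so average those rows.
    survivor_scores = defaultdict(list)
    for row in rows:
        if row["Priority"] is not None:
            survivor_scores[row["Action"]].append(row["Priority"])
    avg_priority = {action: mean(scores) for action, scores in survivor_scores.items()}

    # Unique decisions in which this component filtered each action.
    dropped = defaultdict(set)
    for row in rows:
        if row["Component Name"] == component_name:
            dropped[row["Action"]].add(row["Interaction ID"])

    result = [
        {"Action": action, "Filtered Decisions": len(ids),
         "avg_Priority": avg_priority.get(action)}  # None if the action never survives
        for action, ids in dropped.items()
    ]
    result.sort(key=lambda r: r["Filtered Decisions"], reverse=True)
    return result
```

A high avg_Priority on a frequently dropped action is the signal this drilldown is meant to surface: the component may be removing actions that would otherwise rank well.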

get_ab_test_results() → polars.DataFrame

A/B test summary: control vs test counts and control percentage per stage.

Returns: One row per stage with the Control and Test counts and the Control Percentage. Rows follow the canonical AvailableNBADStages order; stages absent from the data are omitted.
Return type: pl.DataFrame
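The ordering rule (canonical stage order, absent stages omitted) and the percentage column can be sketched in plain Python. The sketch assumes Control Percentage = 100 × Control / (Control + Test); the input shape and the function name are hypothetical.

```python
def ab_test_results(counts: dict[str, dict[str, int]], canonical_stages: list[str]) -> list[dict]:
    """Control/Test counts and control share per stage, in canonical order."""
    rows = []
    for stage in canonical_stages:
        if stage not in counts:
            continue  # stages absent from the data are omitted
        control = counts[stage].get("Control", 0)
        test = counts[stage].get("Test", 0)
        total = control + test
        rows.append({
            "Stage": stage,
            "Control": control,
            "Test": test,
            # Assumed definition: control share of all counted decisions, in percent.
            "Control Percentage": 100 * control / total if total else 0.0,
        })
    return rows
```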