pdstools.decision_analyzer._aggregates
Aggregation namespace for DecisionAnalyzer.
Methods in this module are exposed via da.aggregates.<method>. They
encapsulate the data-shaping queries that turn the pre-aggregated views
into the frames consumed by the plot layer and the Streamlit pages.
Classes
Aggregates | Aggregation queries over the pre-aggregated views.
Module Contents
- class Aggregates(da: pdstools.decision_analyzer.DecisionAnalyzer.DecisionAnalyzer)
Aggregation queries over the pre-aggregated views.
Accessed via DecisionAnalyzer.aggregates.
- Parameters:
da (pdstools.decision_analyzer.DecisionAnalyzer.DecisionAnalyzer)
- da
- aggregate_remaining_per_stage(df: polars.LazyFrame, group_by_columns: list[str], aggregations: list[polars.Expr] | None = None) -> polars.LazyFrame
Workhorse function that converts the raw Decision Analyzer data (filter view) into aggregates of what remains at each stage, ensuring every stage is represented in the result.
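The idea of a remaining-per-stage aggregate in which every stage is represented can be sketched in plain Python. This is an illustration only, not the pdstools implementation: the stage names, the function name and the reading of "remaining" (a row recorded at a stage counts for that stage and all earlier ones) are assumptions.

```python
# Hypothetical stage order; not taken from pdstools.
STAGE_ORDER = ["Eligibility", "Suitability", "Arbitration", "Output"]


def remaining_per_stage(row_stages, stage_order=STAGE_ORDER):
    """Count rows recorded at each stage or at any later stage.

    Every stage in stage_order appears in the result, even with a count of 0.
    """
    position = {stage: i for i, stage in enumerate(stage_order)}
    # Start from zero for every stage so that none is missing from the output.
    counts = {stage: 0 for stage in stage_order}
    for row_stage in row_stages:
        # A row recorded at stage k is still "remaining" for stages 0..k.
        for stage in stage_order[: position[row_stage] + 1]:
            counts[stage] += 1
    return counts


example = remaining_per_stage(["Eligibility", "Arbitration", "Arbitration"])
print(example)
```

In this example no row reaches "Output", yet the stage still appears with a count of 0, which is the property the description refers to.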
- get_distribution_data(stage: str, grouping_levels: str | list[str], additional_filters: polars.Expr | list[polars.Expr] | None = None) -> polars.LazyFrame
Distribution of decisions by grouping columns at a given stage.
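- Parameters:
stage (str)
grouping_levels (str | list[str])
additional_filters (polars.Expr | list[polars.Expr] | None)
- Returns:
Columns from grouping_levels plus Decisions, sorted descending.
- Return type:
pl.LazyFrame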
- get_funnel_data(scope: str, additional_filters: polars.Expr | list[polars.Expr] | None = None) -> tuple[polars.LazyFrame, polars.DataFrame, polars.DataFrame]
- get_decisions_without_actions_data(additional_filters: polars.Expr | list[polars.Expr] | None = None) -> polars.DataFrame
Per-stage count of interactions newly left with no remaining actions.
Returns a DataFrame with columns [self.da.level, “decisions_without_actions”], sorted in pipeline order. For each stage X, the value is the number of interactions that lose their final remaining action at stage X.
- Parameters:
additional_filters (polars.Expr | list[polars.Expr] | None)
- Return type:
polars.DataFrame
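A minimal plain-Python sketch of the "loses its final remaining action at stage X" count described above. It is not the library's code: the stage names, the input shape (for each interaction, the stage at which each of its actions was filtered, with None for an action that survived) and the function name are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical filtering stages in pipeline order.
STAGE_ORDER = ["Eligibility", "Suitability", "Arbitration"]


def decisions_without_actions(exit_stages_by_interaction, stage_order=STAGE_ORDER):
    """Return (stage, count) pairs in pipeline order.

    count is the number of interactions whose last remaining action
    was filtered out at that stage.
    """
    position = {stage: i for i, stage in enumerate(stage_order)}
    emptied_at = Counter()
    for exit_stages in exit_stages_by_interaction.values():
        # Skip interactions with no actions, or with at least one survivor (None).
        if not exit_stages or any(stage is None for stage in exit_stages):
            continue
        # The latest filtering stage is where the final action disappeared.
        last_stage = max(exit_stages, key=position.__getitem__)
        emptied_at[last_stage] += 1
    return [(stage, emptied_at[stage]) for stage in stage_order]


interactions = {
    "i1": ["Eligibility", "Suitability"],  # emptied at Suitability
    "i2": ["Eligibility", None],           # one action survives
    "i3": ["Eligibility"],                 # emptied at Eligibility
    "i4": ["Arbitration", "Eligibility"],  # emptied at Arbitration
}
result = decisions_without_actions(interactions)
print(result)
```

Each interaction contributes to at most one stage, so the counts are "newly" emptied interactions per stage rather than a running total.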
- get_funnel_summary(available_df: polars.LazyFrame, passing_df: polars.DataFrame, additional_filters: polars.Expr | list[polars.Expr] | None = None) -> polars.DataFrame
Per-stage summary: Available, Passing, Filtered actions and Decisions.
The table matches the funnel chart: it starts with a synthetic “Available Actions” row and excludes the Output stage.
- Parameters:
available_df (pl.LazyFrame) – First element returned by get_funnel_data (actions entering each stage).
passing_df (pl.DataFrame) – Second element returned by get_funnel_data (actions exiting each stage).
additional_filters (optional) – Same filters used when calling get_funnel_data.
- Returns:
One row per stage in pipeline order with raw counts first, then per-decision averages.
- Return type:
pl.DataFrame
- get_optionality_data(df: polars.LazyFrame | None = None, by_day: bool = False) -> polars.LazyFrame
Average number of actions per stage, optionally broken down by day.
Computes per-interaction action counts at each stage using aggregate_remaining_per_stage, then aggregates into a histogram.
- Parameters:
df (pl.LazyFrame, optional) – Input data. Defaults to sample.
by_day (bool, default False) – If True, include "day" in the grouping for trend analysis. When False, zero-action rows are injected for stages where some interactions have no remaining actions.
- Return type:
pl.LazyFrame
- get_optionality_funnel(df: polars.LazyFrame | None = None) -> polars.LazyFrame
Optionality funnel: interaction counts bucketed by available-action count.
Buckets action counts into 0–6 and 7+, then counts interactions per stage and bucket. Used by the optionality funnel chart.
- Parameters:
df (pl.LazyFrame, optional) – Input data. Defaults to sample.
- Return type:
pl.LazyFrame
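The bucketing into 0–6 and 7+ can be illustrated with a short plain-Python sketch. The function names and the input shape (one (stage, action count) pair per interaction and stage) are hypothetical; the library itself works on polars frames.

```python
from collections import Counter


def bucket_label(action_count):
    """Map an action count to its bucket: "0" to "6" individually, else "7+"."""
    return str(action_count) if action_count <= 6 else "7+"


def optionality_funnel(action_counts):
    """Count interactions per (stage, bucket).

    action_counts: iterable of (stage, n_actions) pairs, one per interaction and stage.
    """
    return Counter((stage, bucket_label(n)) for stage, n in action_counts)


pairs = [
    ("Eligibility", 9),
    ("Eligibility", 7),
    ("Eligibility", 3),
    ("Arbitration", 0),
    ("Arbitration", 3),
    ("Arbitration", 6),
]
funnel = optionality_funnel(pairs)
print(funnel)
```

Collapsing everything above six actions into a single bucket keeps the number of chart categories fixed regardless of how many actions a decision can carry.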
- get_action_variation_data(stage: str, color_by: str | None = None) -> polars.LazyFrame
Get action variation data, optionally broken down by a categorical dimension.
- Parameters:
stage (str) – The stage to analyze.
color_by (str | None) – Optional categorical column to break down the variation by. Can use “Channel/Direction” to combine the Channel and Direction columns.
- get_offer_variability_stats(stage: str) -> dict[str, float]
Summary statistics for action variation at a stage.
- get_offer_quality(df: polars.LazyFrame, group_by: str | list[str]) -> polars.LazyFrame
Cumulative offer-quality breakdown across stages.
Takes a filtered-action-counts frame (from filtered_action_counts()) and converts it to a remaining-per-stage view, joining in customers that have zero actions so they are counted as well.
- Parameters:
df (polars.LazyFrame)
group_by (str | list[str])
- Returns:
Per-stage quality classification with boolean flag columns (has_no_offers, atleast_one_relevant_action, etc.).
- Return type:
pl.LazyFrame
- filtered_action_counts(groupby_cols: list[str], propensity_th: float | None = None, priority_th: float | None = None, additional_filters: polars.Expr | list[polars.Expr] | None = None) -> polars.LazyFrame
Return action counts from the sample, optionally classified by propensity/priority thresholds.
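- Parameters:
groupby_cols (list[str])
propensity_th (float | None)
priority_th (float | None)
additional_filters (polars.Expr | list[polars.Expr] | None)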
- Returns:
Aggregated action counts per group, with quality buckets when both thresholds are provided.
- Return type:
pl.LazyFrame
- get_trend_data(stage: str = 'AvailableActions', scope: Literal['Group', 'Issue', 'Action'] | None = 'Group', additional_filters: polars.Expr | list[polars.Expr] | None = None) -> polars.DataFrame
Daily trend of unique decisions from a given stage onward.
- Parameters:
stage (str, default "AvailableActions") – Starting stage; all stages from this point onward are included.
scope ({"Group", "Issue", "Action"} or None, default "Group") – Optional grouping dimension. If None, returns totals by day.
additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters applied to the sample.
- Returns:
Columns: day, optionally scope, and Decisions.
- Return type:
pl.DataFrame
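As a rough illustration of "unique decisions per day from a given stage onward" (the scope=None case), the following plain-Python sketch counts distinct interaction IDs per day. The row layout, stage names and function name are assumptions, not part of pdstools.

```python
from collections import defaultdict


def daily_unique_decisions(rows, stage_order, start_stage):
    """Count distinct interaction IDs per day, for start_stage and all later stages.

    rows: iterable of (day, interaction_id, stage) tuples.
    """
    # Keep only the starting stage and everything after it in the pipeline.
    included = set(stage_order[stage_order.index(start_stage):])
    ids_per_day = defaultdict(set)
    for day, interaction_id, stage in rows:
        if stage in included:
            ids_per_day[day].add(interaction_id)
    # One entry per day, in chronological (sorted) order.
    return {day: len(ids) for day, ids in sorted(ids_per_day.items())}


stage_order = ["Eligibility", "Arbitration", "Output"]  # hypothetical
rows = [
    ("2024-01-01", "a", "Eligibility"),
    ("2024-01-01", "a", "Arbitration"),
    ("2024-01-01", "b", "Arbitration"),
    ("2024-01-02", "c", "Eligibility"),
    ("2024-01-02", "d", "Output"),
]
trend = daily_unique_decisions(rows, stage_order, "Arbitration")
print(trend)
```

Using a set per day means an interaction with several rows in the included stages is counted once.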
- get_filter_component_data(top_n: int, additional_filters: polars.Expr | list[polars.Expr] | None = None) -> polars.DataFrame
Top-N filter components per stage, ranked by filtered-decision count.
- get_component_action_impact(top_n: int = 10, scope: str = 'Action', additional_filters: polars.Expr | list[polars.Expr] | None = None) -> polars.DataFrame
Per-component breakdown of which items are filtered and how many.
For each component, returns the top-N items (at the chosen scope granularity) it filters out. The scope controls whether the breakdown is at Issue, Group, or Action level.
- Parameters:
top_n (int)
scope (str)
additional_filters (polars.Expr | list[polars.Expr] | None)
- Returns:
Columns include Component Name, StageGroup, scope columns, and Filtered Decisions. Sorted by component then descending count.
- Return type:
pl.DataFrame
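The "top-N items per component, sorted by component then descending count" shape can be sketched in plain Python as follows. This is a simplified stand-in: it counts one filtered decision per input row, and the component and item names are invented for the example.

```python
from collections import Counter, defaultdict


def top_filtered_items(rows, top_n):
    """Return (component, item, count) tuples: top_n items per component.

    rows: iterable of (component_name, item) pairs, one per filtered decision.
    Output is ordered by component name, then by descending count.
    """
    per_component = defaultdict(Counter)
    for component, item in rows:
        per_component[component][item] += 1
    result = []
    for component in sorted(per_component):
        # most_common returns items in descending order of count.
        for item, count in per_component[component].most_common(top_n):
            result.append((component, item, count))
    return result


rows = (
    [("Budget", "OfferA")] * 3
    + [("Budget", "OfferB")] * 2
    + [("Budget", "OfferC")]
    + [("Contact Policy", "OfferB")]
)
impact = top_filtered_items(rows, top_n=2)
print(impact)
```

Here an "item" stands for whatever the scope selects (an Issue, a Group, or an Action); the ranking logic is the same at each granularity.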
- get_component_drilldown(component_name: str, scope: str = 'Action', additional_filters: polars.Expr | list[polars.Expr] | None = None, sort_by: str = 'Filtered Decisions') -> polars.DataFrame
Deep-dive into a single filter component showing dropped actions and their potential value.
Since scoring columns (Priority, Value, Propensity) are typically null on FILTERED_OUT rows, this method derives the action’s “potential value” by looking up average scores from rows where the same action survives (non-null Priority/Value). This gives the “value of what’s being dropped” perspective.
- Parameters:
component_name (str) – The Component Name to drill into.
scope (str, default "Action") – Granularity level: "Issue", "Group", or "Action".
additional_filters (pl.Expr or list of pl.Expr, optional) – Extra filters to apply before aggregation.
sort_by (str, default "Filtered Decisions") – Column to sort results by (descending).
- Returns:
Columns include scope columns, Filtered Decisions, avg_Priority, avg_Value, avg_Propensity, Component Type (if available).
- Return type:
pl.DataFrame
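The "potential value" lookup described above, where a dropped action borrows the average score of the rows in which the same action survives, can be sketched in plain Python. The row layout (dicts with "action", "filtered" and "priority" keys) and the function name are hypothetical, and only Priority is shown; Value and Propensity would follow the same pattern.

```python
from collections import defaultdict
from statistics import mean


def potential_priority(rows):
    """Map each filtered-out action to the average priority it has where it survives.

    rows: dicts with 'action' (str), 'filtered' (bool) and 'priority' (float or None).
    Actions that never survive with a score map to None.
    """
    surviving_scores = defaultdict(list)
    for row in rows:
        # Only surviving rows with a non-null score contribute to the average.
        if not row["filtered"] and row["priority"] is not None:
            surviving_scores[row["action"]].append(row["priority"])
    averages = {action: mean(scores) for action, scores in surviving_scores.items()}
    dropped_actions = {row["action"] for row in rows if row["filtered"]}
    return {action: averages.get(action) for action in sorted(dropped_actions)}


rows = [
    {"action": "OfferA", "filtered": True, "priority": None},
    {"action": "OfferA", "filtered": False, "priority": 2.0},
    {"action": "OfferA", "filtered": False, "priority": 4.0},
    {"action": "OfferB", "filtered": True, "priority": None},
]
lookup = potential_priority(rows)
print(lookup)
```

An action that is filtered everywhere has no surviving rows to borrow from, so its potential value stays undefined (None in this sketch).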
- get_ab_test_results() -> polars.DataFrame
A/B test summary: control vs test counts and control percentage per stage.
- Returns:
One row per stage with columns for Control, Test counts and Control Percentage. Rows preserve the canonical AvailableNBADStages order (stages absent from the data are omitted).
- Return type:
pl.DataFrame
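A small plain-Python sketch of the summary shape: one row per stage, in a canonical order, skipping stages that have no data. The formula for Control Percentage (control count divided by control plus test, times 100) is an assumption, since the documentation above does not define it; the stage names, the "Stage" key and the function name are likewise illustrative.

```python
def ab_summary(counts, stage_order):
    """Build per-stage A/B rows in canonical stage order.

    counts: dict mapping stage -> (control_count, test_count).
    Stages missing from counts are omitted from the result.
    """
    rows = []
    for stage in stage_order:
        if stage not in counts:
            continue  # stage absent from the data: no row
        control, test = counts[stage]
        total = control + test
        # Assumed definition: share of control decisions among all decisions.
        percentage = 100.0 * control / total if total else None
        rows.append(
            {
                "Stage": stage,
                "Control": control,
                "Test": test,
                "Control Percentage": percentage,
            }
        )
    return rows


stage_order = ["Eligibility", "Suitability", "Arbitration"]  # hypothetical
counts = {"Arbitration": (25, 75), "Eligibility": (10, 90)}
summary = ab_summary(counts, stage_order)
print(summary)
```

Iterating over the canonical order rather than over the data is what keeps the rows in pipeline order regardless of how the counts were collected.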