pdstools.decision_analyzer._aggregates
======================================

.. py:module:: pdstools.decision_analyzer._aggregates

.. autoapi-nested-parse::

   Aggregation namespace for :class:`DecisionAnalyzer`.

   Methods in this module are exposed via ``da.aggregates``. They
   encapsulate the data-shaping queries that turn the pre-aggregated views
   into the frames consumed by the plot layer and the Streamlit pages.


Classes
-------

.. autoapisummary::

   pdstools.decision_analyzer._aggregates.Aggregates


Module Contents
---------------

.. py:class:: Aggregates(da: pdstools.decision_analyzer.DecisionAnalyzer.DecisionAnalyzer)

   Aggregation queries over the pre-aggregated views.

   Accessed via :attr:`DecisionAnalyzer.aggregates`.


   .. py:attribute:: da


   .. py:method:: aggregate_remaining_per_stage(df: polars.LazyFrame, group_by_columns: list[str], aggregations: list[polars.Expr] | None = None) -> polars.LazyFrame

      Workhorse function that converts the raw Decision Analyzer data (filter
      view) into aggregates of what remains at each stage, ensuring that every
      stage is represented.


   .. py:method:: get_distribution_data(stage: str, grouping_levels: str | list[str], additional_filters: polars.Expr | list[polars.Expr] | None = None) -> polars.LazyFrame

      Distribution of decisions by grouping columns at a given stage.

      :param stage: Stage to filter on.
      :type stage: str
      :param grouping_levels: Column(s) to group by (e.g. ``"Action"`` or ``["Channel", "Action"]``).
      :type grouping_levels: str or list of str
      :param additional_filters: Extra filters applied before aggregation.
      :type additional_filters: pl.Expr or list of pl.Expr, optional

      :returns: Columns from *grouping_levels* plus ``Decisions``, sorted descending.
      :rtype: pl.LazyFrame


   .. py:method:: get_funnel_data(scope: str, additional_filters: polars.Expr | list[polars.Expr] | None = None) -> tuple[polars.LazyFrame, polars.DataFrame, polars.DataFrame]


   .. py:method:: get_decisions_without_actions_data(additional_filters: polars.Expr | list[polars.Expr] | None = None) -> polars.DataFrame

      Per-stage count of interactions newly left with no remaining actions.

      Returns a DataFrame with columns ``[self.da.level, "decisions_without_actions"]``,
      sorted in pipeline order. For each stage X, the value is the number of
      interactions that lose their final remaining action at stage X.


   .. py:method:: get_funnel_summary(available_df: polars.LazyFrame, passing_df: polars.DataFrame, additional_filters: polars.Expr | list[polars.Expr] | None = None) -> polars.DataFrame

      Per-stage summary: Available, Passing, and Filtered actions, and Decisions.

      The table matches the funnel chart: it starts with a synthetic
      "Available Actions" row and excludes the Output stage.

      :param available_df: First element returned by ``get_funnel_data`` (actions entering each stage).
      :type available_df: pl.LazyFrame
      :param passing_df: Second element returned by ``get_funnel_data`` (actions exiting each stage).
      :type passing_df: pl.DataFrame
      :param additional_filters: Same filters used when calling ``get_funnel_data``.
      :type additional_filters: pl.Expr or list of pl.Expr, optional

      :returns: One row per stage in pipeline order, with raw counts first, then per-decision averages.
      :rtype: pl.DataFrame


   .. py:method:: get_optionality_data(df: polars.LazyFrame | None = None, by_day: bool = False) -> polars.LazyFrame

      Average number of actions per stage, optionally broken down by day.

      Computes per-interaction action counts at each stage using
      ``aggregate_remaining_per_stage``, then aggregates them into a histogram.

      :param df: Input data. Defaults to :attr:`sample`.
      :type df: pl.LazyFrame, optional
      :param by_day: If True, include ``"day"`` in the grouping for trend analysis. When False, zero-action rows are injected for stages where some interactions have no remaining actions.
      :type by_day: bool, default False

      :rtype: pl.LazyFrame


   .. py:method:: get_optionality_funnel(df: polars.LazyFrame | None = None) -> polars.LazyFrame

      Optionality funnel: interaction counts bucketed by available-action count.

      Buckets action counts into 0–6 and 7+, then counts interactions per
      stage and bucket. Used by the optionality funnel chart.

      :param df: Input data. Defaults to :attr:`sample`.
      :type df: pl.LazyFrame, optional

      :rtype: pl.LazyFrame


   .. py:method:: get_action_variation_data(stage: str, color_by: str | None = None) -> polars.LazyFrame

      Get action variation data, optionally broken down by a categorical dimension.

      :param stage: The stage to analyze.
      :type stage: str
      :param color_by: Optional categorical column to break down the variation by. Use ``"Channel/Direction"`` to combine the Channel and Direction columns.
      :type color_by: str, optional


   .. py:method:: get_offer_variability_stats(stage: str) -> dict[str, float]

      Summary statistics for action variation at a stage.

      :param stage: Stage to analyze.
      :type stage: str

      :returns: ``n90``, the number of actions covering 90% of decisions, and ``gini``, the Gini coefficient of decision concentration.
      :rtype: dict


   .. py:method:: get_offer_quality(df: polars.LazyFrame, group_by: str | list[str]) -> polars.LazyFrame

      Cumulative offer-quality breakdown across stages.

      Takes a filtered-action-counts frame (from :meth:`filtered_action_counts`)
      and converts it to a remaining-per-stage view, joining in customers that
      have zero actions so they are counted as well.

      :param df: Filtered action counts with columns ``no_of_offers``, ``new_models``, ``poor_propensity_offers``, etc.
      :type df: pl.LazyFrame
      :param group_by: Columns to group by (e.g. ``["Interaction ID"]``).
      :type group_by: str or list of str

      :returns: Per-stage quality classification with boolean flag columns (``has_no_offers``, ``atleast_one_relevant_action``, etc.).
      :rtype: pl.LazyFrame


   .. py:method:: filtered_action_counts(groupby_cols: list[str], propensity_th: float | None = None, priority_th: float | None = None, additional_filters: polars.Expr | list[polars.Expr] | None = None) -> polars.LazyFrame

      Return action counts from the sample, optionally classified by propensity and priority thresholds.

      :param groupby_cols: Column names to group by.
      :type groupby_cols: list of str
      :param propensity_th: Propensity threshold for classifying offers.
      :type propensity_th: float, optional
      :param priority_th: Priority threshold for classifying offers.
      :type priority_th: float, optional
      :param additional_filters: Extra filters applied to the sample (e.g. a channel filter).
      :type additional_filters: pl.Expr or list of pl.Expr, optional

      :returns: Aggregated action counts per group, with quality buckets when both thresholds are provided.
      :rtype: pl.LazyFrame


   .. py:method:: get_trend_data(stage: str = 'AvailableActions', scope: Literal['Group', 'Issue', 'Action'] | None = 'Group', additional_filters: polars.Expr | list[polars.Expr] | None = None) -> polars.DataFrame

      Daily trend of unique decisions from a given stage onward.

      :param stage: Starting stage; all stages from this point onward are included.
      :type stage: str, default "AvailableActions"
      :param scope: Optional grouping dimension. If ``None``, returns totals by day.
      :type scope: {"Group", "Issue", "Action"} or None, default "Group"
      :param additional_filters: Extra filters applied to the sample.
      :type additional_filters: pl.Expr or list of pl.Expr, optional

      :returns: Columns: ``day``, optionally *scope*, and ``Decisions``.
      :rtype: pl.DataFrame


   .. py:method:: get_filter_component_data(top_n: int, additional_filters: polars.Expr | list[polars.Expr] | None = None) -> polars.DataFrame

      Top-N filter components per stage, ranked by filtered-decision count.

      :param top_n: Maximum number of components to return per stage.
      :type top_n: int
      :param additional_filters: Extra filters applied before aggregation.
      :type additional_filters: pl.Expr or list of pl.Expr, optional

      :returns: Columns include the stage level, Component Name, and Filtered Decisions.
      :rtype: pl.DataFrame


   .. py:method:: get_component_action_impact(top_n: int = 10, scope: str = 'Action', additional_filters: polars.Expr | list[polars.Expr] | None = None) -> polars.DataFrame

      Per-component breakdown of which items are filtered and how many.

      For each component, returns the top-N items (at the chosen scope
      granularity) it filters out. The scope controls whether the breakdown
      is at Issue, Group, or Action level.

      :param top_n: Maximum number of items to return per component.
      :type top_n: int, default 10
      :param scope: Granularity level: ``"Issue"``, ``"Group"``, or ``"Action"``.
      :type scope: str, default "Action"
      :param additional_filters: Extra filters to apply before aggregation.
      :type additional_filters: pl.Expr or list of pl.Expr, optional

      :returns: Columns include Component Name, StageGroup, scope columns, and Filtered Decisions. Sorted by component then descending count.
      :rtype: pl.DataFrame


   .. py:method:: get_component_drilldown(component_name: str, scope: str = 'Action', additional_filters: polars.Expr | list[polars.Expr] | None = None, sort_by: str = 'Filtered Decisions') -> polars.DataFrame

      Deep-dive into a single filter component showing dropped actions and their potential value.

      Since scoring columns (Priority, Value, Propensity) are typically null on
      FILTERED_OUT rows, this method derives the action's "potential value" by
      looking up average scores from rows where the same action survives
      (non-null Priority/Value). This gives the "value of what's being dropped"
      perspective.

      :param component_name: The Component Name to drill into.
      :type component_name: str
      :param scope: Granularity level: ``"Issue"``, ``"Group"``, or ``"Action"``.
      :type scope: str, default "Action"
      :param additional_filters: Extra filters to apply before aggregation.
      :type additional_filters: pl.Expr or list of pl.Expr, optional
      :param sort_by: Column to sort results by (descending).
      :type sort_by: str, default "Filtered Decisions"

      :returns: Columns include scope columns, Filtered Decisions, avg_Priority, avg_Value, avg_Propensity, Component Type (if available).
      :rtype: pl.DataFrame


   .. py:method:: get_ab_test_results() -> polars.DataFrame

      A/B test summary: control vs test counts and control percentage per stage.

      :returns: One row per stage with columns for Control, Test counts and Control Percentage. Rows preserve the canonical ``AvailableNBADStages`` order (stages absent from the data are omitted).
      :rtype: pl.DataFrame
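
The two statistics returned by ``get_offer_variability_stats`` can be illustrated with a minimal plain-Python sketch. It assumes that the per-action decision counts at one stage are already available as a list. It also assumes that ``n90`` is the smallest number of top actions whose counts together reach 90% of all decisions, and that ``gini`` follows the standard Gini formula over ascending counts. The helpers ``n90`` and ``gini`` are hypothetical; pdstools computes these statistics its own way, which may differ in detail.

```python
def n90(counts: list[int]) -> int:
    """Hypothetical helper: fewest actions (largest first) covering >= 90% of decisions."""
    total = sum(counts)
    covered = 0
    for i, c in enumerate(sorted(counts, reverse=True), start=1):
        covered += c
        if covered >= 0.9 * total:
            return i
    return len(counts)


def gini(counts: list[int]) -> float:
    """Hypothetical helper: 0 means decisions are spread evenly, values near 1 mean concentration."""
    xs = sorted(counts)  # ascending order is required by the formula below
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_i / (n * sum(x)), for i = 1..n over ascending x
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)
```

For counts ``[500, 300, 150, 40, 10]``, ``n90`` returns 3 (the top three actions cover 950 of 1000 decisions) and ``gini`` returns 0.496. Perfectly even counts give a Gini coefficient of 0.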
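
The bucketing described for ``get_optionality_funnel`` (action counts 0 to 6 kept individually, everything above grouped as 7+) can be sketched in plain Python. The sketch assumes hypothetical input records of the form ``(stage, interaction_id, n_actions)``, one per interaction and stage. The real method works on a Polars LazyFrame, and the names ``bucket`` and ``optionality_funnel`` are illustrative only.

```python
from collections import Counter


def bucket(n_actions: int) -> str:
    """Map an action count to one of the buckets "0" to "6", or "7+"."""
    return str(n_actions) if n_actions <= 6 else "7+"


def optionality_funnel(rows: list[tuple[str, str, int]]) -> Counter:
    """Count interactions per (stage, bucket).

    ``rows`` holds hypothetical (stage, interaction_id, n_actions) records,
    assumed to contain one record per interaction and stage.
    """
    return Counter((stage, bucket(n)) for stage, _interaction, n in rows)
```

Each interaction contributes exactly one count per stage, so summing a stage's buckets gives the number of interactions that reached that stage.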
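
The per-stage counting behind ``get_decisions_without_actions_data`` can be sketched in plain Python. The sketch assumes a hypothetical input that maps each interaction ID to the number of actions remaining after each stage, in pipeline order. An interaction is counted only at the first stage where that number reaches zero, which is what makes the count "newly" left without actions. The actual method operates on Polars frames and may differ in detail; ``decisions_without_actions`` is an illustrative name.

```python
def decisions_without_actions(
    stages: list[str], remaining: dict[str, list[int]]
) -> list[tuple[str, int]]:
    """Per stage, count interactions whose remaining actions first hit zero there.

    ``remaining[interaction_id][k]`` is the (hypothetical) number of actions
    still available to that interaction after stage ``stages[k]``.
    """
    counts = dict.fromkeys(stages, 0)
    for per_stage in remaining.values():
        for stage, n_left in zip(stages, per_stage):
            if n_left == 0:
                counts[stage] += 1  # final remaining action lost at this stage
                break  # do not count this interaction again at later stages
    # Keep pipeline order, including stages with a zero count
    return [(stage, counts[stage]) for stage in stages]
```

With stages ``["S1", "S2", "S3"]`` and remaining counts ``{"a": [3, 1, 0], "b": [2, 0, 0], "c": [4, 4, 2]}``, the result is ``[("S1", 0), ("S2", 1), ("S3", 1)]``. Interaction ``b`` runs out at S2, ``a`` runs out at S3, and ``c`` never does.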