pdstools.utils.report_utils._polars_helpers
===========================================

.. py:module:: pdstools.utils.report_utils._polars_helpers

.. autoapi-nested-parse::

   Small Polars helpers and aggregations used by Quarto reports.


Functions
---------

.. autoapisummary::

   pdstools.utils.report_utils._polars_helpers.polars_col_exists
   pdstools.utils.report_utils._polars_helpers.polars_subset_to_existing_cols
   pdstools.utils.report_utils._polars_helpers.n_unique_values
   pdstools.utils.report_utils._polars_helpers.max_by_hierarchy
   pdstools.utils.report_utils._polars_helpers.avg_by_hierarchy
   pdstools.utils.report_utils._polars_helpers.sample_values
   pdstools.utils.report_utils._polars_helpers.gains_table


Module Contents
---------------

.. py:function:: polars_col_exists(df, col)

.. py:function:: polars_subset_to_existing_cols(all_columns, cols)

.. py:function:: n_unique_values(dm, all_dm_cols, fld)

.. py:function:: max_by_hierarchy(dm, all_dm_cols, fld, grouping)

.. py:function:: avg_by_hierarchy(dm, all_dm_cols, fld, grouping)

.. py:function:: sample_values(dm, all_dm_cols, fld, n=6)

.. py:function:: gains_table(df: polars.LazyFrame | polars.DataFrame, value: str, index: str | None = None, by: str | list[str] | None = None) -> polars.DataFrame

   Calculate cumulative gains for visualization.

   Computes the cumulative distribution of a value metric, with rows sorted
   by the ratio of value to index (or by value alone if no index is given).
   The result feeds gains charts, which show how skewed the metric is across
   models.

   :param df: Input data containing the value and optional index columns
   :type df: pl.LazyFrame | pl.DataFrame
   :param value: Column name containing the metric to compute gains for (e.g., "ResponseCount")
   :type value: str
   :param index: Column name to normalize by (e.g., population size). If None, each row counts equally and the row count is used instead.
   :type index: str, optional
   :param by: Column(s) to group by for separate gain curves. If None, computes single curve.
   :type by: str | list[str], optional

   :returns: DataFrame with columns:

             - ``cum_x``: Cumulative proportion of index (or of models when ``index`` is None)
             - ``cum_y``: Cumulative proportion of value
             - the ``by`` columns, if ``by`` is specified
   :rtype: pl.DataFrame

   .. rubric:: Examples

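   >>> from pdstools.utils.report_utils._polars_helpers import gains_table
   >>> # df is assumed to be an existing Polars frame with the columns used below
   >>> # Single gains curve for response count
   >>> gains = gains_table(df, value="ResponseCount")

   >>> # Gains curves by channel, normalized by population
   >>> gains = gains_table(df, value="Positives", index="Population", by="Channel")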
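
The sort-then-accumulate arithmetic described for ``gains_table`` can be sketched in plain Python. This is an illustration under stated assumptions, not the pdstools implementation: ``gains_points`` is a hypothetical helper, plain lists stand in for Polars columns, and the descending sort order is an assumption, since the description does not state the direction.

```python
from itertools import accumulate


def gains_points(values, index=None):
    """Return (cum_x, cum_y) pairs for a single gains curve.

    Hypothetical helper for illustration only; not part of pdstools.
    Assumes a non-empty input and strictly positive index entries.
    """
    if index is None:
        # No index: every row counts equally, so sort by value alone.
        weights = [1.0] * len(values)
        keys = list(values)
    else:
        # With an index: sort by the ratio of value to index.
        weights = list(index)
        keys = [v / w for v, w in zip(values, weights)]

    # Assumption: rows are ordered from the highest sort key to the lowest.
    order = sorted(range(len(values)), key=lambda i: keys[i], reverse=True)
    sorted_values = [values[i] for i in order]
    sorted_weights = [weights[i] for i in order]

    total_value = sum(sorted_values)
    total_weight = sum(sorted_weights)

    # Running totals divided by grand totals give cumulative proportions.
    cum_x = [w / total_weight for w in accumulate(sorted_weights)]
    cum_y = [v / total_value for v in accumulate(sorted_values)]
    return list(zip(cum_x, cum_y))


# Three models with 10, 60 and 30 responses, no index column.
points = gains_points([10, 60, 30])
for x, y in points:
    print(f"cum_x={x:.3f}  cum_y={y:.3f}")
```

In this made-up input, the top third of models accounts for 60% of the responses, which is the kind of skew a gains chart makes visible. Passing a second list as ``index`` switches the sort key to the value-to-index ratio and makes ``cum_x`` the cumulative share of that index rather than of rows.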