pdstools.utils.report_utils._polars_helpers
Small Polars helpers and aggregations used by Quarto reports.
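Functions
- polars_col_exists(df, col)
- polars_subset_to_existing_cols(all_columns, cols)
- n_unique_values(dm, all_dm_cols, fld)
- max_by_hierarchy(dm, all_dm_cols, fld, grouping)
- avg_by_hierarchy(dm, all_dm_cols, fld, grouping)
- sample_values(dm, all_dm_cols, fld, n=6)
- gains_table(df, value, index=None, by=None): Calculate cumulative gains for visualization.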
Module Contents
- polars_col_exists(df, col)
- polars_subset_to_existing_cols(all_columns, cols)
- n_unique_values(dm, all_dm_cols, fld)
- max_by_hierarchy(dm, all_dm_cols, fld, grouping)
- avg_by_hierarchy(dm, all_dm_cols, fld, grouping)
- sample_values(dm, all_dm_cols, fld, n=6)
- gains_table(df: polars.LazyFrame | polars.DataFrame, value: str, index: str | None = None, by: str | list[str] | None = None) -> polars.DataFrame
Calculate cumulative gains for visualization.
Computes the cumulative distribution of a value metric, with rows sorted by the ratio of value to index (or by value alone if no index is given). Used in gains charts to show how skewed model responses are.
- Parameters:
df (pl.LazyFrame | pl.DataFrame) – Input data containing the value and optional index columns
value (str) – Column name containing the metric to compute gains for (e.g., “ResponseCount”)
index (str, optional) – Column name to normalize by (e.g., population size). If None, uses row count.
by (str | list[str], optional) – Column(s) to group by for separate gain curves. If None, computes single curve.
- Returns:
DataFrame with columns:
- cum_x: Cumulative proportion of index (or models)
- cum_y: Cumulative proportion of value
- by columns: included if by is specified
- Return type:
pl.DataFrame
Examples
>>> # Single gains curve for response count
>>> gains = gains_table(df, value="ResponseCount")
>>> # Gains curves by channel, normalized by population
>>> gains = gains_table(df, value="Positives", index="Population", by="Channel")