pdstools.adm.Aggregates

Classes

Module Contents

class Aggregates(datamart: pdstools.adm.ADMDatamart.ADMDatamart)
Parameters:
  • datamart (pdstools.adm.ADMDatamart.ADMDatamart)

datamart
cdh_guidelines
last(*, data: polars.LazyFrame | None = None, table: Literal['model_data', 'predictor_data', 'combined_data'] = 'model_data')

Gets the last snapshot of the given table.

Parameters:
  • data (Optional[pl.LazyFrame], optional) – If provided, the last snapshot is taken from this LazyFrame rather than from the named table, by default None

  • table (Literal['model_data', 'predictor_data', 'combined_data'], optional) – The table to get data from, by default “model_data”

Returns:

The last snapshot of the selected table, or of data if it is provided

Return type:

pl.LazyFrame
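
To make "last snapshot" concrete, here is a dependency-free sketch in plain Python. It assumes that every row carries a ModelID and a SnapshotTime and that the latest snapshot is picked per model. Those column names, the per-model rule, and the helper name last_snapshot are assumptions made for illustration; this is not how pdstools implements last(), which works on polars LazyFrames.

```python
from datetime import date


def last_snapshot(rows, key="ModelID", time="SnapshotTime"):
    """Keep, for every key, only the rows carrying that key's latest timestamp."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[time] > latest[k]:
            latest[k] = row[time]
    # Preserve the input order of the surviving rows.
    return [row for row in rows if row[time] == latest[row[key]]]


# Hypothetical snapshot rows: model "a" has two snapshots, model "b" has one.
rows = [
    {"ModelID": "a", "SnapshotTime": date(2024, 1, 1), "Positives": 10},
    {"ModelID": "a", "SnapshotTime": date(2024, 2, 1), "Positives": 25},
    {"ModelID": "b", "SnapshotTime": date(2024, 1, 1), "Positives": 7},
]
latest_rows = last_snapshot(rows)
```

In this sketch only the February row survives for model "a", while model "b" keeps its single January row.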

_combine_data(model_df: polars.LazyFrame | None, predictor_df: polars.LazyFrame | None) → polars.LazyFrame | None

Combines the model and predictor tables into the combined_data attribute.

Parameters:
  • model_df (pl.LazyFrame) – The model snapshots table

  • predictor_df (pl.LazyFrame) – The predictor binning snapshots table

Returns:

The resulting data, joined on the ModelID column

Return type:

pl.LazyFrame or None
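
The join on ModelID can be pictured with the plain-Python sketch below. It assumes an inner join and assumes that a missing input yields None; the signature allows None but the documentation does not state either point. The helper name join_on_model_id and the sample columns are hypothetical, and the real method operates on polars LazyFrames.

```python
def join_on_model_id(model_rows, predictor_rows):
    """Inner-join predictor rows to their model row on the ModelID column."""
    # Assumption: without both tables there is nothing to combine.
    if model_rows is None or predictor_rows is None:
        return None
    models = {m["ModelID"]: m for m in model_rows}
    return [
        {**models[p["ModelID"]], **p}  # model columns first, predictor columns on top
        for p in predictor_rows
        if p["ModelID"] in models
    ]


# Hypothetical sample data; predictor rows for unknown model "z" are dropped.
model_rows = [{"ModelID": "a", "Name": "Offer1"}, {"ModelID": "b", "Name": "Offer2"}]
predictor_rows = [
    {"ModelID": "a", "PredictorName": "Age"},
    {"ModelID": "a", "PredictorName": "Income"},
    {"ModelID": "z", "PredictorName": "Age"},
]
combined = join_on_model_id(model_rows, predictor_rows)
```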

predictor_performance_pivot(*, query: pdstools.utils.types.QUERY | None = None, active_only: bool = False, by='Name', top_predictors: int | None = None, top_groups: int | None = None) → polars.LazyFrame

Creates a pivot table of the predictor performance per ‘group’.

Parameters:
  • query (Optional[QUERY], optional) – A query to apply to the data before creating the pivot, by default None

  • by (str, optional) – The column by which to group (‘facet’) the rows, by default “Name”. If, for instance, ‘by’ is set to ‘Configuration’, each row will be a distinct configuration

  • top_predictors (Optional[int], optional) – Specify the maximum number of predictors, by default None

  • top_groups (Optional[int], optional) – Specify the maximum number of ‘groups’ specified in the ‘by’ argument, by default None

  • active_only (bool)

Returns:

A LazyFrame with a column for each predictor, and a row for each ‘group’. The values represent the weighted performance for that predictor

Return type:

pl.LazyFrame
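
The shape of the result can be illustrated with a small plain-Python sketch. It assumes that "weighted" means weighting each predictor's performance by a response count; the column names PredictorName and ResponseCount and the helper name weighted_performance_pivot are assumptions for illustration, not confirmed details of the pdstools implementation.

```python
from collections import defaultdict


def weighted_performance_pivot(rows, by="Name"):
    """Return {group: {predictor: response-weighted mean performance}}."""
    sums = defaultdict(float)
    weights = defaultdict(float)
    for r in rows:
        cell = (r[by], r["PredictorName"])
        sums[cell] += r["PredictorPerformance"] * r["ResponseCount"]
        weights[cell] += r["ResponseCount"]
    pivot = defaultdict(dict)
    for (group, predictor), total in sums.items():
        if weights[(group, predictor)] > 0:  # skip cells without any responses
            pivot[group][predictor] = total / weights[(group, predictor)]
    return dict(pivot)


# Hypothetical rows: two models named "Offer1" share the predictor "Age".
rows = [
    {"Name": "Offer1", "PredictorName": "Age", "PredictorPerformance": 0.6, "ResponseCount": 100},
    {"Name": "Offer1", "PredictorName": "Age", "PredictorPerformance": 0.8, "ResponseCount": 300},
    {"Name": "Offer2", "PredictorName": "Age", "PredictorPerformance": 0.5, "ResponseCount": 50},
]
pivot = weighted_performance_pivot(rows)
```

Each outer key corresponds to one row of the pivot (a ‘group’) and each inner key to one predictor column.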

model_summary(by: str = 'Name', query: pdstools.utils.types.QUERY | None = None) → polars.LazyFrame

Generate summary statistics for each model (based on model ID).

If you want to generate statistics at a model name or treatment level, specify this in the ‘by’ argument.

Parameters:
  • by (str, optional) – The column to define the ‘counts’ for, by default “Name”. Must be part of the context keys in the ADMDatamart class

  • query (Optional[QUERY], optional) – A query to apply to the data before summarization, by default None

Returns:

A LazyFrame, with one row for each context key combination

Return type:

pl.LazyFrame

predictor_counts(*, facet: str = 'Configuration', by: str = 'Type', query: pdstools.utils.types.QUERY | None = None) → polars.LazyFrame

Returns the count of each predictor grouped by a certain column.

Parameters:
  • by (str, optional) – The column to group the data by, by default “Type”

  • query (Optional[QUERY], optional) – A query to apply to the data, by default None

  • facet (str)

Returns:

A LazyFrame, with one row per combination of predictor and ‘by’ value

Return type:

pl.LazyFrame

static _top_n(df: polars.DataFrame, top_n: int, metric: str = 'PredictorPerformance', facets: list | None = None)

Subsets DataFrame to contain only top_n predictors.

Parameters:
  • df (pl.DataFrame) – Table to subset

  • top_n (int) – Number of top predictors

  • metric (str) – Metric to use for comparing predictors

  • facets (list) – Columns to facet over. If given, the top_n predictors are selected separately within each facet

Returns:

Subsetted dataframe

Return type:

pl.DataFrame
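
A rough plain-Python sketch of "top n per facet" follows. It assumes that a higher metric value ranks first and that rows are ranked individually within each facet; the helper name top_n_rows and the sample data are hypothetical, and the actual static method works on a polars DataFrame.

```python
from collections import defaultdict


def top_n_rows(rows, top_n, metric="PredictorPerformance", facets=None):
    """Keep the top_n rows by metric, separately for each facet combination."""
    facets = facets or []
    groups = defaultdict(list)
    for r in rows:
        # With no facets every row falls into the single group keyed by ().
        groups[tuple(r[f] for f in facets)].append(r)
    result = []
    for members in groups.values():
        ranked = sorted(members, key=lambda r: r[metric], reverse=True)
        result.extend(ranked[:top_n])
    return result


# Hypothetical predictor rows spread over two configurations.
rows = [
    {"Configuration": "Web", "PredictorName": "Age", "PredictorPerformance": 0.55},
    {"Configuration": "Web", "PredictorName": "Income", "PredictorPerformance": 0.70},
    {"Configuration": "Web", "PredictorName": "Region", "PredictorPerformance": 0.60},
    {"Configuration": "Email", "PredictorName": "Age", "PredictorPerformance": 0.65},
    {"Configuration": "Email", "PredictorName": "Income", "PredictorPerformance": 0.52},
]
overall = top_n_rows(rows, top_n=2)
per_config = top_n_rows(rows, top_n=1, facets=["Configuration"])
```

Without facets the two best rows overall are kept; with facets=["Configuration"] the best row of each configuration is kept.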

_adm_model_summary(by_period: str | None, by_channel: bool, custom_channels: Dict[str, str] | None = None) → polars.LazyFrame
Parameters:
  • by_period (Optional[str])

  • by_channel (bool)

  • custom_channels (Optional[Dict[str, str]])

Return type:

polars.LazyFrame

_summarize_meta_info(grouping: List[str] | None, model_data: polars.LazyFrame) → polars.LazyFrame
Parameters:
  • grouping (Optional[List[str]])

  • model_data (polars.LazyFrame)

Return type:

polars.LazyFrame

_summarize_model_analytics(grouping: List[str] | None, model_data: polars.LazyFrame) → polars.LazyFrame
Parameters:
  • grouping (Optional[List[str]])

  • model_data (polars.LazyFrame)

Return type:

polars.LazyFrame

_summarize_action_analytics(grouping: List[str] | None, model_data: polars.LazyFrame) → polars.LazyFrame
Parameters:
  • grouping (Optional[List[str]])

  • model_data (polars.LazyFrame)

Return type:

polars.LazyFrame

_summarize_model_usage(grouping: List[str] | None, model_data: polars.LazyFrame, standard_configurations: List[str]) → polars.LazyFrame
Parameters:
  • grouping (Optional[List[str]])

  • model_data (polars.LazyFrame)

  • standard_configurations (List[str])

Return type:

polars.LazyFrame

summary_by_channel(by_period: str | None = None, custom_channels: Dict[str, str] | None = None) → polars.LazyFrame

Summarize ADM models per channel.

Parameters:
  • by_period (Optional[str])

  • custom_channels (Optional[Dict[str, str]])

Returns:

Dataframe with summary per channel (and optionally a period)

Return type:

pl.LazyFrame

summary_by_configuration() → polars.DataFrame

Generates a summary of the ADM model configurations.

Returns:

A Polars DataFrame containing the configuration summary.

Return type:

pl.DataFrame

predictors_overview() → polars.DataFrame | None

Generate a summary of the last snapshot of predictor data.

This method creates a summary of predictor data by joining the last snapshots of predictor_data and model_data, then performing various aggregations and calculations. It excludes the “Classifier” predictor from the analysis.

Returns:

A Polars DataFrame containing the predictor summary if successful, None if the required data is not available.

Return type:

pl.DataFrame or None

overall_summary(by_period: str = None) → polars.LazyFrame

Overall ADM models summary. Only valid data is included.

Parameters:
  • by_period (str)

Returns:

Summary across all valid ADM models as a dataframe

Return type:

pl.LazyFrame