pdstools.adm.Aggregates¶

Classes¶

Aggregates

Module Contents¶

class Aggregates(datamart: pdstools.adm.ADMDatamart.ADMDatamart)¶

Parameters:

datamart (pdstools.adm.ADMDatamart.ADMDatamart)

datamart¶

cdh_guidelines¶

last(*, data: polars.LazyFrame | None = None, table: Literal['model_data', 'predictor_data', 'combined_data'] = 'model_data') → polars.LazyFrame¶

Gets the last snapshot of the given table

This method filters the data to include only the rows from the most recent snapshot time.

Parameters:

data (Optional[pl.LazyFrame], optional) – If provided, this LazyFrame is filtered instead of one of the datamart tables, by default None
table (Literal['model_data', 'predictor_data', 'combined_data'], optional) – The datamart table to get the data from, by default “model_data”

Returns:

A LazyFrame containing only the rows from the most recent snapshot time

Return type:

pl.LazyFrame
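As a rough illustration of the filtering described above, the following plain-Python sketch keeps only the rows whose snapshot time equals the latest one found in the data. It works on a list of dicts rather than a polars LazyFrame, and both the helper name `last_snapshot` and the `SnapshotTime` key are assumptions made for this example, not names confirmed by this page.

```python
from datetime import datetime


def last_snapshot(rows, time_key="SnapshotTime"):
    """Keep only the rows taken at the most recent snapshot time."""
    if not rows:
        return []
    latest = max(row[time_key] for row in rows)
    return [row for row in rows if row[time_key] == latest]


# Hypothetical model snapshots: model "a" was captured twice, model "b" once.
rows = [
    {"ModelID": "a", "SnapshotTime": datetime(2024, 1, 1), "Positives": 10},
    {"ModelID": "a", "SnapshotTime": datetime(2024, 2, 1), "Positives": 25},
    {"ModelID": "b", "SnapshotTime": datetime(2024, 2, 1), "Positives": 7},
]
latest_rows = last_snapshot(rows)
```

Only the two rows from the latest snapshot remain; the older snapshot of model "a" is dropped.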

_combine_data(model_df: polars.LazyFrame | None, predictor_df: polars.LazyFrame | None) → polars.LazyFrame | None¶

Combines the model and predictor tables into the combined_data attribute

Parameters:

model_df (pl.LazyFrame) – The model snapshots table
predictor_df (pl.LazyFrame) – The predictor binning snapshots table

Returns:

The resulting data, joined on the ModelID column

Return type:

pl.LazyFrame
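The join on ModelID can be pictured with a small plain-Python sketch. This is not the library code: the helper name `combine_on_model_id` is hypothetical, the inner-join behaviour is an assumption (the page only says the tables are joined on ModelID), and returning None when either input is missing is a guess based on the optional types in the signature.

```python
def combine_on_model_id(model_rows, predictor_rows):
    """Join predictor rows to their model row on ModelID.

    Assumptions: inner join, and None is returned when either table is missing.
    """
    if model_rows is None or predictor_rows is None:
        return None
    models_by_id = {row["ModelID"]: row for row in model_rows}
    combined = []
    for predictor in predictor_rows:
        model = models_by_id.get(predictor["ModelID"])
        if model is not None:
            # Merge the model columns with the predictor columns.
            combined.append({**model, **predictor})
    return combined


# Hypothetical tables: predictor rows for "m3" have no matching model.
models = [
    {"ModelID": "m1", "Name": "OfferA"},
    {"ModelID": "m2", "Name": "OfferB"},
]
predictors = [
    {"ModelID": "m1", "PredictorName": "Age"},
    {"ModelID": "m1", "PredictorName": "Income"},
    {"ModelID": "m3", "PredictorName": "Age"},
]
combined = combine_on_model_id(models, predictors)
```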

predictor_performance_pivot(*, query: pdstools.utils.types.QUERY | None = None, active_only: bool = False, by='Name', top_predictors: int | None = None, top_groups: int | None = None) → polars.LazyFrame¶

Creates a pivot table of the predictor performance per ‘group’

Parameters:

query (Optional[QUERY], optional) – A query to apply to the data before creating the pivot, by default None
by (str, optional) – A group by which to ‘facet’, by default “Name”. If, for instance, the ‘by’ argument is set to ‘Configuration’, each row will be a distinct configuration
top_predictors (Optional[int], optional) – Specify the maximum number of predictors, by default None
top_groups (Optional[int], optional) – Specify the maximum number of ‘groups’ specified in the ‘by’ argument, by default None
active_only (bool)

Returns:

A LazyFrame with a column for each predictor, and a row for each ‘group’. The values represent the weighted performance for that predictor

Return type:

pl.LazyFrame
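To make the shape of the result concrete, here is a minimal plain-Python sketch of such a pivot: one entry per ‘group’, one key per predictor, and a weighted performance as the value. Weighting by response count is an assumption, as are the helper name `performance_pivot` and the `PredictorName`, `Performance` and `ResponseCount` keys; the page does not state which weight the library uses.

```python
from collections import defaultdict


def performance_pivot(rows, by="Name"):
    """Return {group: {predictor: response-weighted mean performance}}."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for row in rows:
        key = (row[by], row["PredictorName"])
        weighted_sum[key] += row["Performance"] * row["ResponseCount"]
        weight_total[key] += row["ResponseCount"]
    pivot = defaultdict(dict)
    for (group, predictor), total in weight_total.items():
        if total > 0:  # skip combinations without any responses
            pivot[group][predictor] = weighted_sum[(group, predictor)] / total
    return dict(pivot)


# Hypothetical rows: "Age" appears in two models of action "OfferA".
rows = [
    {"Name": "OfferA", "PredictorName": "Age", "Performance": 0.60, "ResponseCount": 100},
    {"Name": "OfferA", "PredictorName": "Age", "Performance": 0.70, "ResponseCount": 300},
    {"Name": "OfferB", "PredictorName": "Age", "Performance": 0.55, "ResponseCount": 50},
]
pivot = performance_pivot(rows)
```

In this example the model with 300 responses pulls the "OfferA" value for "Age" closer to 0.70 than a plain mean would.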

model_summary(by: str = 'Name', query: pdstools.utils.types.QUERY | None = None) → polars.LazyFrame¶

Generate a summary of statistics for each model (based on model ID)

To generate statistics at a model name or treatment level instead, specify this in the ‘by’ argument.

Parameters:

by (str, optional) – The column to define the ‘counts’ for, by default “Name” (as in the signature above). Must be part of the context keys in the ADMDatamart class
query (Optional[QUERY], optional) – A query to apply to the data before summarization, by default None

Returns:

A LazyFrame, with one row for each context key combination

Return type:

pl.LazyFrame

predictor_counts(*, facet: str = 'Configuration', by: str = 'Type', query: pdstools.utils.types.QUERY | None = None) → polars.LazyFrame¶

Returns the count of each predictor grouped by a certain column

Parameters:

facet (str, optional) – The column to use as a secondary grouping dimension, by default “Configuration”
by (str, optional) – The column to group the data by, by default “Type”
query (Optional[QUERY], optional) – A query to apply to the data, by default None

Returns:

A LazyFrame with one row per predictor and ‘by’ combination, containing:

Name - The action name
EntryType - The entry type (Active, Inactive, etc.)
by - The column specified in the ‘by’ parameter
facet - The column specified in the ‘facet’ parameter
PredictorCount - The number of unique predictors for this combination

Return type:

pl.LazyFrame
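The counting itself can be sketched in a few lines of plain Python. This simplified version only counts distinct predictor names per ‘facet’ and ‘by’ combination and leaves out the other output columns; the helper name `count_predictors` and the `PredictorName` key are assumptions for the example.

```python
from collections import defaultdict


def count_predictors(rows, facet="Configuration", by="Type"):
    """Count distinct predictor names for each (facet, by) combination."""
    seen = defaultdict(set)
    for row in rows:
        seen[(row[facet], row[by])].add(row["PredictorName"])
    return {key: len(names) for key, names in seen.items()}


# Hypothetical predictor rows; "Age" occurs twice in the same combination.
rows = [
    {"Configuration": "Web", "Type": "numeric", "PredictorName": "Age"},
    {"Configuration": "Web", "Type": "numeric", "PredictorName": "Age"},
    {"Configuration": "Web", "Type": "numeric", "PredictorName": "Income"},
    {"Configuration": "Web", "Type": "symbolic", "PredictorName": "Region"},
    {"Configuration": "Email", "Type": "numeric", "PredictorName": "Age"},
]
counts = count_predictors(rows)
```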

static _top_n(df: polars.DataFrame, top_n: int, metric: str = 'PredictorPerformance', facets: list | None = None)¶

Subsets DataFrame to contain only top_n predictors.

Parameters:

df (pl.DataFrame) – Table to subset
top_n (int) – Number of top predictors
metric (str) – Metric to use for comparing predictors
facets (list) – If given, the top_n predictors are selected separately for each facet

Returns:

Subsetted dataframe

Return type:

pl.DataFrame
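A simplified plain-Python sketch of a per-facet top-n selection follows. It assumes one row per predictor within each facet and ranks rows directly by the metric, which may differ from how the library ranks predictors; the helper name `top_n` and the sample column names are chosen for this example only.

```python
from collections import defaultdict


def top_n(rows, n, metric="PredictorPerformance", facets=None):
    """Keep the n highest-scoring rows, separately within each facet combination."""
    groups = defaultdict(list)
    for row in rows:
        # Without facets every row falls into the same single group.
        key = tuple(row[f] for f in facets) if facets else ()
        groups[key].append(row)
    kept = []
    for members in groups.values():
        members.sort(key=lambda r: r[metric], reverse=True)
        kept.extend(members[:n])
    return kept


rows = [
    {"Configuration": "Web", "PredictorName": "Age", "PredictorPerformance": 0.62},
    {"Configuration": "Web", "PredictorName": "Income", "PredictorPerformance": 0.71},
    {"Configuration": "Web", "PredictorName": "Region", "PredictorPerformance": 0.55},
    {"Configuration": "Email", "PredictorName": "Age", "PredictorPerformance": 0.58},
    {"Configuration": "Email", "PredictorName": "Tenure", "PredictorPerformance": 0.66},
]
top_per_configuration = top_n(rows, 1, facets=["Configuration"])
top_overall = top_n(rows, 2)
```

With facets, each configuration contributes its own best predictor; without facets, the ranking runs over all rows at once.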

_adm_model_summary(*, query: pdstools.utils.types.QUERY | None = None, by_period: str | None, by_channel: bool = False, debug: bool = False, custom_channels: Dict[str, str] | None = None) → polars.LazyFrame¶

Parameters:

query (Optional[pdstools.utils.types.QUERY])
by_period (Optional[str])
by_channel (bool)
debug (bool)
custom_channels (Optional[Dict[str, str]])

Return type:

polars.LazyFrame

_summarize_meta_info(grouping: List[str] | None, model_data: polars.LazyFrame, debug: bool) → polars.LazyFrame¶

Parameters:

grouping (Optional[List[str]])
model_data (polars.LazyFrame)
debug (bool)

Return type:

polars.LazyFrame

_summarize_model_analytics(grouping: List[str] | None, model_data: polars.LazyFrame, debug: bool) → polars.LazyFrame¶

Parameters:

grouping (Optional[List[str]])
model_data (polars.LazyFrame)
debug (bool)

Return type:

polars.LazyFrame

_summarize_action_analytics(grouping: List[str] | None, model_data: polars.LazyFrame, debug: bool) → polars.LazyFrame¶

Parameters:

grouping (Optional[List[str]])
model_data (polars.LazyFrame)
debug (bool)

Return type:

polars.LazyFrame

_summarize_model_usage(grouping: List[str] | None, model_data: polars.LazyFrame, debug: bool) → polars.LazyFrame¶

Parameters:

grouping (Optional[List[str]])
model_data (polars.LazyFrame)
debug (bool)

Return type:

polars.LazyFrame

Summarize ADM models per channel

Parameters:

query (Optional[QUERY], optional) – A query to apply to the data, by default None, so no filtering applied
start_date (datetime.datetime, optional) – Start date of the summary period. If None (default) uses the end date minus the window, or if both absent, the earliest date in the data
end_date (datetime.datetime, optional) – End date of the summary period. If None (default) uses the start date plus the window, or if both absent, the latest date in the data
window (int or datetime.timedelta, optional) – Number of days to use for the summary period or an explicit timedelta. If None (default) uses the whole period. Can’t be given if start and end date are also given.
by_period (str, optional) – Optional additional grouping by time period. Format string as in polars.Expr.dt.truncate (https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.truncate.html), for example “1mo”, “1w”, “1d” for calendar month, week, and day. Defaults to None.
custom_channels (Dict[str, str], optional) – Optional dictionary mapping custom channel names to standard channel groups. Defaults to None.
debug (bool, optional) – If True, enables debug mode for additional logging or outputs. Defaults to False.

Returns:

Dataframe with summary per channel (and optionally a period) with the following fields:

Channel Identification:

Channel - The channel name
Direction - The direction (e.g., Inbound, Outbound)
ChannelDirection - Combined Channel/Direction (e.g., “Web/Inbound”)
ChannelDirectionGroup - Standardized channel group with direction (e.g., “Web/Inbound”)

Time and Configuration Fields:

DateRange Min - The minimum date in the summary time range
DateRange Max - The maximum date in the summary time range
Duration - The duration in seconds between the minimum and maximum snapshot times
Configuration - A comma-separated list of model configuration names

Performance Metrics:

Positives - The sum of positive responses across all models in the channel
Responses - The sum of all responses across all models in the channel
Performance - The weighted average performance across all models in the channel (50-100)
CTR - Click-through rate (Positives / Responses) in the channel
isValid - Boolean indicating if the channel has sufficient data (at least 200 positives and 1000 responses)

Action Statistics:

Actions - The total number of unique actions in the channel
Used Actions - The number of unique actions that have been used (have responses)
New Actions - The number of new actions introduced in the period
Issues - The number of unique issues
Groups - The number of unique issue/group combinations

Treatment Statistics:

Treatments - The total number of unique treatments
Used Treatments - The number of unique treatments that have been used

Omnichannel Metrics:

OmniChannel - The overlap of actions with other channels (measure of Omni Channel capability)

Technology Usage Indicators:

usesNBAD - Boolean indicating whether any standard NBAD configurations are used
usesAGB - Boolean indicating whether any Adaptive Gradient Boosting (AGB) models are used

Return type:

pl.LazyFrame
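The interplay between start_date, end_date and window described in the parameters can be sketched as a small resolver. This is a plain-Python illustration under stated assumptions, not the library code: the helper name `resolve_period` is hypothetical, and the choice to anchor the period on the latest date in the data when only a window is given is an assumption, since the text above leaves that case open.

```python
from datetime import datetime, timedelta


def resolve_period(data_min, data_max, start_date=None, end_date=None, window=None):
    """Resolve the (start, end) of the summary period from the optional arguments."""
    if isinstance(window, int):
        window = timedelta(days=window)  # an int is read as a number of days
    if start_date is not None and end_date is not None and window is not None:
        raise ValueError("window cannot be combined with both start_date and end_date")
    if window is None:
        # Fall back to the extent of the data for whichever bound is missing.
        start = start_date if start_date is not None else data_min
        end = end_date if end_date is not None else data_max
        return start, end
    if start_date is not None:
        return start_date, start_date + window
    # Assumption: with only a window, anchor the period on the latest date in the data.
    end = end_date if end_date is not None else data_max
    return end - window, end


data_min, data_max = datetime(2024, 1, 1), datetime(2024, 3, 31)
whole_period = resolve_period(data_min, data_max)
week_before = resolve_period(data_min, data_max, end_date=datetime(2024, 3, 1), window=7)
five_days = resolve_period(
    data_min, data_max, start_date=datetime(2024, 1, 10), window=timedelta(days=5)
)
```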

summary_by_configuration() → polars.LazyFrame¶

Generates a summary of the ADM model configurations.

This method provides an overview of model configurations, including information about the number of models, actions, treatments, and performance metrics.

Returns:

A Polars LazyFrame containing the configuration summary with the following fields:

Configuration Information:

Configuration - The name of the model configuration
Channel - The channel name (if available in context keys)
Direction - The direction (if available in context keys)

Model Information:

ModelID - The number of unique model IDs for this configuration

Action Statistics:

Actions - The number of unique actions in this configuration
Unique Treatments - The number of unique treatments (if available)
Used for (Issues) - A comma-separated list of issues this configuration is used for (if available)

Performance Metrics:

ResponseCount - The total number of responses for this configuration
Positives - The total number of positive responses for this configuration
ModelsPerAction - The ratio of models to actions (models per action)
Performance - The weighted average model performance

Return type:

pl.LazyFrame
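A few of these fields can be reproduced by hand for a single configuration, as in the plain-Python sketch below. The helper name `summarize_configuration` is hypothetical, the `Name` key is assumed to hold the action name, and weighting the performance by response count is an assumption rather than something this page states.

```python
def summarize_configuration(models):
    """Summarize the model rows that belong to one configuration."""
    model_ids = {m["ModelID"] for m in models}
    actions = {m["Name"] for m in models}
    responses = sum(m["ResponseCount"] for m in models)
    weighted = sum(m["Performance"] * m["ResponseCount"] for m in models)
    return {
        "ModelID": len(model_ids),
        "Actions": len(actions),
        "ResponseCount": responses,
        "Positives": sum(m["Positives"] for m in models),
        "ModelsPerAction": len(model_ids) / len(actions) if actions else None,
        # Assumption: performance is weighted by the number of responses.
        "Performance": weighted / responses if responses else None,
    }


# Hypothetical configuration with three models spread over two actions.
models = [
    {"ModelID": "m1", "Name": "OfferA", "ResponseCount": 100, "Positives": 5, "Performance": 60.0},
    {"ModelID": "m2", "Name": "OfferA", "ResponseCount": 300, "Positives": 30, "Performance": 70.0},
    {"ModelID": "m3", "Name": "OfferB", "ResponseCount": 100, "Positives": 10, "Performance": 65.0},
]
summary = summarize_configuration(models)
```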

predictors_global_overview() → polars.LazyFrame¶

Generate a global overview of all predictors across all models.

This method provides a summary of predictor performance and characteristics across all models, including the number of responses, positives, and performance metrics.

Returns:

A Polars LazyFrame containing the global predictor overview with the following fields:

PredictorName - The name of the predictor
Response Count Min/Max - The total number of responses for this predictor
Positives - The total number of positive responses for this predictor
Min, Mean, Median, Max - The min, mean, median and max performance of the predictor (AUC)

Return type:

pl.LazyFrame
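The Min, Mean, Median and Max fields are ordinary descriptive statistics taken per predictor across models. A minimal plain-Python sketch, using a hypothetical helper `global_overview` and assumed `PredictorName` and `Performance` keys, could look like this:

```python
from collections import defaultdict
from statistics import mean, median


def global_overview(rows):
    """Collect min, mean, median and max performance per predictor across models."""
    by_predictor = defaultdict(list)
    for row in rows:
        by_predictor[row["PredictorName"]].append(row["Performance"])
    return {
        name: {
            "Min": min(values),
            "Mean": mean(values),
            "Median": median(values),
            "Max": max(values),
        }
        for name, values in by_predictor.items()
    }


# Hypothetical predictor performance (AUC) in three different models.
rows = [
    {"ModelID": "m1", "PredictorName": "Age", "Performance": 0.5},
    {"ModelID": "m2", "PredictorName": "Age", "Performance": 0.75},
    {"ModelID": "m3", "PredictorName": "Age", "Performance": 0.625},
    {"ModelID": "m1", "PredictorName": "Income", "Performance": 0.75},
]
overview = global_overview(rows)
```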

predictors_overview(model_id: str | None = None, additional_aggregations: list | None = None) → polars.LazyFrame | None¶

Generate a summary of the last snapshot of predictor data.

This method provides an overview of predictor performance and characteristics from the most recent snapshot, either for all models or for a specific model.

Parameters:

model_id (Optional[str], optional) – If provided, filters the data to include only predictors for the specified model ID. If None (default), includes predictors for all models.
additional_aggregations (Optional[list], optional) – Additional aggregation expressions to include in the result. These will be added to the default aggregations.

Returns:

A Polars LazyFrame containing the predictor summary with the following fields:

Identification:

ModelID - The model ID (only if model_id parameter is None)
PredictorName - The name of the predictor

Status and Type:

EntryType - The entry type (Active, Inactive, etc.)
isActive - Boolean indicating if the predictor is active
Type - The predictor type
GroupIndex - The group index of the predictor

Performance Metrics:

Responses - The number of responses for this predictor
Positives - The number of positive responses for this predictor
Univariate Performance - The univariate performance of the predictor (AUC)

Binning Information:

Bins - The number of bins for this predictor
Missing % - The percentage of responses in the MISSING bin
Residual % - The percentage of responses in the RESIDUAL bin

Returns None if the required data is not available or an error is encountered.

Return type:

pl.LazyFrame or None
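The binning fields can be illustrated for a single predictor with a short plain-Python sketch. The helper name `bin_shares` and the (symbol, responses) tuple layout are assumptions for this example, and whether the library reports the share on a 0-100 or a 0-1 scale is not stated on this page; the sketch uses 0-100.

```python
def bin_shares(bins):
    """Summarize the bins of one predictor, given (symbol, responses) pairs."""
    total = sum(count for _, count in bins)

    def share(symbol):
        # Percentage of all responses that fall in the bin with this symbol.
        if total == 0:
            return None
        return 100.0 * sum(count for name, count in bins if name == symbol) / total

    return {
        "Bins": len(bins),
        "Missing %": share("MISSING"),
        "Residual %": share("RESIDUAL"),
    }


# Hypothetical binning of a predictor with 1000 responses in total.
bins = [("<30", 400), ("30-50", 450), ("MISSING", 100), ("RESIDUAL", 50)]
shares = bin_shares(bins)
```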

Overall ADM models summary. Only valid data is included.

Parameters:

start_date (datetime.datetime, optional) – Start date of the summary period. If None (default) uses the end date minus the window, or if both absent, the earliest date in the data
end_date (datetime.datetime, optional) – End date of the summary period. If None (default) uses the start date plus the window, or if both absent, the latest date in the data
window (int or datetime.timedelta, optional) – Number of days to use for the summary period or an explicit timedelta. If None (default) uses the whole period. Can’t be given if start and end date are also given.
by_period (str, optional) – Optional additional grouping by time period. Format string as in polars.Expr.dt.truncate (https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.truncate.html), for example “1mo”, “1w”, “1d” for calendar month, week, and day. Defaults to None.
debug (bool, optional) – If True, enables debug mode for additional logging or outputs. Defaults to False.

Returns:

Summary across all valid ADM models as a dataframe with the following fields:

Time and Configuration Fields:

DateRange Min - The minimum date in the snapshot time range
DateRange Max - The maximum date in the snapshot time range
Duration - The duration in seconds between the minimum and maximum snapshot times
Configuration - A comma-separated list of unique model configurations

Performance Metrics:

Positives Inbound - The sum of positive responses across all models in the inbound channels
Positives Outbound - The sum of positive responses across all models in the outbound channels
Responses Inbound - The sum of all responses across all models in the inbound channels
Responses Outbound - The sum of all responses across all models in the outbound channels
Performance - The weighted average performance across all models (50-100)

Action Statistics:

Actions - The total number of unique actions
Used Actions - The number of unique actions that have been used (have responses)
New Actions - The number of new actions introduced in the period
Issues - The number of unique issues
Groups - The number of unique issue/group combinations

Treatment Statistics:

Treatments - The total number of unique treatments
Used Treatments - The number of unique treatments that have been used

Channel Statistics:

Number of Valid Channels - The count of valid channels (channels with sufficient data)
Minimum Channel Performance - The performance of the channel with lowest performance
Channel with Minimum Performance - The channel/direction group with the lowest performance
OmniChannel - The average overlap of actions across channels (measure of Omni Channel capability)

Technology Usage Indicators:

usesNBAD - Boolean indicating whether standard NBAD configurations are used
usesAGB - Boolean indicating whether any Adaptive Gradient Boosting (AGB) models are used

Note: A channel is considered “valid” if it has at least 200 positives and 1000 responses

Return type:

pl.LazyFrame
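The validity rule in the note above, together with CTR and the channel statistics, can be sketched in plain Python as follows. The thresholds and the CTR formula come from this page; the helper name `channel_metrics`, the sample figures, and the choice to look for the weakest channel among valid channels only are assumptions made for the example.

```python
def channel_metrics(channel, positives, responses, performance):
    """Derive CTR and the validity flag for one channel."""
    return {
        "Channel": channel,
        "Positives": positives,
        "Responses": responses,
        "Performance": performance,
        "CTR": positives / responses if responses else None,
        # Valid means at least 200 positives and 1000 responses.
        "isValid": positives >= 200 and responses >= 1000,
    }


# Hypothetical per-channel totals.
channels = [
    channel_metrics("Web/Inbound", 450, 12000, 67.5),
    channel_metrics("Email/Outbound", 150, 9000, 58.0),
    channel_metrics("Mobile/Inbound", 800, 20000, 61.2),
]
valid_channels = [c for c in channels if c["isValid"]]
number_of_valid_channels = len(valid_channels)
# Assumption: the minimum is taken over valid channels only.
weakest = min(valid_channels, key=lambda c: c["Performance"])
```

Here "Email/Outbound" has enough responses but too few positives, so it is excluded before the weakest channel is determined.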