pdstools.adm.Aggregates
Classes
Module Contents
- class Aggregates(datamart: pdstools.adm.ADMDatamart.ADMDatamart)
- Parameters:
datamart (pdstools.adm.ADMDatamart.ADMDatamart)
- datamart
- cdh_guidelines
- last(*, data: polars.LazyFrame | None = None, table: Literal['model_data', 'predictor_data', 'combined_data'] = 'model_data') -> polars.LazyFrame
Gets the last snapshot of the given table
This method filters the data to include only the rows from the most recent snapshot time.
- Parameters:
data (Optional[pl.LazyFrame], optional) – If provided, the last snapshot is taken from this LazyFrame instead of from the table named by ‘table’, by default None
table (Literal['model_data', 'predictor_data', 'combined_data'], optional) – If provided, specifies the table to get data from, by default “model_data”
- Returns:
A LazyFrame containing only the rows from the most recent snapshot time
- Return type:
pl.LazyFrame
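The filtering described above can be illustrated without polars. The following is a minimal sketch in plain Python, not the library's implementation; the column name `SnapshotTime` and the helper `last_snapshot` are assumptions made for illustration.

```python
from datetime import datetime


def last_snapshot(rows, time_key="SnapshotTime"):
    """Keep only the rows whose snapshot time equals the most recent one.

    `rows` is a list of dicts standing in for a table; `time_key` is an
    assumed column name, not taken from the documentation above.
    """
    if not rows:
        return []
    latest = max(row[time_key] for row in rows)
    return [row for row in rows if row[time_key] == latest]


rows = [
    {"ModelID": "a", "SnapshotTime": datetime(2024, 1, 1), "Positives": 10},
    {"ModelID": "a", "SnapshotTime": datetime(2024, 2, 1), "Positives": 25},
    {"ModelID": "b", "SnapshotTime": datetime(2024, 2, 1), "Positives": 7},
]
latest_rows = last_snapshot(rows)
```

Here `latest_rows` holds the two rows from 1 February 2024; the older row for model `a` is dropped.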
- _combine_data(model_df: polars.LazyFrame | None, predictor_df: polars.LazyFrame | None) -> polars.LazyFrame | None
Combines the model and predictor tables into the combined_data table
- Parameters:
model_df (pl.LazyFrame) – The model snapshots table
predictor_df (pl.LazyFrame) – The predictor binning snapshots table
- Returns:
The resulting data, joined on the ModelID column
- Return type:
pl.LazyFrame
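As an illustration of a join on the ModelID column, here is a minimal sketch in plain Python. It assumes an inner join and one model row per ModelID (for example after reducing to the last snapshot); neither assumption is confirmed by the text above, and `join_on_model_id` is a hypothetical helper.

```python
def join_on_model_id(model_rows, predictor_rows):
    """Attach model columns to each predictor row that has a matching ModelID."""
    # Assumption: at most one model row per ModelID.
    models_by_id = {row["ModelID"]: row for row in model_rows}
    combined = []
    for predictor in predictor_rows:
        model = models_by_id.get(predictor["ModelID"])
        if model is None:
            continue  # inner join (assumed): predictor rows without a model are dropped
        combined.append({**model, **predictor})
    return combined


model_rows = [
    {"ModelID": "m1", "Name": "Offer A"},
    {"ModelID": "m2", "Name": "Offer B"},
]
predictor_rows = [
    {"ModelID": "m1", "PredictorName": "Age"},
    {"ModelID": "m1", "PredictorName": "Income"},
    {"ModelID": "m3", "PredictorName": "Age"},
]
combined = join_on_model_id(model_rows, predictor_rows)
```

The result has one row per matching predictor row, each carrying the model's columns alongside the predictor's.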
- predictor_performance_pivot(*, query: pdstools.utils.types.QUERY | None = None, active_only: bool = False, by='Name', top_predictors: int | None = None, top_groups: int | None = None) -> polars.LazyFrame
Creates a pivot table of the predictor performance per ‘group’
- Parameters:
query (Optional[QUERY], optional) – A query to apply to the data before creating the pivot, by default None
by (str, optional) – A group by which to ‘facet’, by default “Name”. If, for instance, the ‘by’ argument is set to ‘Configuration’, each row will be a distinct configuration
top_predictors (Optional[int], optional) – Specify the maximum number of predictors, by default None
top_groups (Optional[int], optional) – Specify the maximum number of ‘groups’ specified in the ‘by’ argument, by default None
active_only (bool, optional) – Whether to include only active predictors, by default False
- Returns:
A LazyFrame with a column for each predictor, and a row for each ‘group’. The values represent the weighted performance for that predictor
- Return type:
pl.LazyFrame
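To make the shape of the result concrete, the sketch below builds such a pivot in plain Python. It assumes the weighting is by response count, which the text above does not state, and the column names `PredictorName`, `Performance`, and `ResponseCount` are likewise assumptions. It is not the library's implementation.

```python
from collections import defaultdict


def performance_pivot(rows, by="Name"):
    """Return {group: {predictor: response-weighted mean performance}}."""
    # (group, predictor) -> [sum of performance * weight, sum of weights]
    sums = defaultdict(lambda: [0.0, 0.0])
    for row in rows:
        key = (row[by], row["PredictorName"])
        sums[key][0] += row["Performance"] * row["ResponseCount"]
        sums[key][1] += row["ResponseCount"]

    pivot = defaultdict(dict)
    for (group, predictor), (weighted_sum, weight) in sums.items():
        if weight > 0:  # skip cells without any responses
            pivot[group][predictor] = weighted_sum / weight
    return dict(pivot)


rows = [
    {"Name": "Offer A", "PredictorName": "Age", "Performance": 60.0, "ResponseCount": 100},
    {"Name": "Offer A", "PredictorName": "Age", "Performance": 70.0, "ResponseCount": 300},
    {"Name": "Offer A", "PredictorName": "Income", "Performance": 52.0, "ResponseCount": 400},
    {"Name": "Offer B", "PredictorName": "Age", "Performance": 55.0, "ResponseCount": 50},
]
pivot = performance_pivot(rows)
```

With these rows, `pivot["Offer A"]["Age"]` is 67.5, because the observation with 300 responses counts three times as much as the one with 100. Passing `by="Configuration"` would produce one entry per configuration instead, provided the rows carry that column.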
- model_summary(by: str = 'Name', query: pdstools.utils.types.QUERY | None = None) -> polars.LazyFrame
Generate a summary of statistics for each model (based on model ID)
If you want to generate statistics at a model name or treatment level, specify this in the ‘by’ parameter.
- Parameters:
by (str, optional) – The column to define the ‘counts’ for, by default “Name”. Must be part of the context keys in the ADMDatamart class
query (Optional[QUERY], optional) – A query to apply to the data before summarization, by default None
- Returns:
A LazyFrame, with one row for each context key combination
- Return type:
pl.LazyFrame
- predictor_counts(*, facet: str = 'Configuration', by: str = 'Type', query: pdstools.utils.types.QUERY | None = None) -> polars.LazyFrame
Returns the count of each predictor grouped by a certain column
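- Parameters:
facet (str, optional) – The column to facet by, by default “Configuration”
by (str, optional) – The column to group the predictor counts by, by default “Type”
query (Optional[QUERY], optional) – A query to apply to the data before counting, by default None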
- Returns:
A LazyFrame with one row per predictor and ‘by’ combination, containing:
- Name - The action name
- EntryType - The entry type (Active, Inactive, etc.)
- by - The column specified in the ‘by’ parameter
- facet - The column specified in the ‘facet’ parameter
- PredictorCount - The number of unique predictors for this combination
- Return type:
pl.LazyFrame
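The counting idea can be sketched in plain Python. This is a simplified illustration that groups only by the ‘facet’ and ‘by’ columns and ignores the Name and EntryType fields listed above; the column name `PredictorName` and the helper `count_predictors` are assumptions.

```python
from collections import defaultdict


def count_predictors(rows, facet="Configuration", by="Type"):
    """Count the unique predictor names for each (facet, by) combination."""
    seen = defaultdict(set)
    for row in rows:
        seen[(row[facet], row[by])].add(row["PredictorName"])
    return {key: len(names) for key, names in seen.items()}


rows = [
    {"Configuration": "Web_CTR", "Type": "numeric", "PredictorName": "Age"},
    {"Configuration": "Web_CTR", "Type": "numeric", "PredictorName": "Income"},
    # The same predictor seen in a second model is only counted once.
    {"Configuration": "Web_CTR", "Type": "numeric", "PredictorName": "Age"},
    {"Configuration": "Web_CTR", "Type": "symbolic", "PredictorName": "Gender"},
    {"Configuration": "Email_CTR", "Type": "numeric", "PredictorName": "Age"},
]
counts = count_predictors(rows)
```

Using a set per group is what makes the count reflect unique predictors instead of rows.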
- static _top_n(df: polars.DataFrame, top_n: int, metric: str = 'PredictorPerformance', facets: list | None = None)
Subsets DataFrame to contain only top_n predictors.
- _adm_model_summary(*, query: pdstools.utils.types.QUERY | None = None, by_period: str | None, by_channel: bool = False, debug: bool = False, custom_channels: Dict[str, str] | None = None) -> polars.LazyFrame
- _summarize_meta_info(grouping: List[str] | None, model_data: polars.LazyFrame, debug: bool) -> polars.LazyFrame
- _summarize_model_analytics(grouping: List[str] | None, model_data: polars.LazyFrame, debug: bool) -> polars.LazyFrame
- _summarize_action_analytics(grouping: List[str] | None, model_data: polars.LazyFrame, debug: bool) -> polars.LazyFrame
- _summarize_model_usage(grouping: List[str] | None, model_data: polars.LazyFrame, standard_configurations: List[str], debug: bool) -> polars.LazyFrame
- summary_by_channel(*, start_date: datetime.datetime | None = None, end_date: datetime.datetime | None = None, window: int | datetime.timedelta | None = None, by_period: str | None = None, custom_channels: Dict[str, str] | None = None, debug: bool = False) -> polars.LazyFrame
Summarize ADM models per channel
- Parameters:
start_date (datetime.datetime, optional) – Start date of the summary period. If None (default) uses the end date minus the window, or if both absent, the earliest date in the data
end_date (datetime.datetime, optional) – End date of the summary period. If None (default) uses the start date plus the window, or if both absent, the latest date in the data
window (int or datetime.timedelta, optional) – Number of days to use for the summary period or an explicit timedelta. If None (default) uses the whole period. Can’t be given if start and end date are also given.
by_period (str, optional) – Optional additional grouping by time period. Format string as in polars.Expr.dt.truncate (https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.truncate.html), for example “1mo”, “1w”, “1d” for calendar month, week, or day. Defaults to None.
custom_channels (Dict[str, str], optional) – Optional dictionary mapping custom channel names to standard channel groups. Defaults to None.
debug (bool, optional) – If True, enables debug mode for additional logging or outputs. Defaults to False.
- Returns:
Dataframe with summary per channel (and optionally a period) with the following fields:
Channel Identification:
- Channel - The channel name
- Direction - The direction (e.g., Inbound, Outbound)
- ChannelDirection - Combined Channel/Direction (e.g., “Web/Inbound”)
- ChannelDirectionGroup - Standardized channel group with direction (e.g., “Web/Inbound”)
Time and Configuration Fields:
- DateRange Min - The minimum date in the summary time range
- DateRange Max - The maximum date in the summary time range
- Duration - The duration in seconds between the minimum and maximum snapshot times
- Configuration - A comma-separated list of model configuration names
Performance Metrics:
- Positives - The sum of positive responses across all models in the channel
- Responses - The sum of all responses across all models in the channel
- Performance - The weighted average performance across all models in the channel (50-100)
- CTR - Click-through rate (Positives / Responses) in the channel
- isValid - Boolean indicating if the channel has sufficient data (at least 200 positives and 1000 responses)
Action Statistics:
- Actions - The total number of unique actions in the channel
- Used Actions - The number of unique actions that have been used (have responses)
- New Actions - The number of new actions introduced in the period
- Issues - The number of unique issues
- Groups - The number of unique issue/group combinations
Treatment Statistics:
- Treatments - The total number of unique treatments
- Used Treatments - The number of unique treatments that have been used
Omnichannel Metrics:
- OmniChannel - The overlap of actions with other channels (measure of Omni Channel capability)
Technology Usage Indicators:
- usesNBAD - Boolean indicating whether any standard NBAD configurations are used
- usesAGB - Boolean indicating whether any Adaptive Gradient Boosting (AGB) models are used
- Return type:
pl.LazyFrame
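The interplay between start_date, end_date, and window can be sketched as a small helper. This is an illustration of the rules as described in the parameter list, not the library's code. The helper name `resolve_period` and its `data_min`/`data_max` arguments are hypothetical, and the behaviour when only a window is given (anchoring on the latest date in the data) is an assumption, since the description leaves that case open.

```python
from datetime import datetime, timedelta


def resolve_period(data_min, data_max, start_date=None, end_date=None, window=None):
    """Return the (start, end) of the summary period.

    data_min / data_max stand in for the earliest and latest dates in the data.
    """
    if start_date is not None and end_date is not None and window is not None:
        raise ValueError("window can't be given if start and end date are also given")
    if isinstance(window, int):
        window = timedelta(days=window)  # an int window is a number of days

    if window is None:
        # No window: a missing date falls back to the bounds of the data.
        start = start_date if start_date is not None else data_min
        end = end_date if end_date is not None else data_max
        return start, end
    if start_date is not None:
        return start_date, start_date + window
    # Assumption: with only a window, the period ends at the latest date in the data.
    end = end_date if end_date is not None else data_max
    return end - window, end


data_min, data_max = datetime(2024, 1, 1), datetime(2024, 3, 31)
whole_period = resolve_period(data_min, data_max)
one_week = resolve_period(data_min, data_max, start_date=datetime(2024, 2, 1), window=7)
last_day = resolve_period(
    data_min, data_max, end_date=datetime(2024, 3, 1), window=timedelta(days=1)
)
```

Giving any two of the three values fixes the period; giving none uses the whole range of the data.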
- summary_by_configuration() -> polars.DataFrame
Generates a summary of the ADM model configurations.
This method provides an overview of model configurations, including information about the number of models, actions, treatments, and performance metrics.
- Returns:
A Polars DataFrame containing the configuration summary with the following fields:
Configuration Information:
- Configuration - The name of the model configuration
- Channel - The channel name (if available in context keys)
- Direction - The direction (if available in context keys)
Model Information:
- AGB - Indicates if Adaptive Gradient Boosting is used (“Yes”, “No”, or “Unknown”)
- ModelID - The number of unique model IDs for this configuration
Action Statistics:
- Actions - The number of unique actions in this configuration
- Unique Treatments - The number of unique treatments (if available)
- Used for (Issues) - A comma-separated list of issues this configuration is used for (if available)
Performance Metrics:
- ResponseCount - The total number of responses for this configuration
- Positives - The total number of positive responses for this configuration
- ModelsPerAction - The ratio of models to actions (models per action)
- Return type:
pl.DataFrame
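A few of these fields can be reproduced by hand to show how they relate. The sketch below is plain Python over a list of dicts and assumes one row per model (for example the last snapshot), with the action name in a `Name` column; it is an illustration under those assumptions, not the library's implementation.

```python
from collections import defaultdict


def summarize_configurations(rows):
    """Aggregate model rows into one summary dict per configuration."""
    groups = defaultdict(
        lambda: {"models": set(), "actions": set(), "responses": 0, "positives": 0}
    )
    for row in rows:
        group = groups[row["Configuration"]]
        group["models"].add(row["ModelID"])
        group["actions"].add(row["Name"])
        group["responses"] += row["ResponseCount"]
        group["positives"] += row["Positives"]

    return {
        configuration: {
            "ModelID": len(group["models"]),
            "Actions": len(group["actions"]),
            "ResponseCount": group["responses"],
            "Positives": group["positives"],
            # Every group has at least one action, so the division is safe.
            "ModelsPerAction": len(group["models"]) / len(group["actions"]),
        }
        for configuration, group in groups.items()
    }


rows = [
    {"Configuration": "Web_Click_Through", "ModelID": "m1", "Name": "Offer A",
     "ResponseCount": 1000, "Positives": 50},
    {"Configuration": "Web_Click_Through", "ModelID": "m2", "Name": "Offer A",
     "ResponseCount": 500, "Positives": 20},
    {"Configuration": "Web_Click_Through", "ModelID": "m3", "Name": "Offer B",
     "ResponseCount": 700, "Positives": 30},
]
summary = summarize_configurations(rows)
```

Three models over two actions give a ModelsPerAction of 1.5 for this configuration.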
- predictors_overview(model_id: str | None = None, additional_aggregations: list | None = None) -> polars.DataFrame | None
Generate a summary of the last snapshot of predictor data.
This method provides an overview of predictor performance and characteristics from the most recent snapshot, either for all models or for a specific model.
- Parameters:
model_id (Optional[str], optional) – If provided, filters the data to include only predictors for the specified model ID. If None (default), includes predictors for all models.
additional_aggregations (Optional[list], optional) – Additional aggregation expressions to include in the result. These will be added to the default aggregations.
- Returns:
A Polars DataFrame containing the predictor summary with the following fields:
Identification:
- ModelID - The model ID (only if model_id parameter is None)
- PredictorName - The name of the predictor
Status and Type:
- EntryType - The entry type (Active, Inactive, etc.)
- isActive - Boolean indicating if the predictor is active
- Type - The predictor type
- GroupIndex - The group index of the predictor
Performance Metrics:
- Responses - The number of responses for this predictor
- Positives - The number of positive responses for this predictor
- Univariate Performance - The univariate performance of the predictor (AUC)
Binning Information:
- Bins - The number of bins for this predictor
- Missing % - The percentage of responses in the MISSING bin
- Residual % - The percentage of responses in the RESIDUAL bin
Returns None if the required data is not available or an error is encountered.
- Return type:
pl.DataFrame or None
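The binning fields can be illustrated for a single predictor. The sketch below is plain Python and assumes each bin is a dict with `BinSymbol` and `BinResponseCount` keys and that the percentages are on a 0-100 scale; those names and the scale are assumptions, not taken from the text above.

```python
def bin_shares(bins):
    """Summarize one predictor's bins: bin count plus MISSING and RESIDUAL shares."""
    total = sum(b["BinResponseCount"] for b in bins)

    def share(symbol):
        if total == 0:
            return 0.0  # no responses at all: report 0 instead of dividing by zero
        in_bin = sum(b["BinResponseCount"] for b in bins if b["BinSymbol"] == symbol)
        return 100.0 * in_bin / total

    return {
        "Bins": len(bins),
        "Missing %": share("MISSING"),
        "Residual %": share("RESIDUAL"),
    }


age_bins = [
    {"BinSymbol": "MISSING", "BinResponseCount": 50},
    {"BinSymbol": "<30", "BinResponseCount": 100},
    {"BinSymbol": ">=30", "BinResponseCount": 40},
    {"BinSymbol": "RESIDUAL", "BinResponseCount": 10},
]
age_summary = bin_shares(age_bins)
```

For these four bins, 50 of 200 responses fall in the MISSING bin (25%) and 10 of 200 in the RESIDUAL bin (5%).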
- overall_summary(*, start_date: datetime.datetime | None = None, end_date: datetime.datetime | None = None, window: int | datetime.timedelta | None = None, by_period: str | None = None, debug: bool = False) -> polars.LazyFrame
Overall ADM models summary. Only valid data is included.
- Parameters:
start_date (datetime.datetime, optional) – Start date of the summary period. If None (default) uses the end date minus the window, or if both absent, the earliest date in the data
end_date (datetime.datetime, optional) – End date of the summary period. If None (default) uses the start date plus the window, or if both absent, the latest date in the data
window (int or datetime.timedelta, optional) – Number of days to use for the summary period or an explicit timedelta. If None (default) uses the whole period. Can’t be given if start and end date are also given.
by_period (str, optional) – Optional additional grouping by time period. Format string as in polars.Expr.dt.truncate (https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.truncate.html), for example “1mo”, “1w”, “1d” for calendar month, week, or day. Defaults to None.
debug (bool, optional) – If True, enables debug mode for additional logging or outputs. Defaults to False.
- Returns:
Summary across all valid ADM models as a dataframe with the following fields:
Time and Configuration Fields:
- DateRange Min - The minimum date in the snapshot time range
- DateRange Max - The maximum date in the snapshot time range
- Duration - The duration in seconds between the minimum and maximum snapshot times
- Configuration - A comma-separated list of unique model configurations
Performance Metrics:
- Positives Inbound - The sum of positive responses across all models in the inbound channels
- Positives Outbound - The sum of positive responses across all models in the outbound channels
- Responses Inbound - The sum of all responses across all models in the inbound channels
- Responses Outbound - The sum of all responses across all models in the outbound channels
- Performance - The weighted average performance across all models (50-100)
Action Statistics:
- Actions - The total number of unique actions
- Used Actions - The number of unique actions that have been used (have responses)
- New Actions - The number of new actions introduced in the period
- Issues - The number of unique issues
- Groups - The number of unique issue/group combinations
Treatment Statistics:
- Treatments - The total number of unique treatments
- Used Treatments - The number of unique treatments that have been used
Channel Statistics:
- Number of Valid Channels - The count of valid channels (channels with sufficient data)
- Minimum Channel Performance - The performance of the channel with lowest performance
- Channel with Minimum Performance - The channel/direction group with the lowest performance
- OmniChannel - The average overlap of actions across channels (measure of Omni Channel capability)
Technology Usage Indicators:
- usesNBAD - Boolean indicating whether standard NBAD configurations are used
- usesAGB - Boolean indicating whether any Adaptive Gradient Boosting (AGB) models are used
Note: A channel is considered “valid” if it has at least 200 positives and 1000 responses
- Return type:
pl.LazyFrame
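The validity rule and the channel statistics derived from it can be sketched as follows. This is a plain-Python illustration over per-channel dicts, using the thresholds from the note above; the helper names are hypothetical and this is not the library's implementation.

```python
MIN_POSITIVES = 200   # thresholds taken from the note on valid channels
MIN_RESPONSES = 1000


def is_valid(channel):
    """A channel is valid with at least 200 positives and 1000 responses."""
    return channel["Positives"] >= MIN_POSITIVES and channel["Responses"] >= MIN_RESPONSES


def channel_statistics(channels):
    """Count the valid channels and find the lowest-performing one among them."""
    valid = [c for c in channels if is_valid(c)]
    if not valid:
        return {
            "Number of Valid Channels": 0,
            "Minimum Channel Performance": None,
            "Channel with Minimum Performance": None,
        }
    worst = min(valid, key=lambda c: c["Performance"])
    return {
        "Number of Valid Channels": len(valid),
        "Minimum Channel Performance": worst["Performance"],
        "Channel with Minimum Performance": worst["ChannelDirectionGroup"],
    }


channels = [
    {"ChannelDirectionGroup": "Web/Inbound", "Positives": 500, "Responses": 20000, "Performance": 65.0},
    {"ChannelDirectionGroup": "Email/Outbound", "Positives": 300, "Responses": 5000, "Performance": 58.5},
    # Too little data: excluded even though its performance is the lowest.
    {"ChannelDirectionGroup": "SMS/Outbound", "Positives": 50, "Responses": 800, "Performance": 52.0},
]
stats = channel_statistics(channels)
```

Only the two channels with enough data are considered, so the minimum performance reported is that of Email/Outbound, not SMS/Outbound.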