pdstools.adm.Aggregates
Classes
Module Contents
- class Aggregates(datamart: pdstools.adm.ADMDatamart.ADMDatamart)
- Parameters:
datamart (pdstools.adm.ADMDatamart.ADMDatamart)
- datamart
- cdh_guidelines
- last(*, data: polars.LazyFrame | None = None, table: Literal['model_data', 'predictor_data', 'combined_data'] = 'model_data')
Gets the last snapshot of the given table
- Parameters:
data (Optional[pl.LazyFrame], optional) – If provided, the last snapshot is taken from this LazyFrame rather than from a datamart table, by default None
table (Literal['model_data', 'predictor_data', 'combined_data'], optional) – The datamart table to get data from, by default “model_data”
- Returns:
The rows of the selected table that belong to its last snapshot
- Return type:
pl.LazyFrame
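As an illustration of what "last snapshot" can mean, the following plain-Python sketch keeps, for each model, only the rows carrying that model's latest timestamp. It is not the pdstools implementation: the per-model grouping, the helper name last_snapshot, and the column names ModelID and SnapshotTime are assumptions made for this example.

```python
from datetime import datetime


def last_snapshot(rows, key="ModelID", time_col="SnapshotTime"):
    """Keep, for each key, only the rows with that key's latest timestamp.

    Hypothetical helper: illustrates the idea, not the library's code.
    """
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[time_col] > latest[k]:
            latest[k] = row[time_col]
    return [row for row in rows if row[time_col] == latest[row[key]]]


rows = [
    {"ModelID": "a", "SnapshotTime": datetime(2024, 1, 1), "Positives": 10},
    {"ModelID": "a", "SnapshotTime": datetime(2024, 2, 1), "Positives": 25},
    {"ModelID": "b", "SnapshotTime": datetime(2024, 1, 1), "Positives": 7},
]
result = last_snapshot(rows)
```

Here model "a" keeps only its February row, while model "b" keeps its single January row.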
- _combine_data(model_df: polars.LazyFrame | None, predictor_df: polars.LazyFrame | None) -> polars.LazyFrame | None
Combines the model and predictor tables to the combined_data attribute
- Parameters:
model_df (pl.LazyFrame) – The model snapshots table
predictor_df (pl.LazyFrame) – The predictor binning snapshots table
- Returns:
The resulting data, joined on the ModelID column
- Return type:
pl.LazyFrame
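The join described above can be sketched in plain Python. This is only an illustration under stated assumptions: the join is assumed to be an inner join, None is assumed to be returned when either input is missing (suggested by the signature, not confirmed), and the helper name and the Name and PredictorName columns are hypothetical.

```python
def join_on_model_id(model_rows, predictor_rows):
    """Inner-join two lists of dict rows on the ModelID key.

    Hypothetical helper: returns None if either input is None (an assumption).
    """
    if model_rows is None or predictor_rows is None:
        return None
    models = {m["ModelID"]: m for m in model_rows}
    combined = []
    for p in predictor_rows:
        m = models.get(p["ModelID"])
        if m is not None:
            # Model columns first, predictor columns layered on top
            combined.append({**m, **p})
    return combined


model_rows = [
    {"ModelID": "m1", "Name": "Offer1"},
    {"ModelID": "m2", "Name": "Offer2"},
]
predictor_rows = [
    {"ModelID": "m1", "PredictorName": "Age"},
    {"ModelID": "m1", "PredictorName": "Income"},
    {"ModelID": "m3", "PredictorName": "Age"},
]
combined = join_on_model_id(model_rows, predictor_rows)
```

Each predictor row of model "m1" picks up the model's columns; the predictor row for the unknown model "m3" and the predictor-less model "m2" drop out of an inner join.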
- predictor_performance_pivot(*, query: pdstools.utils.types.QUERY | None = None, active_only: bool = False, by='Name', top_predictors: int | None = None, top_groups: int | None = None) -> polars.LazyFrame
Creates a pivot table of the predictor performance per ‘group’
- Parameters:
query (Optional[QUERY], optional) – A query to apply to the data before creating the pivot, by default None
by (str, optional) – A group by which to ‘facet’, by default “Name”. If, for instance, the ‘by’ argument is set to ‘Configuration’, each row will be a distinct configuration
top_predictors (Optional[int], optional) – Specify the maximum number of predictors, by default None
top_groups (Optional[int], optional) – Specify the maximum number of ‘groups’ specified in the ‘by’ argument, by default None
active_only (bool)
- Returns:
A LazyFrame with a column for each predictor, and a row for each ‘group’. The values represent the weighted performance for that predictor
- Return type:
pl.LazyFrame
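To make the shape of the result concrete, here is a minimal plain-Python sketch of such a pivot: one entry per group, one value per predictor, each value a weighted average of performance. The weighting by response count, the helper name, and the column names PredictorName, Performance and Responses are assumptions for illustration; the documentation above does not state which weight the library uses.

```python
from collections import defaultdict


def performance_pivot(rows, by="Name"):
    """Build {group: {predictor: weighted mean performance}}.

    Hypothetical helper: weights each row by its Responses (an assumption).
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # (group, predictor) -> [weighted sum, weight]
    for r in rows:
        cell = sums[(r[by], r["PredictorName"])]
        cell[0] += r["Performance"] * r["Responses"]
        cell[1] += r["Responses"]
    pivot = defaultdict(dict)
    for (group, predictor), (weighted_sum, weight) in sums.items():
        pivot[group][predictor] = weighted_sum / weight if weight else None
    return dict(pivot)


rows = [
    {"Name": "Offer1", "PredictorName": "Age", "Performance": 60.0, "Responses": 100},
    {"Name": "Offer1", "PredictorName": "Age", "Performance": 80.0, "Responses": 300},
    {"Name": "Offer1", "PredictorName": "Income", "Performance": 55.0, "Responses": 200},
    {"Name": "Offer2", "PredictorName": "Age", "Performance": 70.0, "Responses": 50},
]
pivot = performance_pivot(rows)
```

Passing a different column name as by changes what a row of the pivot stands for, in the same way the ‘by’ argument is described above.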
- model_summary(by: str = 'Name', query: pdstools.utils.types.QUERY | None = None) -> polars.LazyFrame
Generate summary statistics for models, grouped by the column given in ‘by’.
To generate statistics at a different level, for instance per model ID or per treatment, specify that column in the ‘by’ argument.
- Parameters:
by (str, optional) – The column to define the ‘counts’ for, by default “Name”. Must be part of the context keys in the ADMDatamart class
query (Optional[QUERY], optional) – A query to apply to the data before summarization, by default None
- Returns:
A LazyFrame, with one row for each context key combination
- Return type:
pl.LazyFrame
- predictor_counts(*, facet: str = 'Configuration', by: str = 'Type', query: pdstools.utils.types.QUERY | None = None) -> polars.LazyFrame
Returns the count of each predictor grouped by a certain column
- static _top_n(df: polars.DataFrame, top_n: int, metric: str = 'PredictorPerformance', facets: list | None = None)
Subsets DataFrame to contain only top_n predictors.
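One way to picture a top-n subset is the plain-Python sketch below: predictors are ranked by their mean value of the metric and only rows of the best top_n predictors are kept. The ranking by mean, the omission of facets, the helper name, and the PredictorName column are all assumptions for illustration; the actual ranking rule of _top_n is not described here.

```python
from collections import defaultdict
from statistics import mean


def top_n_predictors(rows, top_n, metric="PredictorPerformance", name_col="PredictorName"):
    """Keep only rows of the top_n predictors, ranked by mean metric.

    Hypothetical helper: the ranking rule is an assumption, not the library's.
    """
    values = defaultdict(list)
    for r in rows:
        values[r[name_col]].append(r[metric])
    ranked = sorted(values, key=lambda name: mean(values[name]), reverse=True)
    keep = set(ranked[:top_n])
    return [r for r in rows if r[name_col] in keep]


rows = [
    {"PredictorName": "Age", "PredictorPerformance": 70.0},
    {"PredictorName": "Age", "PredictorPerformance": 72.0},
    {"PredictorName": "Income", "PredictorPerformance": 55.0},
    {"PredictorName": "Region", "PredictorPerformance": 65.0},
]
subset = top_n_predictors(rows, top_n=2)
```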
- _adm_model_summary(by_period: str | None, by_channel: bool, debug: bool, custom_channels: Dict[str, str] | None = None) -> polars.LazyFrame
- _summarize_meta_info(grouping: List[str] | None, model_data: polars.LazyFrame, debug: bool) -> polars.LazyFrame
- _summarize_model_analytics(grouping: List[str] | None, model_data: polars.LazyFrame, debug: bool) -> polars.LazyFrame
- _summarize_action_analytics(grouping: List[str] | None, model_data: polars.LazyFrame, debug: bool) -> polars.LazyFrame
- _summarize_model_usage(grouping: List[str] | None, model_data: polars.LazyFrame, standard_configurations: List[str], debug: bool) -> polars.LazyFrame
- summary_by_channel(by_period: str | None = None, custom_channels: Dict[str, str] | None = None, debug: bool = False) -> polars.LazyFrame
Summarize ADM models per channel
- Parameters:
by_period (str, optional) – Optional grouping by time period. Format string as in polars.Expr.dt.truncate (https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.truncate.html), for example “1mo”, “1w”, “1d” for calendar month, week, or day. If provided, creates a new Period column with the truncated date/time. Defaults to None. NOTE: this argument is going to be deprecated in favor of explicit start/end dates in the near future.
custom_channels (Dict[str, str], optional) – Optional dictionary mapping custom channel names to standard channel groups. Defaults to None.
debug (bool, optional) – If True, enables debug mode for additional logging or outputs. Defaults to False.
- Returns:
Dataframe with summary per channel (and optionally a period) with the following fields:
Channel Identification:
- Channel - The channel name
- Direction - The direction (e.g., Inbound, Outbound)
- ChannelDirection - Combined Channel/Direction (e.g., “Web/Inbound”)
- ChannelDirectionGroup - Standardized channel group with direction (e.g., “Web/Inbound”)
- Period - (Only present when by_period parameter is specified) The time period for the data
Time and Configuration Fields:
- DateRange Min - The minimum date in the snapshot time range
- DateRange Max - The maximum date in the snapshot time range
- Duration - The duration in seconds between the minimum and maximum snapshot times
- Configuration - A comma-separated list of model configuration names
Performance Metrics:
- Positives - The sum of positive responses across all models in the channel
- Responses - The sum of all responses across all models in the channel
- Performance - The weighted average performance across all models in the channel (50-100)
- CTR - Click-through rate (Positives / Responses) in the channel
- isValid - Boolean indicating if the channel has sufficient data (at least 200 positives and 1000 responses)
Action Statistics:
- Actions - The total number of unique actions in the channel
- Used Actions - The number of unique actions that have been used (have responses)
- New Actions - The number of new actions introduced in the period
- Issues - The number of unique issues
- Groups - The number of unique issue/group combinations
Treatment Statistics:
- Treatments - The total number of unique treatments
- Used Treatments - The number of unique treatments that have been used
Omnichannel Metrics:
- OmniChannel - The overlap of actions with other channels (measure of Omni Channel capability)
Technology Usage Indicators:
- usesNBAD - Boolean indicating whether any standard NBAD configurations are used
- usesAGB - Boolean indicating whether any Adaptive Gradient Boosting (AGB) models are used
- Return type:
pl.LazyFrame
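The performance metrics listed for a channel can be illustrated with a small plain-Python sketch. The CTR formula and the validity thresholds (at least 200 positives and 1000 responses) come from the field descriptions above; weighting performance by response count and the helper name summarize_channel are assumptions, and this is not the pdstools implementation.

```python
def summarize_channel(models):
    """Aggregate per-model counts into channel-level metrics.

    Hypothetical helper: Performance is weighted by Responses (an assumption).
    """
    positives = sum(m["Positives"] for m in models)
    responses = sum(m["Responses"] for m in models)
    weighted = sum(m["Performance"] * m["Responses"] for m in models)
    return {
        "Positives": positives,
        "Responses": responses,
        "Performance": weighted / responses if responses else None,
        "CTR": positives / responses if responses else None,
        # Thresholds as documented: at least 200 positives and 1000 responses
        "isValid": positives >= 200 and responses >= 1000,
    }


web = summarize_channel([
    {"Positives": 150, "Responses": 2000, "Performance": 60.0},
    {"Positives": 100, "Responses": 3000, "Performance": 70.0},
])
sms = summarize_channel([
    {"Positives": 20, "Responses": 500, "Performance": 52.0},
])
```

In this example the first channel passes both thresholds, while the second has too few positives and responses to count as valid.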
- summary_by_configuration() -> polars.DataFrame
Generates a summary of the ADM model configurations.
- Returns:
A Polars DataFrame containing the configuration summary.
- Return type:
pl.DataFrame
- predictors_overview(model_id: str | None = None, additional_aggregations: list | None = None) -> polars.DataFrame | None
Generate a summary of the last snapshot of predictor data.
- Returns:
A Polars DataFrame containing the predictor summary if successful, None if the required data is not available or an error was encountered.
- Return type:
pl.DataFrame | None
- overall_summary(by_period: str = None, debug: bool = False) -> polars.LazyFrame
Overall ADM models summary. Only valid data is included.
- Parameters:
custom_channels (Dict[str, str], optional) – Optional dictionary with custom channel/direction name mappings. Defaults to None.
by_period (str, optional) – Optional grouping by time period. Format string as in polars.Expr.dt.truncate (https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.truncate.html), for example “1mo”, “1w”, “1d” for calendar month, week, or day. If provided, creates a new Period column with the truncated date/time. Defaults to None. NOTE: this argument is going to be deprecated in favor of explicit start/end dates in the near future.
debug (bool, optional) – If True, enables debug mode for additional logging or outputs. Defaults to False.
- Returns:
Summary across all valid ADM models as a dataframe with the following fields:
Time and Configuration Fields:
- Period - (Only present when by_period parameter is specified) The time period for the data
- DateRange Min - The minimum date in the snapshot time range
- DateRange Max - The maximum date in the snapshot time range
- Duration - The duration in seconds between the minimum and maximum snapshot times
- Configuration - A comma-separated list of unique configurations
Performance Metrics:
- Positives Inbound - The sum of positive responses across all models in the inbound channels
- Positives Outbound - The sum of positive responses across all models in the outbound channels
- Responses Inbound - The sum of all responses across all models in the inbound channels
- Responses Outbound - The sum of all responses across all models in the outbound channels
- Performance - The weighted average performance across all models (50-100)
Action Statistics:
- Actions - The total number of unique actions
- Used Actions - The number of unique actions that have been used (have responses)
- New Actions - The number of new actions introduced in the period
- Issues - The number of unique issues
- Groups - The number of unique issue/group combinations
Treatment Statistics:
- Treatments - The total number of unique treatments
- Used Treatments - The number of unique treatments that have been used
Channel Statistics:
- Number of Valid Channels - The count of valid channels (channels with sufficient data)
- Minimum Channel Performance - The performance of the channel with lowest performance
- Channel with Minimum Performance - The channel/direction group with the lowest performance
- OmniChannel - The average overlap of actions across channels (measure of Omni Channel capability)
Technology Usage Indicators:
- usesNBAD - Boolean indicating whether standard NBAD configurations are used
- usesAGB - Boolean indicating whether any Adaptive Gradient Boosting (AGB) models are used
Note: A channel is considered “valid” if it has at least 200 positives and 1000 responses
- Return type:
pl.LazyFrame
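The channel statistics in this summary can be pictured with the plain-Python sketch below, which filters channels with the documented validity rule and then reports the count of valid channels and the weakest one. The helper name channel_statistics and the shape of the input rows are assumptions for illustration, and restricting the minimum to valid channels follows from "Only valid data is included" rather than from the library's code.

```python
def channel_statistics(channels):
    """Count valid channels and find the one with the lowest performance.

    Hypothetical helper: a channel is valid with at least 200 positives
    and 1000 responses, as stated in the documentation.
    """
    valid = [c for c in channels if c["Positives"] >= 200 and c["Responses"] >= 1000]
    if not valid:
        return {
            "Number of Valid Channels": 0,
            "Minimum Channel Performance": None,
            "Channel with Minimum Performance": None,
        }
    worst = min(valid, key=lambda c: c["Performance"])
    return {
        "Number of Valid Channels": len(valid),
        "Minimum Channel Performance": worst["Performance"],
        "Channel with Minimum Performance": worst["ChannelDirectionGroup"],
    }


channels = [
    {"ChannelDirectionGroup": "Web/Inbound", "Positives": 250, "Responses": 5000, "Performance": 66.0},
    {"ChannelDirectionGroup": "Email/Outbound", "Positives": 300, "Responses": 20000, "Performance": 58.0},
    {"ChannelDirectionGroup": "SMS/Outbound", "Positives": 20, "Responses": 500, "Performance": 52.0},
]
stats = channel_statistics(channels)
```

The SMS channel has the lowest performance but too little data to be valid, so in this sketch it does not become the reported minimum.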