pdstools.explanations.Aggregate¶
Classes¶
Module Contents¶
- class Aggregate(explanations: pdstools.explanations.Explanations.Explanations)¶
Bases: pdstools.utils.namespaces.LazyNamespace
- Parameters:
explanations (pdstools.explanations.Explanations.Explanations)
- dependencies = ['polars']¶
- dependency_group = 'explanations'¶
- explanations¶
- data_folderpath¶
- data_pattern = None¶
- df_contextual = None¶
- df_overall = None¶
- context_operations¶
- initialized = False¶
- get_df_contextual() → polars.LazyFrame¶
Get the contextual dataframe, loading it if not already loaded.
- Return type:
polars.LazyFrame
- get_df_overall() → polars.LazyFrame¶
Get the overall dataframe, loading it if not already loaded.
- Return type:
polars.LazyFrame
- get_predictor_contributions(context: dict[str, str] | None = None, top_n: int = defaults.top_n, descending: bool = defaults.descending, missing: bool = defaults.missing, remaining: bool = defaults.remaining, sort_by: str = defaults.sort_by.value)¶
Get the top-n predictor contributions for a given context or overall.
- Args:
- context (Optional[dict[str, str]]):
The context to filter contributions by. If None, contributions for all contexts will be returned.
- top_n (int):
Number of top predictors to return.
- descending (bool):
Whether to sort contributions in descending order.
- missing (bool):
Whether to include contributions for missing predictor values.
- remaining (bool):
Whether to include contributions for remaining predictors outside the top-n.
- sort_by (str):
Method used to sort and select the top contributions. Options are contribution, contribution_abs, and contribution_weighted. Default is contribution_abs, which sorts by absolute average contribution.
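The selection logic described above can be sketched in plain Python (the real method operates on Polars LazyFrames; the record keys and the averaging of the "remaining" bucket here are illustrative assumptions, not the pdstools internals):

```python
# Illustrative sketch: pick the top-n predictors by (absolute) contribution
# and, when remaining=True, collapse everything else into one "remaining" row.
def top_n_contributions(rows, top_n=3, descending=True, remaining=True,
                        sort_by="contribution_abs"):
    """rows: list of dicts with 'predictor' and 'contribution' keys."""
    key_funcs = {
        "contribution": lambda r: r["contribution"],
        "contribution_abs": lambda r: abs(r["contribution"]),
    }
    ranked = sorted(rows, key=key_funcs[sort_by], reverse=descending)
    top, rest = ranked[:top_n], ranked[top_n:]
    if remaining and rest:
        # Averaged here for simplicity; the real aggregation is richer.
        top.append({
            "predictor": "remaining",
            "contribution": sum(r["contribution"] for r in rest) / len(rest),
        })
    return top

rows = [
    {"predictor": "age", "contribution": 0.40},
    {"predictor": "income", "contribution": -0.55},
    {"predictor": "channel", "contribution": 0.10},
    {"predictor": "region", "contribution": 0.05},
]
print(top_n_contributions(rows, top_n=2))
```

With sort_by="contribution_abs", income (|-0.55|) outranks age (0.40) even though its raw contribution is negative.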
- get_predictor_value_contributions(predictors: list[str], context: dict[str, str] | None = None, top_k: int = defaults.top_k, descending: bool = defaults.descending, missing: bool = defaults.missing, remaining: bool = defaults.remaining, sort_by: str = defaults.sort_by.value)¶
Get the top-k predictor value contributions for a given context or overall.
- Args:
- predictors (list[str]): Required.
list of predictors to get the contributions for.
- context (Optional[dict[str, str]]):
The context to filter contributions by. If None, contributions for all contexts will be returned.
- top_k (int):
Number of unique categorical predictor values to return.
- descending (bool):
Whether to sort contributions in descending order.
- missing (bool):
Whether to include contributions for missing predictor values.
- remaining (bool):
Whether to include contributions for remaining predictor values outside the top-k.
- sort_by (str):
Method used to sort and select the top contributions. Options are contribution, contribution_abs, and contribution_weighted. Default is contribution_abs, which sorts by absolute average contribution.
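Per-predictor top-k value selection with a "missing" bucket can be sketched as follows (plain Python; the field names and the None-as-missing convention are assumptions for illustration, the real method works on Polars LazyFrames):

```python
# Sketch: keep the top-k values per predictor, ranked by absolute
# contribution; rows with value None stand in for the 'missing' bucket.
from collections import defaultdict

def top_k_value_contributions(rows, top_k=2, descending=True, missing=True):
    """rows: dicts with 'predictor', 'value', 'contribution' keys."""
    by_predictor, missing_rows = defaultdict(list), []
    for r in rows:
        (missing_rows if r["value"] is None
         else by_predictor[r["predictor"]]).append(r)
    out = []
    for vals in by_predictor.values():
        vals.sort(key=lambda r: abs(r["contribution"]), reverse=descending)
        out.extend(vals[:top_k])
    if missing:
        out.extend(missing_rows)
    return out

rows = [
    {"predictor": "age", "value": "<25", "contribution": 0.5},
    {"predictor": "age", "value": "25-40", "contribution": 0.2},
    {"predictor": "age", "value": ">40", "contribution": -0.3},
    {"predictor": "age", "value": None, "contribution": 0.05},
]
print(top_k_value_contributions(rows, top_k=2))
```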
- validate_folder()¶
Check if the aggregates folder exists.
- Raises:
FileNotFoundError: If the aggregates folder does not exist or is empty.
- get_unique_contexts_list(context_infos: list[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None, with_partition_col: bool = False) → list[pdstools.explanations.ExplanationsUtils.ContextInfo]¶
- Parameters:
context_infos (list[pdstools.explanations.ExplanationsUtils.ContextInfo] | None)
with_partition_col (bool)
- Return type:
list[pdstools.explanations.ExplanationsUtils.ContextInfo]
- _load_data()¶
- _get_predictor_contributions(contexts: list[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None, predictors: list[str] | None = None, limit: int = defaults.top_n, descending: bool = defaults.descending, missing: bool = defaults.missing, remaining: bool = defaults.remaining, sort_by: str = defaults.sort_by.value) → polars.DataFrame¶
- _get_predictor_value_contributions(contexts: list[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None, predictors: list[str] | None = None, limit: int = defaults.top_k, descending: bool = defaults.descending, missing: bool = defaults.missing, remaining: bool = defaults.remaining, sort_by: str = defaults.sort_by.value) → polars.DataFrame¶
- _get_df_with_sort_info(df: polars.LazyFrame, sort_by: str = defaults.sort_by.value) → polars.LazyFrame¶
Add a sort column and value to the dataframe based on the predictor type.
Sort logic:
- numeric predictors are sorted by bin order
- symbolic predictors are sorted by contribution type
- Parameters:
df (polars.LazyFrame)
sort_by (str)
- Return type:
polars.LazyFrame
- _get_df_with_top_limit(df: polars.LazyFrame, over: list[str], sort_by: str = defaults.sort_by.value, limit: int = defaults.top_k, descending: bool = defaults.descending) → polars.LazyFrame¶
Return the top limit rows per group, ranked by sort_by.
For each unique combination of values in over, keeps only the limit rows with the highest (or lowest) value in sort_by.
When descending=True (the default), the rows with the largest values are kept — i.e. the most impactful contributions rise to the top. When descending=False, the rows with the smallest values are kept instead, which is useful when selecting the least influential predictors.
Note: Polars’ top_k_by uses a reverse parameter whose semantics are the opposite of descending. reverse=False returns the k largest values, while reverse=True returns the k smallest. To keep the caller-facing API intuitive (descending=True → largest values), we pass reverse=not descending to Polars internally.
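The descending → reverse inversion can be demonstrated without Polars, using the standard-library heapq module as a stand-in for top_k_by (the record keys here are illustrative):

```python
# Demonstrates the descending -> reverse mapping described above.
# In Polars: top_k_by(..., reverse=not descending)
#   reverse=False (descending=True)  -> k largest values kept
#   reverse=True  (descending=False) -> k smallest values kept
import heapq

def top_limit(rows, limit, sort_by, descending=True):
    pick = heapq.nlargest if descending else heapq.nsmallest
    return pick(limit, rows, key=lambda r: r[sort_by])

rows = [{"p": "a", "c": 0.9}, {"p": "b", "c": 0.1}, {"p": "c", "c": 0.5}]
print(top_limit(rows, 2, "c"))                    # two largest contributions
print(top_limit(rows, 2, "c", descending=False))  # two smallest contributions
```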
- _get_missing_predictor_values_df(df: polars.LazyFrame) → polars.LazyFrame¶
- Parameters:
df (polars.LazyFrame)
- Return type:
polars.LazyFrame
- _get_df(contexts: list[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None)¶
- Parameters:
contexts (list[pdstools.explanations.ExplanationsUtils.ContextInfo] | None)
- _get_base_df(df_filtered_contexts: polars.DataFrame | None = None) → polars.LazyFrame¶
- Parameters:
df_filtered_contexts (polars.DataFrame | None)
- Return type:
polars.LazyFrame
- _calculate_remaining_aggregates(df_all: polars.LazyFrame, df_anti: polars.LazyFrame, anti_on: list[str], frequency_over: list[str], aggregate_over: list[str]) → polars.LazyFrame¶
Anti-join to isolate non-top rows, aggregate, and label as ‘remaining’.
- static _label_remaining(df: polars.LazyFrame, aggregate_over: list[str]) → polars.LazyFrame¶
Add ‘remaining’ labels based on aggregation granularity.
- _calculate_aggregates(df: polars.LazyFrame, frequency_over: list[str], aggregate_over: list[str]) → polars.LazyFrame¶
Enrich with total_frequency at frequency_over level, then aggregate at aggregate_over level.
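The two-level aggregation can be sketched in plain Python (the real method is a Polars group-by pipeline; the column names and the simple mean here are illustrative assumptions):

```python
# Sketch: first compute total_frequency per frequency_over group, then
# average contributions per aggregate_over group.
from collections import defaultdict

def calculate_aggregates(rows, frequency_over, aggregate_over):
    # Step 1: total_frequency at the frequency_over level.
    totals = defaultdict(float)
    for r in rows:
        totals[tuple(r[k] for k in frequency_over)] += r["frequency"]
    # Step 2: aggregate at the aggregate_over level, carrying the total along.
    groups = defaultdict(list)
    for r in rows:
        enriched = {**r, "total_frequency": totals[tuple(r[k] for k in frequency_over)]}
        groups[tuple(r[k] for k in aggregate_over)].append(enriched)
    return {
        key: {
            "contribution": sum(x["contribution"] for x in g) / len(g),
            "total_frequency": g[0]["total_frequency"],
        }
        for key, g in groups.items()
    }

rows = [
    {"predictor": "age", "bin": "<25", "contribution": 0.4, "frequency": 10},
    {"predictor": "age", "bin": ">25", "contribution": 0.2, "frequency": 30},
    {"predictor": "income", "bin": "low", "contribution": -0.1, "frequency": 60},
]
out = calculate_aggregates(rows, ["predictor"], ["predictor"])
print(out)
```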
- static _add_total_frequency_to_df(df, group_by)¶
- add_frequency_pct_to_df(df, group_by)¶
Add a frequency percentage column to the dataframe based on the total frequency per group.
- static _get_mean_aggregates()¶
Get mean contribution aggregates.
- static _get_weighted_aggregates()¶
Get frequency-weighted contribution aggregates normalized by total frequency.
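The weighting this aggregate applies reduces to a frequency-weighted mean; a minimal sketch (scalar form only, the real aggregate is a Polars expression):

```python
# weighted = sum(contribution_i * frequency_i) / sum(frequency_i)
def weighted_contribution(pairs):
    """pairs: iterable of (contribution, frequency) tuples."""
    total = sum(f for _, f in pairs)
    return sum(c * f for c, f in pairs) / total

# A rare high contribution is down-weighted by its low frequency.
print(weighted_contribution([(0.5, 10), (0.1, 90)]))
```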
- static _get_frequency_aggregate()¶
Get frequency sum aggregate.
- static _get_bounds_aggregates()¶
Get min and max contribution bounds.
- _agg_over_columns_in_df(df, group_by)¶
Aggregate contribution metrics over specified columns.
- static _filter_single_bin_numeric_predictors(df: polars.LazyFrame) → polars.LazyFrame¶
Remove numeric predictors that have only a single non-missing bin.
- Parameters:
df (polars.LazyFrame)
- Return type:
polars.LazyFrame