pdstools.explanations.Aggregate

Classes

Aggregate

Module Contents

class Aggregate(explanations: pdstools.explanations.Explanations.Explanations)

Bases: pdstools.utils.namespaces.LazyNamespace

Parameters: explanations (pdstools.explanations.Explanations.Explanations)

dependencies = ['polars']

dependency_group = 'explanations'

explanations

data_folderpath

data_pattern = None

df_contextual = None

df_overall = None

context_operations

initialized = False

get_df_contextual() → polars.LazyFrame

Get the contextual dataframe, loading it if not already loaded.

Return type: polars.LazyFrame

get_df_overall() → polars.LazyFrame

Get the overall dataframe, loading it if not already loaded.

Return type: polars.LazyFrame
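The "loading it if not already loaded" behaviour of both getters can be pictured with a small plain-Python sketch. This is not the pdstools implementation: `LazyTable`, its `loader` argument, and the `load_count` counter are hypothetical names used only to illustrate the load-once-then-cache pattern, and a list of dicts stands in for a `polars.LazyFrame`.

```python
from typing import Callable, Optional


class LazyTable:
    """Hypothetical stand-in for a lazily loaded dataframe attribute."""

    def __init__(self, loader: Callable[[], list]):
        self._loader = loader               # invoked at most once
        self._data: Optional[list] = None   # mirrors `df_overall = None` before loading
        self.load_count = 0                 # illustration only: counts real loads

    def get(self) -> list:
        # Load on first access; later calls reuse the cached object.
        if self._data is None:
            self._data = self._loader()
            self.load_count += 1
        return self._data


overall = LazyTable(lambda: [{"predictor": "Age", "contribution": 0.5}])
first = overall.get()
second = overall.get()
```

Calling `get()` twice returns the same object, and the loader runs only once.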

get_predictor_contributions(context: dict[str, str] | None = None, top_n: int = defaults.top_n, **filter_kwargs)

Get the top-n predictor contributions for a given context or overall.

Args:

context (Optional[dict[str, str]]):

The context to filter contributions by. If None, contributions for all contexts will be returned.

top_n (int):

Number of top predictors to return.

**filter_kwargs:

Optional filtering and sorting controls. Valid keys:

sort_by (str): Column to rank/select top predictors. Options: contribution, contribution_abs, contribution_weighted, contribution_weighted_abs. Default: "contribution_abs".
descending (bool): Sort most- or least-impactful first. Default: True.
missing (bool): Include missing-value bins. Default: True.
remaining (bool): Include an aggregated “remaining” row for predictors outside the top-n. Default: True.
include_numeric_single_bin (bool): Include numeric predictors that have only a single bin. Default: False.

Parameters:

context (dict[str, str] | None)
top_n (int)
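How `top_n`, `sort_by`, `descending`, and `remaining` interact can be sketched in plain Python. This is an illustration under assumptions, not the library's code: the helper name `top_n_with_remaining`, the row layout, and the choice to sum the leftover rows into the "remaining" row are all hypothetical.

```python
def top_n_with_remaining(rows, top_n, sort_by="contribution_abs",
                         descending=True, remaining=True):
    """Rank rows by `sort_by`, keep the first `top_n`, optionally fold the rest."""
    ranked = sorted(rows, key=lambda r: r[sort_by], reverse=descending)
    top, rest = ranked[:top_n], ranked[top_n:]
    if remaining and rest:
        # Assumption: leftover predictors are aggregated by summing each column.
        top.append({
            "predictor": "remaining",
            "contribution": sum(r["contribution"] for r in rest),
            "contribution_abs": sum(r["contribution_abs"] for r in rest),
        })
    return top


rows = [
    {"predictor": "Age", "contribution": 0.5, "contribution_abs": 0.5},
    {"predictor": "Income", "contribution": -0.25, "contribution_abs": 0.25},
    {"predictor": "Region", "contribution": 0.125, "contribution_abs": 0.125},
    {"predictor": "Tenure", "contribution": -0.125, "contribution_abs": 0.125},
]
result = top_n_with_remaining(rows, top_n=2)
names = [r["predictor"] for r in result]
```

With `top_n=2`, the two predictors with the largest absolute contribution are kept and the other two collapse into a single "remaining" row; passing `remaining=False` would drop that row instead.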

get_predictor_value_contributions(predictors: list[str], context: dict[str, str] | None = None, top_k: int = defaults.top_k, **filter_kwargs)

Get the top-k predictor value contributions for a given context or overall.

Args:

predictors (list[str]): Required.

List of predictors to get contributions for.

context (Optional[dict[str, str]]):

The context to filter contributions by. If None, contributions for all contexts will be returned.

top_k (int):

Number of unique categorical predictor values to return.

**filter_kwargs:

Optional filtering and sorting controls. Valid keys:

sort_by (str): Column to rank/select top predictor values. Options: contribution, contribution_abs, contribution_weighted, contribution_weighted_abs. Default: "contribution_abs".
descending (bool): Sort most- or least-impactful first. Default: True.
missing (bool): Include missing-value bins. Default: True.
remaining (bool): Include an aggregated “remaining” row for values outside the top-k. Default: True.
include_numeric_single_bin (bool): Include numeric predictors that have only a single bin. Default: False.

Parameters:

predictors (list[str])
context (dict[str, str] | None)
top_k (int)
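A call that combines `predictors`, `context`, `top_k`, and the documented filter keys might look like the sketch below. The wrapper `value_contributions_for_channel` is hypothetical, and the context key `"pyChannel"` is an assumed example; the keys actually available depend on the contexts present in your aggregates.

```python
def value_contributions_for_channel(aggregate, predictors, channel):
    """Hypothetical wrapper around Aggregate.get_predictor_value_contributions."""
    return aggregate.get_predictor_value_contributions(
        predictors=predictors,
        context={"pyChannel": channel},        # assumed context key, for illustration
        top_k=5,                               # keep the five highest-ranked values
        sort_by="contribution_weighted_abs",   # one of the documented sort columns
        missing=False,                         # leave out missing-value bins
    )
```

Here `aggregate` is expected to be an `Aggregate` instance. The extra keyword arguments travel through `**filter_kwargs`, so only the keys listed above are meaningful.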

validate_folder()

Check that the aggregates folder exists and is not empty.

Raises: FileNotFoundError: If the aggregates folder does not exist or is empty.
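The check described above can be approximated with `pathlib`. This is a minimal sketch of the documented behaviour, not the method's actual body: `check_aggregates_folder` and its `folder` argument are hypothetical names.

```python
from pathlib import Path


def check_aggregates_folder(folder):
    """Raise FileNotFoundError if `folder` is missing or contains no entries."""
    path = Path(folder)
    if not path.is_dir():
        raise FileNotFoundError(f"Aggregates folder not found: {path}")
    if not any(path.iterdir()):
        # An existing but empty folder is treated the same as a missing one.
        raise FileNotFoundError(f"Aggregates folder is empty: {path}")
    return path
```

Raising the same exception type for both cases matches the single `FileNotFoundError` listed for `validate_folder()`.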

get_unique_contexts_list(context_infos: list[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None, with_partition_col: bool = False) → list[pdstools.explanations.ExplanationsUtils.ContextInfo]

Parameters:

context_infos (list[pdstools.explanations.ExplanationsUtils.ContextInfo] | None)
with_partition_col (bool)

Return type:

list[pdstools.explanations.ExplanationsUtils.ContextInfo]

add_frequency_pct_to_df(df, group_by)

Add a frequency percentage column to the dataframe based on the total frequency per group.
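The per-group percentage can be illustrated in plain Python on a list of dicts. Several details here are assumptions rather than documented behaviour: the helper name `add_frequency_pct`, the column names `frequency` and `frequency_pct`, and the 0 to 100 scale.

```python
from collections import defaultdict


def add_frequency_pct(rows, group_by):
    """Return copies of `rows` with each row's share of its group's total frequency."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[col] for col in group_by)] += row["frequency"]

    out = []
    for row in rows:
        total = totals[tuple(row[col] for col in group_by)]
        # Assumption: percentage on a 0-100 scale; groups with zero total get 0.0.
        pct = 100.0 * row["frequency"] / total if total else 0.0
        out.append({**row, "frequency_pct": pct})
    return out


rows = [
    {"predictor": "Age", "bin": "<30", "frequency": 30},
    {"predictor": "Age", "bin": ">=30", "frequency": 10},
    {"predictor": "Income", "bin": "low", "frequency": 5},
    {"predictor": "Income", "bin": "high", "frequency": 15},
]
with_pct = add_frequency_pct(rows, group_by=["predictor"])
pcts = [r["frequency_pct"] for r in with_pct]
```

On a polars frame the same idea is usually written as a window expression, dividing a column by its `sum().over(group_by)`.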