pdstools.explanations.Aggregate

Classes

Aggregate

Aggregate.

Module Contents

class Aggregate(explanations: pdstools.explanations.Explanations.Explanations)

Bases: pdstools.utils.namespaces.LazyNamespace

Access to aggregated explanation data, both per context and overall.

Parameters:

explanations (pdstools.explanations.Explanations.Explanations)

dependencies: ClassVar[list[str]] = ['polars']
dependency_group = 'explanations'
explanations
data_folderpath
data_pattern = None
df_contextual: polars.LazyFrame | None = None
df_overall: polars.LazyFrame | None = None
context_operations
initialized = False
get_df_contextual() -> polars.LazyFrame

Get the contextual dataframe, loading it if not already loaded.

Return type:

polars.LazyFrame

get_df_overall() -> polars.LazyFrame

Get the overall dataframe, loading it if not already loaded.

Return type:

polars.LazyFrame
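
The "loading it if not already loaded" behaviour of both getters can be pictured as a simple load-once cache. The following is a minimal sketch in plain Python, not the pdstools implementation: the `LazyTable` class, its `loader` argument and the `load_count` counter are hypothetical names introduced only for illustration.

```python
from typing import Callable, Optional


class LazyTable:
    """Hypothetical stand-in that caches a loader's result on first access."""

    def __init__(self, loader: Callable[[], list]):
        self._loader = loader
        # Mirrors attributes such as df_overall starting out as None.
        self._data: Optional[list] = None
        self.load_count = 0

    def get(self) -> list:
        # Run the loader only when nothing has been cached yet.
        if self._data is None:
            self._data = self._loader()
            self.load_count += 1
        return self._data


table = LazyTable(lambda: [{"predictor_name": "Age", "frequency": 10}])
first = table.get()   # triggers the load
second = table.get()  # served from the cache
```

Repeated calls return the same cached object, so the cost of loading is paid once.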

get_predictor_contributions(context: dict[str, str] | None = None, top_n: int = 20, *, sort_by: pdstools.explanations.ExplanationsUtils.SortBy = 'contribution_abs', descending: bool = True, missing: bool = True, remaining: bool = True, include_numeric_single_bin: bool = False) -> polars.DataFrame

Get the top-n predictor contributions for a given context or overall.

Parameters:
  • context (dict[str, str] | None) – The context to filter contributions by. If None, contributions for all contexts will be returned.

  • top_n (int) – Number of top predictors to return.

  • sort_by (str) – Column to rank/select top predictors. One of contribution, contribution_abs, contribution_weighted, contribution_weighted_abs. Default: "contribution_abs".

  • descending (bool) – If True, sort the most impactful first; if False, the least impactful first. Default: True.

  • missing (bool) – Include missing-value bins. Default: True.

  • remaining (bool) – Include an aggregated “remaining” row for predictors outside the top-n. Default: True.

  • include_numeric_single_bin (bool) – Include numeric predictors that have only a single bin. Default: False.

Return type:

polars.DataFrame
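
To make the interplay of top_n, descending and remaining concrete, here is a minimal sketch in plain Python. It is an illustration of the idea only, not the library's code: the function `top_n_with_remaining` and the sample predictor names are hypothetical, ranking by absolute value stands in for `sort_by="contribution_abs"`, and summing the leftover contributions into the "remaining" row is an assumption about how that row is aggregated.

```python
def top_n_with_remaining(contributions, top_n, descending=True, remaining=True):
    """Rank predictors by absolute contribution and fold the rest into one row.

    Assumption: the "remaining" row is the sum of the contributions of the
    predictors outside the top-n; pdstools may aggregate differently.
    """
    ranked = sorted(
        contributions.items(),
        key=lambda item: abs(item[1]),
        reverse=descending,
    )
    top = ranked[:top_n]
    rest = ranked[top_n:]
    if remaining and rest:
        top.append(("remaining", sum(value for _, value in rest)))
    return top


# Hypothetical sample data: predictor name -> contribution.
contributions = {"Age": 0.40, "Income": -0.55, "Region": 0.10, "Tenure": -0.05}
result = top_n_with_remaining(contributions, top_n=2)
```

With `top_n=2` the result holds the two strongest predictors followed by a single "remaining" entry; passing `remaining=False` would drop that last entry.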

get_predictor_value_contributions(predictors: list[str], context: dict[str, str] | None = None, top_k: int = 20, *, sort_by: pdstools.explanations.ExplanationsUtils.SortBy = 'contribution_abs', descending: bool = True, missing: bool = True, remaining: bool = True, include_numeric_single_bin: bool = False) -> polars.DataFrame

Get the top-k predictor value contributions for a given context or overall.

Parameters:
  • predictors (list[str]) – Required. List of predictors to get the contributions for.

  • context (dict[str, str] | None) – The context to filter contributions by. If None, contributions for all contexts will be returned.

  • top_k (int) – Number of unique categorical predictor values to return.

  • sort_by (str) – Column to rank/select top predictor values. One of contribution, contribution_abs, contribution_weighted, contribution_weighted_abs. Default: "contribution_abs".

  • descending (bool) – If True, sort the most impactful first; if False, the least impactful first. Default: True.

  • missing (bool) – Include missing-value bins. Default: True.

  • remaining (bool) – Include an aggregated “remaining” row for values outside the top-k. Default: True.

  • include_numeric_single_bin (bool) – Include numeric predictors that have only a single bin. Default: False.

Return type:

polars.DataFrame
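
A call might look like the sketch below, which uses only the parameters listed in the signature above. The wrapper `top_values_for` is a hypothetical helper, and how the `aggregate` object is obtained is deliberately left out because it is not described here. Everything after top_k is keyword-only, as the `*` in the signature indicates.

```python
def top_values_for(aggregate, predictors, context=None):
    """Request the ten strongest value contributions for the given predictors.

    `aggregate` is expected to be an Aggregate instance; constructing one is
    outside the scope of this sketch.
    """
    return aggregate.get_predictor_value_contributions(
        predictors,
        context=context,  # None applies no context filter
        top_k=10,
        sort_by="contribution_weighted_abs",
        descending=True,
        missing=False,    # leave out missing-value bins
        remaining=True,   # keep the aggregated "remaining" row
    )
```

A context is passed as a dict of strings, for example `{"pyChannel": "Web"}`, where the key and value are placeholders that depend on the context columns of the model at hand.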

validate_folder()

Check that the aggregates folder exists and is not empty.

Raises:

FileNotFoundError – If the aggregates folder does not exist or is empty.
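
The check described above can be sketched with the standard library as follows. This is an illustrative equivalent, not the method's actual body: the function name `check_aggregates_folder`, its `folder` parameter and the error messages are hypothetical.

```python
from pathlib import Path


def check_aggregates_folder(folder: str) -> None:
    """Raise FileNotFoundError when the folder is missing or has no entries."""
    path = Path(folder)
    if not path.is_dir():
        raise FileNotFoundError(f"Aggregates folder not found: {path}")
    # any() stops at the first directory entry, so large folders stay cheap.
    if not any(path.iterdir()):
        raise FileNotFoundError(f"Aggregates folder is empty: {path}")
```

Raising the same exception type for both conditions matches the documented contract, so callers need only one `except FileNotFoundError` branch.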

get_unique_contexts_list(context_infos: list[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None, with_partition_col: bool = False) -> list[pdstools.explanations.ExplanationsUtils.ContextInfo]

Get the list of unique contexts.

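Parameters:
  • context_infos (list[pdstools.explanations.ExplanationsUtils.ContextInfo] | None)

  • with_partition_col (bool)

Return type: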

list[pdstools.explanations.ExplanationsUtils.ContextInfo]

add_frequency_pct_to_df(df, group_by) -> polars.LazyFrame

Add a frequency percentage column to the dataframe based on the total frequency per group.

Return type:

polars.LazyFrame
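
The per-group percentage can be illustrated with a small plain-Python sketch. It is not the library's polars implementation: the function `add_group_frequency_pct`, the row layout and the `bin` column are hypothetical, and the column names `frequency` and `frequency_pct` are assumed here for illustration.

```python
from collections import defaultdict


def add_group_frequency_pct(rows, group_by):
    """Return copies of rows with frequency_pct = frequency / group total * 100."""
    # First pass: total frequency per group key.
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[col] for col in group_by)] += row["frequency"]

    # Second pass: each row's share of its own group's total.
    result = []
    for row in rows:
        total = totals[tuple(row[col] for col in group_by)]
        pct = row["frequency"] / total * 100 if total else 0.0
        result.append({**row, "frequency_pct": pct})
    return result


rows = [
    {"predictor_name": "Age", "bin": "<30", "frequency": 25},
    {"predictor_name": "Age", "bin": ">=30", "frequency": 75},
    {"predictor_name": "Region", "bin": "EU", "frequency": 40},
]
with_pct = add_group_frequency_pct(rows, group_by=["predictor_name"])
```

Within each group the percentages add up to 100, so a group with a single row gets 100 for that row.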

add_context_frequency_pct_to_df(df: polars.DataFrame, join_on: list[str]) -> polars.DataFrame

Add a frequency_pct column showing this context’s share of the overall model.

For each row, computes frequency_pct = df.frequency / overall_model_frequency * 100, where the overall model frequency is summed over the join_on columns of the overall (non-contextual) dataset.

Parameters:
  • df (pl.DataFrame) – DataFrame with a frequency column (context data).

  • join_on (list[str]) – Columns to join on, typically [predictor_name, predictor_type].

Returns:

df with an added frequency_pct column (0–100).

Return type:

pl.DataFrame
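
The formula above can be traced with a short plain-Python sketch. It only mirrors the arithmetic, not the polars join the method performs: the function `add_context_frequency_pct`, the sample rows and the `"NUMERIC"` type value are hypothetical, and returning None for rows without a match in the overall data is an assumption.

```python
def add_context_frequency_pct(context_rows, overall_rows, join_on):
    """Attach frequency_pct = context frequency / overall frequency * 100."""
    # Sum the overall frequency per join key.
    overall_totals = {}
    for row in overall_rows:
        key = tuple(row[col] for col in join_on)
        overall_totals[key] = overall_totals.get(key, 0) + row["frequency"]

    result = []
    for row in context_rows:
        total = overall_totals.get(tuple(row[col] for col in join_on))
        # Assumption: no matching (or zero) overall total yields None.
        pct = row["frequency"] / total * 100 if total else None
        result.append({**row, "frequency_pct": pct})
    return result


join_on = ["predictor_name", "predictor_type"]
overall = [
    {"predictor_name": "Age", "predictor_type": "NUMERIC", "frequency": 120},
    {"predictor_name": "Age", "predictor_type": "NUMERIC", "frequency": 80},
]
context = [{"predictor_name": "Age", "predictor_type": "NUMERIC", "frequency": 50}]
with_pct = add_context_frequency_pct(context, overall, join_on)
```

Here the overall frequency for the key is 120 + 80 = 200, so a context frequency of 50 gives 50 / 200 * 100 = 25.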