pdstools.explanations.Aggregate¶
Classes¶
Aggregate
Module Contents¶
- class Aggregate(explanations: pdstools.explanations.Explanations.Explanations)¶
Bases:
pdstools.utils.namespaces.LazyNamespaceAggregate.
- Parameters:
explanations (pdstools.explanations.Explanations.Explanations)
- dependency_group = 'explanations'¶
- explanations¶
- data_folderpath¶
- data_pattern = None¶
- context_operations¶
- initialized = False¶
- get_df_contextual() polars.LazyFrame¶
Get the contextual dataframe, loading it if not already loaded.
- Return type:
polars.LazyFrame
- get_df_overall() polars.LazyFrame¶
Get the overall dataframe, loading it if not already loaded.
- Return type:
polars.LazyFrame
- get_predictor_contributions(context: dict[str, str] | None = None, top_n: int = 20, *, sort_by: pdstools.explanations.ExplanationsUtils.SortBy = 'contribution_abs', descending: bool = True, missing: bool = True, remaining: bool = True, include_numeric_single_bin: bool = False) polars.DataFrame¶
Get the top-n predictor contributions for a given context or overall.
- Parameters:
context (dict[str, str] | None) – The context to filter contributions by. If None, contributions for all contexts will be returned.
top_n (int) – Number of top predictors.
sort_by (str) – Column to rank/select top predictors. One of contribution, contribution_abs, contribution_weighted, contribution_weighted_abs. Default: "contribution_abs".
descending (bool) – Sort most- or least-impactful first. Default: True.
missing (bool) – Include missing-value bins. Default: True.
remaining (bool) – Include an aggregated “remaining” row for predictors outside the top-n. Default: True.
include_numeric_single_bin (bool) – Include numeric predictors that have only a single bin. Default: False.
- Return type:
polars.DataFrame
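The interaction of top_n, sort_by, descending and remaining can be sketched in plain Python. This is an illustrative sketch only, not the pdstools implementation: the helper name top_n_with_remaining, the row layout and the sample predictors are hypothetical, and summing the tail into the "remaining" row is an assumption about how that row is aggregated.

```python
def top_n_with_remaining(rows, top_n=20, sort_by="contribution_abs",
                         descending=True, remaining=True):
    """Hypothetical helper: rank rows by `sort_by` and keep the first `top_n`."""
    ranked = sorted(rows, key=lambda r: r[sort_by], reverse=descending)
    top, rest = ranked[:top_n], ranked[top_n:]
    if remaining and rest:
        # Assumption: the "remaining" row sums the contributions of the tail.
        top.append({
            "predictor": "remaining",
            "contribution": sum(r["contribution"] for r in rest),
            "contribution_abs": sum(r["contribution_abs"] for r in rest),
        })
    return top


# Made-up sample data for illustration.
rows = [
    {"predictor": "Age", "contribution": -0.30, "contribution_abs": 0.30},
    {"predictor": "Income", "contribution": 0.20, "contribution_abs": 0.20},
    {"predictor": "Region", "contribution": 0.05, "contribution_abs": 0.05},
    {"predictor": "Tenure", "contribution": -0.02, "contribution_abs": 0.02},
]

result = top_n_with_remaining(rows, top_n=2)
for row in result:
    print(row["predictor"], round(row["contribution"], 4))
```

Ranking by contribution_abs puts Age first even though its contribution is negative; ranking by contribution instead would put Income first.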
- get_predictor_value_contributions(predictors: list[str], context: dict[str, str] | None = None, top_k: int = 20, *, sort_by: pdstools.explanations.ExplanationsUtils.SortBy = 'contribution_abs', descending: bool = True, missing: bool = True, remaining: bool = True, include_numeric_single_bin: bool = False) polars.DataFrame¶
Get the top-k predictor value contributions for a given context or overall.
- Parameters:
predictors (list[str]) – Required. List of predictors to get the contributions for.
context (dict[str, str] | None) – The context to filter contributions by. If None, contributions for all contexts will be returned.
top_k (int) – Number of unique categorical predictor values to return.
sort_by (str) – Column to rank/select top predictors. One of contribution, contribution_abs, contribution_weighted, contribution_weighted_abs. Default: "contribution_abs".
descending (bool) – Sort most- or least-impactful first. Default: True.
missing (bool) – Include missing-value bins. Default: True.
remaining (bool) – Include an aggregated “remaining” row for values outside the top-k. Default: True.
include_numeric_single_bin (bool) – Include numeric predictors that have only a single bin. Default: False.
- Return type:
polars.DataFrame
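As a rough illustration of how a context filter and a per-predictor top_k could combine, here is a plain-Python sketch. It is not the library's code: the helper top_k_values, the row layout (a "context" dict on every row) and the sample values are all hypothetical, and matching every key/value pair of the context is an assumption.

```python
from collections import defaultdict


def top_k_values(rows, predictors, context=None, top_k=20):
    """Hypothetical helper: keep the top_k values per requested predictor."""
    selected = [
        r for r in rows
        if r["predictor"] in predictors
        # Assumption: a row matches when every context key/value pair agrees.
        and (context is None
             or all(r["context"].get(k) == v for k, v in context.items()))
    ]
    by_predictor = defaultdict(list)
    for r in selected:
        by_predictor[r["predictor"]].append(r)
    return {
        name: sorted(vals, key=lambda r: r["contribution_abs"], reverse=True)[:top_k]
        for name, vals in by_predictor.items()
    }


# Made-up sample data for illustration.
rows = [
    {"predictor": "Region", "value": "South", "context": {"Channel": "Web"}, "contribution_abs": 0.1},
    {"predictor": "Region", "value": "North", "context": {"Channel": "Web"}, "contribution_abs": 0.3},
    {"predictor": "Region", "value": "East", "context": {"Channel": "Web"}, "contribution_abs": 0.2},
    {"predictor": "Region", "value": "West", "context": {"Channel": "Email"}, "contribution_abs": 0.9},
    {"predictor": "Age", "value": "<30", "context": {"Channel": "Web"}, "contribution_abs": 0.5},
]

result = top_k_values(rows, ["Region"], context={"Channel": "Web"}, top_k=2)
print([r["value"] for r in result["Region"]])
```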
- validate_folder()¶
Check if the aggregates folder exists.
- Raises:
FileNotFoundError – If the aggregates folder does not exist or is empty.
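A check of this kind can be sketched with pathlib. The function below is a hypothetical stand-in, not the actual validate_folder body; it only mirrors the documented behaviour of raising FileNotFoundError when the folder is missing or empty. The file name part-0.parquet is made up.

```python
import tempfile
from pathlib import Path


def check_aggregates_folder(folder):
    """Hypothetical stand-in: raise if `folder` is missing or has no entries."""
    path = Path(folder)
    if not path.is_dir() or not any(path.iterdir()):
        raise FileNotFoundError(
            f"Aggregates folder {path} does not exist or is empty."
        )


with tempfile.TemporaryDirectory() as tmp:
    # An empty folder is rejected.
    try:
        check_aggregates_folder(tmp)
        empty_raised = False
    except FileNotFoundError:
        empty_raised = True

    # A folder that does not exist is rejected too.
    try:
        check_aggregates_folder(Path(tmp) / "missing")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

    # Once the folder holds at least one file, the check passes silently.
    (Path(tmp) / "part-0.parquet").touch()
    check_aggregates_folder(tmp)
    populated_ok = True
```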
- get_unique_contexts_list(context_infos: list[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None, with_partition_col: bool = False) list[pdstools.explanations.ExplanationsUtils.ContextInfo]¶
Get unique contexts list.
- Parameters:
context_infos (list[pdstools.explanations.ExplanationsUtils.ContextInfo] | None)
with_partition_col (bool)
- Return type:
list[pdstools.explanations.ExplanationsUtils.ContextInfo]
- add_frequency_pct_to_df(df, group_by) polars.LazyFrame¶
Add a frequency percentage column to the dataframe based on the total frequency per group.
- Return type:
polars.LazyFrame
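The arithmetic behind a per-group frequency percentage can be sketched without polars. The helper name, the "frequency" column and the sample rows are assumptions made for illustration; the real method operates on a polars frame and may differ in detail.

```python
from collections import defaultdict


def add_frequency_pct(rows, group_by):
    """Hypothetical helper: each row's share of its group's total frequency."""
    totals = defaultdict(float)
    for r in rows:
        totals[tuple(r[c] for c in group_by)] += r["frequency"]
    return [
        {**r, "frequency_pct": r["frequency"] / totals[tuple(r[c] for c in group_by)] * 100}
        for r in rows
    ]


# Made-up sample data: two bins per predictor.
rows = [
    {"predictor": "Age", "bin": "<30", "frequency": 25},
    {"predictor": "Age", "bin": ">=30", "frequency": 75},
    {"predictor": "Income", "bin": "low", "frequency": 50},
    {"predictor": "Income", "bin": "high", "frequency": 50},
]

with_pct = add_frequency_pct(rows, group_by=["predictor"])
for row in with_pct:
    print(row["predictor"], row["bin"], row["frequency_pct"])
```

Within each group the percentages add up to 100, because every row is divided by the total of its own group.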
- add_context_frequency_pct_to_df(df: polars.DataFrame, join_on: list[str]) polars.DataFrame¶
Add frequency_pct showing this context’s share of the overall model.
For each row, computes frequency_pct = df.frequency / overall_model_frequency * 100, where the overall model frequency is summed over join_on columns from the overall (non-contextual) dataset.
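The formula can be worked through in plain Python. This is a sketch under stated assumptions, not the library's implementation: the helper name and row layout are hypothetical, and "summed over join_on columns" is read here as grouping the overall rows by those columns before summing.

```python
from collections import defaultdict


def add_context_frequency_pct(context_rows, overall_rows, join_on):
    """Hypothetical helper: a context row's share of the overall frequency."""
    overall_totals = defaultdict(float)
    for r in overall_rows:
        # Assumption: overall frequency is grouped by the join_on columns.
        overall_totals[tuple(r[c] for c in join_on)] += r["frequency"]
    return [
        {**r, "frequency_pct": r["frequency"] / overall_totals[tuple(r[c] for c in join_on)] * 100}
        for r in context_rows
    ]


# Made-up sample data: the overall dataset has 200 observations for "Age".
overall_rows = [
    {"predictor": "Age", "bin": "<30", "frequency": 120},
    {"predictor": "Age", "bin": ">=30", "frequency": 80},
]
# One context accounts for 50 of those observations.
context_rows = [{"predictor": "Age", "context": "Web", "frequency": 50}]

result = add_context_frequency_pct(context_rows, overall_rows, join_on=["predictor"])
print(result[0]["frequency_pct"])
```

Unlike the per-group percentage, the denominator comes from the overall dataset, so the values across rows of one context need not add up to 100.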