pdstools.explanations.Aggregate
Classes
Module Contents
- class Aggregate(explanations: pdstools.explanations.Explanations.Explanations)
Bases:
pdstools.utils.namespaces.LazyNamespace
- Parameters:
explanations (pdstools.explanations.Explanations.Explanations)
- dependencies = ['polars']
- dependency_group = 'explanations'
- explanations
- data_folderpath
- df_contextual = None
- df_overall = None
- context_operations
- initialized = False
- get_df_contextual() → polars.LazyFrame
Get the contextual dataframe, loading it if not already loaded.
- Return type:
polars.LazyFrame
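The "loading it if not already loaded" behaviour is a load-on-first-use pattern. A minimal sketch in plain Python, assuming a hypothetical `LazyTable` class and a list in place of a `polars.LazyFrame`; it is not the pdstools implementation:

```python
from typing import Callable, Optional


class LazyTable:
    """Hypothetical stand-in for an object that loads its data on first use."""

    def __init__(self, loader: Callable[[], list]):
        self._loader = loader  # called at most once
        self._data: Optional[list] = None
        self.load_count = 0

    def get(self) -> list:
        # Load only if nothing has been cached yet, then reuse the cached value.
        if self._data is None:
            self._data = self._loader()
            self.load_count += 1
        return self._data


table = LazyTable(lambda: [{"predictor": "Age", "contribution": 0.12}])
first = table.get()
second = table.get()  # served from the cache, the loader is not called again
```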
- get_predictor_contributions(context: dict[str, str] | None = None, top_n: int = _DEFAULT.TOP_N.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_calculation: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value)
Get the top-n predictor contributions for a given context or overall.
- Args:
- context (Optional[dict[str, str]]):
The context to filter contributions by. If None, contributions for all contexts will be returned.
- top_n (int):
Number of top predictors to return.
- descending (bool):
Whether to sort contributions in descending order.
- missing (bool):
Whether to include contributions for missing predictor values.
- remaining (bool):
Whether to include contributions for remaining predictors outside the top-n.
- contribution_calculation (str):
Method used to calculate contributions. Options include contribution, contribution_abs, and contribution_weighted. The default, contribution, is the average contribution to predictions.
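How `top_n`, `descending`, and `remaining` interact can be illustrated with a small plain-Python sketch. The function name, the `(predictor, contribution)` input shape, and the choice to fold everything outside the top n into a single summed "remaining" row are all assumptions made for illustration; the library itself works on polars frames and may aggregate differently.

```python
from collections import defaultdict
from statistics import mean


def top_n_contributions(rows, top_n=3, descending=True, remaining=True):
    """Average contributions per predictor and keep the first top_n of them.

    rows: iterable of (predictor, contribution) pairs (hypothetical input shape).
    """
    grouped = defaultdict(list)
    for predictor, value in rows:
        grouped[predictor].append(value)

    # "contribution" is described as the average contribution to predictions.
    averaged = {name: mean(values) for name, values in grouped.items()}
    ranked = sorted(averaged.items(), key=lambda item: item[1], reverse=descending)
    top, rest = ranked[:top_n], ranked[top_n:]

    if remaining and rest:
        # Assumption: predictors outside the top_n are summed into one row.
        top.append(("remaining", sum(value for _, value in rest)))
    return top


rows = [
    ("Age", 0.5),
    ("Age", 0.25),
    ("Income", -0.5),
    ("Region", 0.125),
    ("Tenure", 0.0625),
]
result = top_n_contributions(rows, top_n=2)
# → [("Age", 0.375), ("Region", 0.125), ("remaining", -0.4375)]
```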
- get_predictor_value_contributions(predictors: List[str], context: dict[str, str] | None = None, top_k: int = _DEFAULT.TOP_K.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_calculation: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value)
Get the top-k predictor value contributions for a given context or overall.
- Args:
- predictors (List[str]): Required.
List of predictors to get the contributions for.
- context (Optional[dict[str, str]]):
The context to filter contributions by. If None, contributions for all contexts will be returned.
- top_k (int):
Number of unique categorical predictor values to return.
- descending (bool):
Whether to sort contributions in descending order.
- missing (bool):
Whether to include contributions for missing predictor values.
- remaining (bool):
Whether to include contributions for remaining predictor values outside the top-k.
- contribution_calculation (str):
Method used to calculate contributions. Options include contribution, contribution_abs, and contribution_weighted. The default, contribution, is the average contribution to predictions.
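A possible way to call the two public methods together is sketched below. The helper `summarize_context` is hypothetical, and the string values passed as `contribution_calculation` are taken from the option names listed above, on the assumption that they are accepted literally. The `aggregate` argument is expected to behave like the `Aggregate` class documented here.

```python
def summarize_context(aggregate, context, predictors, top_n=5, top_k=3):
    """Fetch predictor-level and value-level contributions for one context.

    Hypothetical helper; only the keyword names come from the signatures above.
    """
    by_predictor = aggregate.get_predictor_contributions(
        context=context,
        top_n=top_n,
        descending=True,
        missing=True,
        remaining=True,
        contribution_calculation="contribution",  # assumed literal option value
    )
    by_value = aggregate.get_predictor_value_contributions(
        predictors=predictors,
        context=context,
        top_k=top_k,
        contribution_calculation="contribution_abs",  # assumed literal option value
    )
    return by_predictor, by_value
```

A call might look like `summarize_context(aggregate, {"pyChannel": "Web"}, ["Age"])`, where the context key `pyChannel` is an illustrative assumption and depends on the context keys present in the data.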
- validate_folder()
Check that the aggregates folder exists and is not empty.
- Raises:
FileNotFoundError: If the aggregates folder does not exist or is empty.
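The documented contract (raise `FileNotFoundError` when the folder is missing or empty) can be sketched with `pathlib`. This standalone function takes the folder as an argument, which is an assumption made so the example is self-contained; the real method takes no arguments and presumably reads the folder from the instance.

```python
from pathlib import Path


def validate_folder(folder: Path) -> None:
    """Raise FileNotFoundError if `folder` is missing or has no entries."""
    if not folder.is_dir():
        raise FileNotFoundError(f"Aggregates folder not found: {folder}")
    # any() stops at the first entry, so large folders are not fully listed.
    if not any(folder.iterdir()):
        raise FileNotFoundError(f"Aggregates folder is empty: {folder}")
```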
- get_unique_contexts_list(context_infos: List[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None, with_partition_col: bool = False) → List[pdstools.explanations.ExplanationsUtils.ContextInfo]
- Parameters:
context_infos (Optional[List[pdstools.explanations.ExplanationsUtils.ContextInfo]])
with_partition_col (bool)
- Return type:
List[pdstools.explanations.ExplanationsUtils.ContextInfo]
- _load_data()
- _get_predictor_contributions(contexts: List[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None, predictors: List[str] | None = None, limit: int = _DEFAULT.TOP_N.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_type: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value) → polars.DataFrame
- _get_predictor_value_contributions(contexts: List[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None, predictors: List[str] | None = None, limit: int = _DEFAULT.TOP_K.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_type: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value) → polars.DataFrame
- _get_df_with_sort_info(df: polars.LazyFrame, sort_by_column: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value) → polars.LazyFrame
Add a sort column and value to the dataframe based on the predictor type.
Sort logic:
- numeric predictors are sorted by bin order
- symbolic predictors are sorted by contribution type
- Parameters:
df (polars.LazyFrame)
sort_by_column (str)
- Return type:
polars.LazyFrame
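The sort rule described for `_get_df_with_sort_info` can be sketched on plain dictionaries. All field names (`predictor_type`, `bin_order`, `sort_column`, `sort_value`) and the `"NUMERIC"` label are assumptions for illustration; the real method operates on a `polars.LazyFrame` whose schema is not shown here.

```python
def add_sort_info(rows, sort_by_column="contribution"):
    """Return copies of `rows` with hypothetical sort_column/sort_value keys.

    Each row is a dict with `predictor_type`, `bin_order`, and the column
    named by `sort_by_column` (all hypothetical field names).
    """
    result = []
    for row in rows:
        if row["predictor_type"] == "NUMERIC":
            # Numeric predictors keep their natural bin order.
            sort_column, sort_value = "bin_order", row["bin_order"]
        else:
            # Symbolic predictors are ordered by the chosen contribution column.
            sort_column, sort_value = sort_by_column, row[sort_by_column]
        result.append({**row, "sort_column": sort_column, "sort_value": sort_value})
    return result


rows = [
    {"predictor": "Age", "predictor_type": "NUMERIC", "bin_order": 2, "contribution": 0.1},
    {"predictor": "Region", "predictor_type": "SYMBOLIC", "bin_order": 0, "contribution": -0.25},
]
with_sort = add_sort_info(rows)
```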
- _filter_for_predictors(df: polars.LazyFrame, predictors: List[str]) → polars.LazyFrame
- Parameters:
df (polars.LazyFrame)
predictors (List[str])
- Return type:
polars.LazyFrame
- _get_df_with_top_limit(df: polars.LazyFrame, over: List[str], contribution_type: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value, limit: int = _DEFAULT.TOP_K.value, descending: bool = True) → polars.LazyFrame
- _get_missing_predictor_values_df(df: polars.LazyFrame) → polars.LazyFrame
- Parameters:
df (polars.LazyFrame)
- Return type:
polars.LazyFrame
- _get_df(contexts: List[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None)
- Parameters:
contexts (Optional[List[pdstools.explanations.ExplanationsUtils.ContextInfo]])
- _get_base_df(df_filtered_contexts: polars.DataFrame | None = None) → polars.LazyFrame
- Parameters:
df_filtered_contexts (Optional[polars.DataFrame])
- Return type:
polars.LazyFrame