pdstools.explanations.Aggregate

Classes

Aggregate

Module Contents

class Aggregate(explanations: pdstools.explanations.Explanations.Explanations)

Bases: pdstools.utils.namespaces.LazyNamespace

Parameters: explanations (pdstools.explanations.Explanations.Explanations)

dependencies = ['polars']

dependency_group = 'explanations'

explanations

data_folderpath

data_pattern = None

df_contextual = None

df_overall = None

context_operations

initialized = False

get_df_contextual() → polars.LazyFrame

Get the contextual dataframe, loading it if not already loaded.

Return type: polars.LazyFrame

get_predictor_contributions(context: dict[str, str] | None = None, top_n: int = _DEFAULT.TOP_N.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_calculation: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value)

Get the top-n predictor contributions for a given context or overall.

Parameters:

context (Optional[dict[str, str]]): The context to filter contributions by. If None, contributions for all contexts are returned.
top_n (int): Number of top predictors to return.
descending (bool): Whether to sort contributions in descending order.
missing (bool): Whether to include contributions for missing predictor values.
remaining (bool): Whether to include contributions for the remaining predictors outside the top-n.
contribution_calculation (str): Method used to calculate contributions. Options include contribution, contribution_abs, and contribution_weighted. The default, contribution, is the average contribution to predictions.
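As a rough illustration of the call shape, the sketch below wraps get_predictor_contributions in a small helper that passes every documented argument by keyword. The helper name top_contributions, the stub class, and the context key and value ("pyChannel", "Web") are hypothetical and only keep the example self-contained. This section does not describe how an Aggregate instance is obtained or what the method returns, so treat this as a sketch under those assumptions rather than prescribed usage.

```python
from typing import Any, Optional


def top_contributions(
    aggregate: Any,
    context: Optional[dict[str, str]] = None,
    top_n: int = 5,
) -> Any:
    """Request the top-n predictor contributions, passing arguments by keyword."""
    return aggregate.get_predictor_contributions(
        context=context,  # None means no context filter
        top_n=top_n,
        descending=True,
        missing=True,
        remaining=True,
        contribution_calculation="contribution",  # option named in the docs above
    )


class _StubAggregate:
    """Hypothetical stand-in for an Aggregate instance; it only echoes its arguments."""

    def get_predictor_contributions(self, **kwargs: Any) -> dict[str, Any]:
        return kwargs


# The context key and value are assumptions made for this example.
echoed = top_contributions(_StubAggregate(), context={"pyChannel": "Web"}, top_n=3)
```

Passing the arguments by keyword keeps the call readable, since the signature has several boolean flags in a row.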

get_predictor_value_contributions(predictors: List[str], context: dict[str, str] | None = None, top_k: int = _DEFAULT.TOP_K.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_calculation: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value)

Get the top-k predictor value contributions for a given context or overall.

Parameters:

predictors (List[str]): Required. List of predictors to get the contributions for.
context (Optional[dict[str, str]]): The context to filter contributions by. If None, contributions for all contexts are returned.
top_k (int): Number of unique categorical predictor values to return.
descending (bool): Whether to sort contributions in descending order.
missing (bool): Whether to include contributions for missing predictor values.
remaining (bool): Whether to include contributions for the remaining predictor values outside the top-k.
contribution_calculation (str): Method used to calculate contributions. Options include contribution, contribution_abs, and contribution_weighted. The default, contribution, is the average contribution to predictions.
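To make the interplay of top_k, descending, and remaining more concrete, here is a plain-Python sketch that ranks the values of one predictor by contribution and splits them into the top-k and everything else. It is not the library's implementation: the function name split_top_k and the sample data are hypothetical, and how pdstools actually combines the remaining values is not stated in this section.

```python
def split_top_k(
    contributions: dict[str, float],
    top_k: int,
    descending: bool = True,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Rank values by contribution and split them into the top-k and the rest."""
    ranked = sorted(
        contributions.items(),
        key=lambda item: item[1],
        reverse=descending,
    )
    return ranked[:top_k], ranked[top_k:]


# Hypothetical contributions for the values of one categorical predictor.
channel_values = {"Web": 0.12, "Email": -0.03, "SMS": 0.07, "Push": 0.01}
top, rest = split_top_k(channel_values, top_k=2)
```

In this picture, the second list holds the values that a remaining flag would decide whether to report.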

validate_folder()

Check that the aggregates folder exists and is not empty.

Raises:

FileNotFoundError: If the aggregates folder does not exist or is empty.
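A check of this kind can be sketched with the standard library alone. The function name check_aggregates_folder and its folder argument are hypothetical; the real method takes no arguments and presumably reads the folder location from the instance, which is an assumption here.

```python
from pathlib import Path


def check_aggregates_folder(folder: Path) -> None:
    """Raise FileNotFoundError if the folder is missing or contains no entries."""
    # is_dir() is False for a missing path, so iterdir() is only reached
    # when the directory actually exists.
    if not folder.is_dir() or not any(folder.iterdir()):
        raise FileNotFoundError(
            f"Aggregates folder '{folder}' does not exist or is empty."
        )
```

Raising the same exception for both cases matches the documented behaviour, at the cost of not telling the caller which of the two conditions failed.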

get_unique_contexts_list(context_infos: List[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None, with_partition_col: bool = False) → List[pdstools.explanations.ExplanationsUtils.ContextInfo]

Parameters:

context_infos (Optional[List[pdstools.explanations.ExplanationsUtils.ContextInfo]])
with_partition_col (bool)

Return type:

List[pdstools.explanations.ExplanationsUtils.ContextInfo]

_load_data()

_get_predictor_contributions(contexts: List[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None, predictors: List[str] | None = None, limit: int = _DEFAULT.TOP_N.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_type: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value) → polars.DataFrame

Parameters:

contexts (Optional[List[pdstools.explanations.ExplanationsUtils.ContextInfo]])
predictors (Optional[List[str]])
limit (int)
descending (bool)
missing (bool)
remaining (bool)
contribution_type (str)

Return type:

polars.DataFrame

_get_predictor_value_contributions(contexts: List[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None, predictors: List[str] | None = None, limit: int = _DEFAULT.TOP_K.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_type: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value) → polars.DataFrame

Parameters:

contexts (Optional[List[pdstools.explanations.ExplanationsUtils.ContextInfo]])
predictors (Optional[List[str]])
limit (int)
descending (bool)
missing (bool)
remaining (bool)
contribution_type (str)

Return type:

polars.DataFrame

_get_df_with_sort_info(df: polars.LazyFrame, sort_by_column: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value) → polars.LazyFrame

Add a sort column and value to the dataframe based on the predictor type.

Sort logic:

- numeric predictors are sorted by bin order
- symbolic predictors are sorted by contribution type

Parameters:

df (polars.LazyFrame)
sort_by_column (str)

Return type:

polars.LazyFrame
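The sort rule for _get_df_with_sort_info can be illustrated on plain dictionaries instead of a LazyFrame. This is only a sketch of the rule as described: the field names predictor_type, bin_order, sort_column, and sort_value, the "numeric" and "symbolic" labels, and the sample rows are all assumptions, and the real method operates on polars columns whose names are not given in this section.

```python
from typing import Any


def with_sort_info(
    rows: list[dict[str, Any]],
    sort_by_column: str = "contribution",
) -> list[dict[str, Any]]:
    """Annotate each row with the column it sorts on and the value taken from it."""
    annotated = []
    for row in rows:
        # Numeric predictors keep their bin order; symbolic ones sort on the
        # chosen contribution column.
        is_numeric = row["predictor_type"] == "numeric"
        column = "bin_order" if is_numeric else sort_by_column
        annotated.append({**row, "sort_column": column, "sort_value": row[column]})
    return annotated


# Hypothetical rows: one bin of a numeric predictor, one value of a symbolic one.
rows = [
    {"predictor": "Age", "predictor_type": "numeric", "bin_order": 1, "contribution": 0.30},
    {"predictor": "Channel", "predictor_type": "symbolic", "bin_order": 0, "contribution": -0.05},
]
annotated = with_sort_info(rows)
```

Keeping numeric bins in bin order preserves the natural ordering of the intervals, while symbolic values have no inherent order and are ranked by their contribution instead.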

_filter_for_predictors(df: polars.LazyFrame, predictors: List[str]) → polars.LazyFrame

Parameters:

df (polars.LazyFrame)
predictors (List[str])

Return type:

polars.LazyFrame

_get_df_with_top_limit(df: polars.LazyFrame, over: List[str], contribution_type: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value, limit: int = _DEFAULT.TOP_K.value, descending: bool = True) → polars.LazyFrame

Parameters:

df (polars.LazyFrame)
over (List[str])
contribution_type (str)
limit (int)
descending (bool)

Return type:

polars.LazyFrame

_get_missing_predictor_values_df(df: polars.LazyFrame) → polars.LazyFrame

Parameters: df (polars.LazyFrame)
Return type: polars.LazyFrame

_get_df(contexts: List[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None)

Parameters: contexts (Optional[List[pdstools.explanations.ExplanationsUtils.ContextInfo]])

_get_base_df(df_filtered_contexts: polars.DataFrame | None = None) → polars.LazyFrame

Parameters: df_filtered_contexts (Optional[polars.DataFrame])
Return type: polars.LazyFrame

_get_group_by_columns(predictors: List[str] | None = None) → List[str]

Parameters: predictors (Optional[List[str]])
Return type: List[str]

_get_sort_over_columns(predictors: List[str] | None = None) → List[str]

Parameters: predictors (Optional[List[str]])
Return type: List[str]

_calculate_remaining_aggregates(df_all: polars.LazyFrame, df_anti: polars.LazyFrame, aggregate_over: List[str], anti_on: List[str]) → polars.LazyFrame

Parameters:

df_all (polars.LazyFrame)
df_anti (polars.LazyFrame)
aggregate_over (List[str])
anti_on (List[str])

Return type:

polars.LazyFrame

_calculate_aggregates(df: polars.LazyFrame, aggregate_frequency_over: List[str], aggregate_over: List[str]) → polars.LazyFrame

Parameters:

df (polars.LazyFrame)
aggregate_frequency_over (List[str])
aggregate_over (List[str])

Return type:

polars.LazyFrame