pdstools.explanations.Aggregate
===============================

.. py:module:: pdstools.explanations.Aggregate


Classes
-------

.. autoapisummary::

   pdstools.explanations.Aggregate.Aggregate


Module Contents
---------------

.. py:class:: Aggregate(explanations: pdstools.explanations.Explanations.Explanations)

   Bases: :py:obj:`pdstools.utils.namespaces.LazyNamespace`


   .. py:attribute:: dependencies
      :value: ['polars']


   .. py:attribute:: dependency_group
      :value: 'explanations'


   .. py:attribute:: explanations


   .. py:attribute:: data_folderpath


   .. py:attribute:: df_contextual
      :value: None


   .. py:attribute:: df_overall
      :value: None


   .. py:attribute:: context_operations


   .. py:attribute:: initialized
      :value: False


   .. py:method:: get_df_contextual() -> polars.LazyFrame

      Get the contextual dataframe, loading it if not already loaded.


   .. py:method:: get_predictor_contributions(context: Optional[dict[str, str]] = None, top_n: int = _DEFAULT.TOP_N.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_calculation: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value)

      Get the top-n predictor contributions for a given context or overall.

      Args:
          context (Optional[dict[str, str]]):
              The context to filter contributions by.
              If None, contributions for all contexts will be returned.
          top_n (int):
              Number of top predictors to return.
          descending (bool):
              Whether to sort contributions in descending order.
          missing (bool):
              Whether to include contributions for missing predictor values.
          remaining (bool):
              Whether to include contributions for remaining predictors outside the top-n.
          contribution_calculation (str):
              Method used to calculate contributions.
              Some options are `contribution`, `contribution_abs`, `contribution_weighted`.
              The default is `contribution`, the average contribution to predictions.


   .. py:method:: get_predictor_value_contributions(predictors: List[str], context: Optional[dict[str, str]] = None, top_k: int = _DEFAULT.TOP_K.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_calculation: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value)

      Get the top-k predictor value contributions for a given context or overall.

      Args:
          predictors (List[str]):
              Required. List of predictors to get the contributions for.
          context (Optional[dict[str, str]]):
              The context to filter contributions by.
              If None, contributions for all contexts will be returned.
          top_k (int):
              Number of unique categorical predictor values to return.
          descending (bool):
              Whether to sort contributions in descending order.
          missing (bool):
              Whether to include contributions for missing predictor values.
          remaining (bool):
              Whether to include contributions for remaining predictors outside the top-n.
          contribution_calculation (str):
              Method used to calculate contributions.
              Some options are `contribution`, `contribution_abs`, `contribution_weighted`.
              The default is `contribution`, the average contribution to predictions.


   .. py:method:: validate_folder()

      Check if the aggregates folder exists.

      Raises:
          FileNotFoundError: If the aggregates folder does not exist or is empty.


   .. py:method:: get_unique_contexts_list(context_infos: Optional[List[pdstools.explanations.ExplanationsUtils.ContextInfo]] = None, with_partition_col: bool = False) -> List[pdstools.explanations.ExplanationsUtils.ContextInfo]


   .. py:method:: _load_data()


   .. py:method:: _get_predictor_contributions(contexts: Optional[List[pdstools.explanations.ExplanationsUtils.ContextInfo]] = None, predictors: Optional[List[str]] = None, limit: int = _DEFAULT.TOP_N.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_type: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value) -> polars.DataFrame


   .. py:method:: _get_predictor_value_contributions(contexts: Optional[List[pdstools.explanations.ExplanationsUtils.ContextInfo]] = None, predictors: Optional[List[str]] = None, limit: int = _DEFAULT.TOP_K.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_type: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value) -> polars.DataFrame


   .. py:method:: _get_df_with_sort_info(df: polars.LazyFrame, sort_by_column: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value) -> polars.LazyFrame

      Add a sort column and value to the dataframe based on the predictor type.

      Sort logic:

      - numeric predictors are sorted by bin order
      - symbolic predictors are sorted by contribution type


   .. py:method:: _filter_for_predictors(df: polars.LazyFrame, predictors: List[str]) -> polars.LazyFrame


   .. py:method:: _get_df_with_top_limit(df: polars.LazyFrame, over: List[str], contribution_type: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value, limit: int = _DEFAULT.TOP_K.value, descending: bool = True) -> polars.LazyFrame


   .. py:method:: _get_missing_predictor_values_df(df: polars.LazyFrame) -> polars.LazyFrame


   .. py:method:: _get_df(contexts: Optional[List[pdstools.explanations.ExplanationsUtils.ContextInfo]] = None)


   .. py:method:: _get_base_df(df_filtered_contexts: Optional[polars.DataFrame] = None) -> polars.LazyFrame


   .. py:method:: _get_group_by_columns(predictors: Optional[List[str]] = None) -> List[str]


   .. py:method:: _get_sort_over_columns(predictors: Optional[List[str]] = None) -> List[str]


   .. py:method:: _calculate_remaining_aggregates(df_all: polars.LazyFrame, df_anti: polars.LazyFrame, aggregate_over: List[str], anti_on: List[str]) -> polars.LazyFrame


   .. py:method:: _calculate_aggregates(df: polars.LazyFrame, aggregate_frequency_over: List[str], aggregate_over: List[str]) -> polars.LazyFrame
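
The sort logic documented for ``_get_df_with_sort_info`` (numeric predictors ordered by bin, symbolic predictors ordered by contribution) can be illustrated with a small plain-Python sketch. This is not the pdstools implementation, which operates on ``polars.LazyFrame`` objects. The ``BinRow`` type, its field names, the ``"numeric"``/``"symbolic"`` labels, and the ``sort_key`` and ``sort_rows`` helpers are all hypothetical and exist only for this example.

```python
from typing import List, NamedTuple, Tuple


class BinRow(NamedTuple):
    """Hypothetical row: one predictor bin with its aggregated contribution."""

    predictor: str
    predictor_type: str  # assumed labels: "numeric" or "symbolic"
    bin_order: int       # position of the bin within its predictor
    contribution: float


def sort_key(row: BinRow, descending: bool = True) -> Tuple[str, float]:
    """Build a per-row sort key, grouped by predictor name."""
    if row.predictor_type == "numeric":
        # Numeric bins keep their natural bin order.
        return (row.predictor, float(row.bin_order))
    # Symbolic bins are ranked by contribution; negate for descending order.
    value = -row.contribution if descending else row.contribution
    return (row.predictor, value)


def sort_rows(rows: List[BinRow], descending: bool = True) -> List[BinRow]:
    """Return the rows sorted according to the rule above."""
    return sorted(rows, key=lambda r: sort_key(r, descending))


rows = [
    BinRow("Age", "numeric", 2, 0.30),
    BinRow("Age", "numeric", 1, -0.10),
    BinRow("Channel", "symbolic", 1, 0.05),
    BinRow("Channel", "symbolic", 2, 0.20),
]
ordered = sort_rows(rows)
print([(r.predictor, r.bin_order) for r in ordered])
# [('Age', 1), ('Age', 2), ('Channel', 2), ('Channel', 1)]
```

In this sketch the ``Age`` bins stay in bin order regardless of their contribution, while the ``Channel`` values are ranked by contribution, so the direction flag only affects symbolic predictors.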