pdstools.explanations.Aggregate
===============================

.. py:module:: pdstools.explanations.Aggregate


Classes
-------

.. autoapisummary::

   pdstools.explanations.Aggregate.Aggregate


Module Contents
---------------

.. py:class:: Aggregate(explanations: pdstools.explanations.Explanations.Explanations)

   Bases: :py:obj:`pdstools.utils.namespaces.LazyNamespace`


   .. py:attribute:: dependencies
      :value: ['polars']


   .. py:attribute:: dependency_group
      :value: 'explanations'


   .. py:attribute:: explanations


   .. py:attribute:: data_folderpath


   .. py:attribute:: df_contextual
      :value: None


   .. py:attribute:: df_overall
      :value: None


   .. py:attribute:: context_operations


   .. py:attribute:: initialized
      :value: False


   .. py:method:: get_df_contextual() -> polars.LazyFrame

      Get the contextual dataframe, loading it if not already loaded.
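
      The load-on-first-access behaviour described above can be sketched in plain Python. This is only an illustration of the pattern, not the library's code: ``LazyTable`` and its ``loader`` argument are hypothetical names, and a list stands in for the ``polars.LazyFrame``.

      .. code-block:: python

         from typing import Callable, List, Optional


         class LazyTable:
             """Hypothetical holder that loads its data once, on first request."""

             def __init__(self, loader: Callable[[], List[dict]]) -> None:
                 self._loader = loader
                 self._data: Optional[List[dict]] = None  # mirrors a ``None`` default
                 self.load_count = 0

             def get(self) -> List[dict]:
                 # Load only if nothing has been loaded yet, then reuse the result.
                 if self._data is None:
                     self._data = self._loader()
                     self.load_count += 1
                 return self._data


         table = LazyTable(lambda: [{"predictor": "Age", "contribution": 0.5}])
         first = table.get()
         second = table.get()  # served from the cached value, no second load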


   .. py:method:: get_predictor_contributions(context: Optional[dict[str, str]] = None, top_n: int = _DEFAULT.TOP_N.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_calculation: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value)

      Get the top-n predictor contributions for a given context or overall.

      Args:
          context (Optional[dict[str, str]]):
              The context to filter contributions by.
              If None, contributions for all contexts will be returned.
          top_n (int):
              Number of top predictors to return.
          descending (bool):
              Whether to sort contributions in descending order.
          missing (bool):
              Whether to include contributions for missing predictor values.
          remaining (bool):
              Whether to include contributions for remaining predictors outside the top-n.
          contribution_calculation (str):
              Method to calculate contributions. Some options are
              `contribution`, `contribution_abs`, `contribution_weighted`.
              Default is `contribution`, the average contribution to predictions.
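
      How ``top_n``, ``descending`` and ``remaining`` interact can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the function name, the predictor names and the choice to sum the leftover contributions into one ``remaining`` row are all hypothetical.

      .. code-block:: python

         from typing import Dict, List, Tuple


         def top_n_with_remaining(
             contributions: Dict[str, float],
             top_n: int = 3,
             descending: bool = True,
             remaining: bool = True,
         ) -> List[Tuple[str, float]]:
             """Keep the first ``top_n`` predictors and optionally collapse the rest."""
             ranked = sorted(
                 contributions.items(), key=lambda kv: kv[1], reverse=descending
             )
             top, rest = ranked[:top_n], ranked[top_n:]
             if remaining and rest:
                 # Assumption: leftovers are summed; the library may aggregate differently.
                 top.append(("remaining", sum(value for _, value in rest)))
             return top


         # Hypothetical predictors with their average contributions.
         example = {"Age": 0.5, "Income": 0.25, "Region": -0.125, "Channel": 0.0625}
         with_rest = top_n_with_remaining(example, top_n=2)
         without_rest = top_n_with_remaining(example, top_n=2, remaining=False)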


   .. py:method:: get_predictor_value_contributions(predictors: List[str], context: Optional[dict[str, str]] = None, top_k: int = _DEFAULT.TOP_K.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_calculation: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value)

      Get the top-k predictor value contributions for a given context or overall.

      Args:
          predictors (List[str]): Required.
              List of predictors to get the contributions for.
          context (Optional[dict[str, str]]):
              The context to filter contributions by.
              If None, contributions for all contexts will be returned.
          top_k (int):
              Number of unique categorical predictor values to return.
          descending (bool):
              Whether to sort contributions in descending order.
          missing (bool):
              Whether to include contributions for missing predictor values.
          remaining (bool):
              Whether to include contributions for remaining predictor values outside the top-k.
          contribution_calculation (str):
              Method to calculate contributions. Some options are
              `contribution`, `contribution_abs`, `contribution_weighted`.
              Default is `contribution`, the average contribution to predictions.
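
      The per-predictor selection controlled by ``predictors``, ``top_k`` and ``missing`` can be sketched in plain Python. This is a rough illustration only: the row layout, the function name and the use of ``None`` to mark a missing predictor value are assumptions, not the library's data model.

      .. code-block:: python

         from collections import defaultdict
         from typing import Dict, List, Optional, Tuple

         # Hypothetical row layout: (predictor, value or None if missing, contribution)
         Row = Tuple[str, Optional[str], float]


         def top_k_values(
             rows: List[Row],
             predictors: List[str],
             top_k: int = 2,
             descending: bool = True,
             missing: bool = True,
         ) -> Dict[str, List[Tuple[Optional[str], float]]]:
             """Return the ``top_k`` values per requested predictor."""
             grouped: Dict[str, List[Tuple[Optional[str], float]]] = defaultdict(list)
             for predictor, value, contribution in rows:
                 if predictor not in predictors:
                     continue  # only the requested predictors are kept
                 if value is None and not missing:
                     continue  # drop missing-value rows when missing=False
                 grouped[predictor].append((value, contribution))
             return {
                 name: sorted(vals, key=lambda vc: vc[1], reverse=descending)[:top_k]
                 for name, vals in grouped.items()
             }


         rows: List[Row] = [
             ("Channel", "Web", 0.5),
             ("Channel", "Email", 0.25),
             ("Channel", None, 0.75),
             ("Channel", "SMS", -0.25),
             ("Age", "<30", 0.125),
         ]
         with_missing = top_k_values(rows, ["Channel"], top_k=2)
         no_missing = top_k_values(rows, ["Channel"], top_k=2, missing=False)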


   .. py:method:: validate_folder()

      Check that the aggregates folder exists and is not empty.

      Raises:
          FileNotFoundError: If the aggregates folder does not exist or is empty.
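
      A check of this kind can be sketched with ``pathlib``. This is an assumption-laden illustration, not the method's actual body: ``validate_aggregates_folder``, ``is_valid`` and the file name ``part-0.parquet`` are hypothetical.

      .. code-block:: python

         import tempfile
         from pathlib import Path


         def validate_aggregates_folder(folder: Path) -> None:
             """Raise FileNotFoundError if ``folder`` is missing or has no entries."""
             if not folder.is_dir() or not any(folder.iterdir()):
                 raise FileNotFoundError(
                     f"Aggregates folder is missing or empty: {folder}"
                 )


         def is_valid(folder: Path) -> bool:
             """Small helper so the three cases below can be compared side by side."""
             try:
                 validate_aggregates_folder(folder)
                 return True
             except FileNotFoundError:
                 return False


         with tempfile.TemporaryDirectory() as tmp:
             root = Path(tmp)
             empty_ok = is_valid(root)                 # folder exists but is empty
             (root / "part-0.parquet").touch()         # hypothetical aggregate file
             filled_ok = is_valid(root)                # folder now has one entry
             missing_ok = is_valid(root / "no_such")   # folder does not exist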


   .. py:method:: get_unique_contexts_list(context_infos: Optional[List[pdstools.explanations.ExplanationsUtils.ContextInfo]] = None, with_partition_col: bool = False) -> List[pdstools.explanations.ExplanationsUtils.ContextInfo]


   .. py:method:: _load_data()


   .. py:method:: _get_predictor_contributions(contexts: Optional[List[pdstools.explanations.ExplanationsUtils.ContextInfo]] = None, predictors: Optional[List[str]] = None, limit: int = _DEFAULT.TOP_N.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_type: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value) -> polars.DataFrame


   .. py:method:: _get_predictor_value_contributions(contexts: Optional[List[pdstools.explanations.ExplanationsUtils.ContextInfo]] = None, predictors: Optional[List[str]] = None, limit: int = _DEFAULT.TOP_K.value, descending: bool = _DEFAULT.DESCENDING.value, missing: bool = _DEFAULT.MISSING.value, remaining: bool = _DEFAULT.REMAINING.value, contribution_type: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value) -> polars.DataFrame


   .. py:method:: _get_df_with_sort_info(df: polars.LazyFrame, sort_by_column: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value) -> polars.LazyFrame

      Add a sort column and value to the dataframe based on the predictor type.

      Sort logic:

      - numeric predictors are sorted by bin order
      - symbolic predictors are sorted by contribution type
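
      The sort rule above can be sketched on plain dictionaries. The column names ``predictor_type``, ``bin_order``, ``sort_column`` and ``sort_value``, and the ``"NUMERIC"`` / ``"SYMBOLIC"`` labels, are assumptions made for illustration; the actual frame may use different names.

      .. code-block:: python

         from typing import Any, Dict, List


         def add_sort_info(
             rows: List[Dict[str, Any]], sort_by_column: str = "contribution"
         ) -> List[Dict[str, Any]]:
             """Attach the column to sort on, and its value, to every row."""
             out = []
             for row in rows:
                 # Numeric predictors keep their bin order; symbolic ones use
                 # the chosen contribution column.
                 if row["predictor_type"] == "NUMERIC":
                     sort_column = "bin_order"
                 else:
                     sort_column = sort_by_column
                 out.append(
                     {**row, "sort_column": sort_column, "sort_value": row[sort_column]}
                 )
             return out


         rows = [
             {"predictor_type": "NUMERIC", "bin_order": 2, "contribution": 0.5},
             {"predictor_type": "SYMBOLIC", "bin_order": 0, "contribution": -0.25},
         ]
         sorted_info = add_sort_info(rows)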


   .. py:method:: _filter_for_predictors(df: polars.LazyFrame, predictors: List[str]) -> polars.LazyFrame


   .. py:method:: _get_df_with_top_limit(df: polars.LazyFrame, over: List[str], contribution_type: str = _CONTRIBUTION_TYPE.CONTRIBUTION.value, limit: int = _DEFAULT.TOP_K.value, descending: bool = True) -> polars.LazyFrame


   .. py:method:: _get_missing_predictor_values_df(df: polars.LazyFrame) -> polars.LazyFrame


   .. py:method:: _get_df(contexts: Optional[List[pdstools.explanations.ExplanationsUtils.ContextInfo]] = None)


   .. py:method:: _get_base_df(df_filtered_contexts: Optional[polars.DataFrame] = None) -> polars.LazyFrame


   .. py:method:: _get_group_by_columns(predictors: Optional[List[str]] = None) -> List[str]


   .. py:method:: _get_sort_over_columns(predictors: Optional[List[str]] = None) -> List[str]


   .. py:method:: _calculate_remaining_aggregates(df_all: polars.LazyFrame, df_anti: polars.LazyFrame, aggregate_over: List[str], anti_on: List[str]) -> polars.LazyFrame


   .. py:method:: _calculate_aggregates(df: polars.LazyFrame, aggregate_frequency_over: List[str], aggregate_over: List[str]) -> polars.LazyFrame