pdstools.explanations.Aggregate
===============================

.. py:module:: pdstools.explanations.Aggregate


Classes
-------

.. autoapisummary::

   pdstools.explanations.Aggregate.Aggregate


Module Contents
---------------

.. py:class:: Aggregate(explanations: pdstools.explanations.Explanations.Explanations)

   Bases: :py:obj:`pdstools.utils.namespaces.LazyNamespace`


   .. py:attribute:: dependencies
      :value: ['polars']


   .. py:attribute:: dependency_group
      :value: 'explanations'


   .. py:attribute:: explanations


   .. py:attribute:: data_folderpath


   .. py:attribute:: data_pattern
      :value: None


   .. py:attribute:: df_contextual
      :value: None


   .. py:attribute:: df_overall
      :value: None


   .. py:attribute:: context_operations


   .. py:attribute:: initialized
      :value: False


   .. py:method:: get_df_contextual() -> polars.LazyFrame

      Get the contextual dataframe, loading it if not already loaded.


   .. py:method:: get_df_overall() -> polars.LazyFrame

      Get the overall dataframe, loading it if not already loaded.
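
      The load-on-first-use behaviour described for the two getters above can be sketched
      roughly as follows. This is an illustration only: ``LazyTable`` and its ``loader``
      callable are hypothetical names, not part of pdstools, and a plain list stands in
      for the ``polars.LazyFrame`` the real methods return.

      .. code-block:: python

         from typing import Callable, Optional


         class LazyTable:
             """Hypothetical holder that loads its data only on first access."""

             def __init__(self, loader: Callable[[], list]):
                 self._loader = loader
                 self._data: Optional[list] = None
                 self.load_count = 0  # how many times the loader actually ran

             def get(self) -> list:
                 # Load only when nothing has been cached yet.
                 if self._data is None:
                     self._data = self._loader()
                     self.load_count += 1
                 return self._data


         table = LazyTable(lambda: [{"predictor": "Age", "contribution": 0.5}])
         first = table.get()   # triggers the load
         second = table.get()  # reuses the cached object

      Repeated calls return the same cached object, so the loading cost is paid once.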


   .. py:method:: get_predictor_contributions(context: dict[str, str] | None = None, top_n: int = defaults.top_n, **filter_kwargs)

      Get the top-n predictor contributions for a given context or overall.

      Args:
          context (Optional[dict[str, str]]):
              The context to filter contributions by.
              If None, contributions for all contexts will be returned.
          top_n (int):
              Number of top predictors.
          **filter_kwargs:
              Optional filtering and sorting controls. Valid keys:

              - ``sort_by`` (str): Column to rank/select top predictors.
                Options: ``contribution``, ``contribution_abs``,
                ``contribution_weighted``, ``contribution_weighted_abs``.
                Default: ``"contribution_abs"``.
              - ``descending`` (bool): If ``True``, sort in descending order of
                ``sort_by`` (most impactful first); if ``False``, sort ascending.
                Default: ``True``.
              - ``missing`` (bool): Include missing-value bins. Default: ``True``.
              - ``remaining`` (bool): Include an aggregated "remaining" row for
                predictors outside the top-n. Default: ``True``.
              - ``include_numeric_single_bin`` (bool): Include numeric predictors
                that have only a single bin. Default: ``False``.
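
      To make the interplay of ``top_n``, ``sort_by``, ``descending`` and ``remaining``
      concrete, here is a minimal pure-Python sketch. It is not the pdstools
      implementation: the helper ``top_n_with_remaining`` is hypothetical, and summing
      the leftover predictors into the "remaining" row is an assumption made for
      illustration.

      .. code-block:: python

         def top_n_with_remaining(rows, top_n, sort_by="contribution_abs",
                                  descending=True, remaining=True):
             """Hypothetical helper: keep the top-n rows, fold the rest into one row."""
             ranked = sorted(rows, key=lambda r: r[sort_by], reverse=descending)
             top, rest = ranked[:top_n], ranked[top_n:]
             if remaining and rest:
                 # Assumption: the "remaining" row is the sum of the leftover rows.
                 top.append({
                     "predictor": "remaining",
                     "contribution": sum(r["contribution"] for r in rest),
                     "contribution_abs": sum(r["contribution_abs"] for r in rest),
                 })
             return top


         # Illustrative data only; predictor names and values are made up.
         raw = {"Age": 0.5, "Income": -0.25, "Region": 0.125, "Channel": -0.0625}
         rows = [
             {"predictor": name, "contribution": c, "contribution_abs": abs(c)}
             for name, c in raw.items()
         ]
         result = top_n_with_remaining(rows, top_n=2)

      Ranking by ``contribution_abs`` keeps strongly negative predictors such as
      ``Income`` in the top-n, which ranking by the signed ``contribution`` would not.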


   .. py:method:: get_predictor_value_contributions(predictors: list[str], context: dict[str, str] | None = None, top_k: int = defaults.top_k, **filter_kwargs)

      Get the top-k predictor value contributions for a given context or overall.

      Args:
          predictors (list[str]): Required.
              List of predictors to get the value contributions for.
          context (Optional[dict[str, str]]):
              The context to filter contributions by.
              If None, contributions for all contexts will be returned.
          top_k (int):
              Number of unique categorical predictor values to return.
          **filter_kwargs:
              Optional filtering and sorting controls. Valid keys:

              - ``sort_by`` (str): Column used to rank and select the top predictor values.
                Options: ``contribution``, ``contribution_abs``,
                ``contribution_weighted``, ``contribution_weighted_abs``.
                Default: ``"contribution_abs"``.
              - ``descending`` (bool): If ``True``, sort in descending order of
                ``sort_by`` (most impactful first); if ``False``, sort ascending.
                Default: ``True``.
              - ``missing`` (bool): Include missing-value bins. Default: ``True``.
              - ``remaining`` (bool): Include an aggregated "remaining" row for
                values outside the top-k. Default: ``True``.
              - ``include_numeric_single_bin`` (bool): Include numeric predictors
                that have only a single bin. Default: ``False``.
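
      The per-predictor selection controlled by ``predictors``, ``top_k`` and ``missing``
      can be sketched in plain Python as below. The helper ``top_k_values``, the
      ``"MISSING"`` bin label and the row layout are all assumptions for illustration;
      the actual method operates on the aggregated explanation data.

      .. code-block:: python

         from collections import defaultdict


         def top_k_values(rows, predictors, top_k, missing=True):
             """Hypothetical helper: top-k values per requested predictor."""
             grouped = defaultdict(list)
             for row in rows:
                 if row["predictor"] not in predictors:
                     continue  # only the requested predictors are considered
                 if not missing and row["value"] == "MISSING":
                     continue  # assumed label for the missing-value bin
                 grouped[row["predictor"]].append(row)
             # Rank each predictor's values by absolute contribution, largest first.
             return {
                 name: sorted(vals, key=lambda r: abs(r["contribution"]), reverse=True)[:top_k]
                 for name, vals in grouped.items()
             }


         # Illustrative data only.
         rows = [
             {"predictor": "Channel", "value": "Web", "contribution": 0.5},
             {"predictor": "Channel", "value": "Email", "contribution": -0.25},
             {"predictor": "Channel", "value": "MISSING", "contribution": 0.75},
             {"predictor": "Channel", "value": "SMS", "contribution": 0.125},
             {"predictor": "Region", "value": "North", "contribution": 0.25},
         ]
         result = top_k_values(rows, ["Channel"], top_k=2, missing=False)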


   .. py:method:: validate_folder()

      Check that the aggregates folder exists and is not empty.

      Raises:
          FileNotFoundError: If the aggregates folder does not exist or is empty.
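
      A check of this kind could look roughly like the sketch below, using only the
      standard library. The function name ``check_aggregates_folder``, the error message
      and the file name are hypothetical; the real method reads the folder path from the
      instance rather than taking it as an argument.

      .. code-block:: python

         import tempfile
         from pathlib import Path


         def check_aggregates_folder(folder: Path) -> None:
             """Hypothetical check: the folder must exist and contain at least one entry."""
             if not folder.is_dir() or not any(folder.iterdir()):
                 raise FileNotFoundError(f"No aggregate files found in {folder}")


         with tempfile.TemporaryDirectory() as tmp:
             folder = Path(tmp)
             try:
                 check_aggregates_folder(folder)  # the folder exists but is empty
                 empty_rejected = False
             except FileNotFoundError:
                 empty_rejected = True

             (folder / "part-0.parquet").touch()  # placeholder file, name is made up
             check_aggregates_folder(folder)      # no exception once a file is present
             populated_accepted = True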


   .. py:method:: get_unique_contexts_list(context_infos: list[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None, with_partition_col: bool = False) -> list[pdstools.explanations.ExplanationsUtils.ContextInfo]


   .. py:method:: add_frequency_pct_to_df(df, group_by)

      Add a frequency percentage column to the dataframe based on the total frequency per group.
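
      As a rough illustration of a per-group frequency percentage, the sketch below
      divides each row's frequency by the total frequency of its group. The column names
      ``frequency`` and ``frequency_pct``, the 0 to 100 scale and the helper name are
      assumptions, and plain dictionaries stand in for the dataframe the method
      receives.

      .. code-block:: python

         from collections import defaultdict


         def add_frequency_pct(rows, group_by):
             """Hypothetical helper: add each row's share of its group's total frequency."""
             totals = defaultdict(int)
             for row in rows:
                 totals[tuple(row[col] for col in group_by)] += row["frequency"]

             out = []
             for row in rows:
                 total = totals[tuple(row[col] for col in group_by)]
                 # Guard against an all-zero group to avoid dividing by zero.
                 pct = 100 * row["frequency"] / total if total else 0.0
                 out.append({**row, "frequency_pct": pct})
             return out


         # Illustrative data only.
         rows = [
             {"predictor": "Age", "bin": "<30", "frequency": 25},
             {"predictor": "Age", "bin": ">=30", "frequency": 75},
             {"predictor": "Region", "bin": "North", "frequency": 10},
             {"predictor": "Region", "bin": "South", "frequency": 30},
         ]
         result = add_frequency_pct(rows, group_by=["predictor"])

      Within each group the added percentages sum to 100, so bins of different
      predictors can be compared on the same scale.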