pdstools.explanations.Aggregate
===============================

.. py:module:: pdstools.explanations.Aggregate


Classes
-------

.. autoapisummary::

   pdstools.explanations.Aggregate.Aggregate


Module Contents
---------------

.. py:class:: Aggregate(explanations: pdstools.explanations.Explanations.Explanations)

   Bases: :py:obj:`pdstools.utils.namespaces.LazyNamespace`


   .. py:attribute:: dependencies
      :value: ['polars']


   .. py:attribute:: dependency_group
      :value: 'explanations'


   .. py:attribute:: explanations


   .. py:attribute:: data_folderpath


   .. py:attribute:: data_pattern
      :value: None


   .. py:attribute:: df_contextual
      :value: None


   .. py:attribute:: df_overall
      :value: None


   .. py:attribute:: context_operations


   .. py:attribute:: initialized
      :value: False


   .. py:method:: get_df_contextual() -> polars.LazyFrame

      Get the contextual dataframe, loading it if not already loaded.


   .. py:method:: get_df_overall() -> polars.LazyFrame

      Get the overall dataframe, loading it if not already loaded.


   .. py:method:: get_predictor_contributions(context: dict[str, str] | None = None, top_n: int = defaults.top_n, **filter_kwargs)

      Get the top-n predictor contributions for a given context or overall.

      Args:
          context (Optional[dict[str, str]]):
              The context to filter contributions by.
              If None, contributions for all contexts will be returned.
          top_n (int):
              Number of top predictors to return.
          **filter_kwargs: Optional filtering and sorting controls. Valid keys:

              - ``sort_by`` (str): Column to rank/select top predictors.
                Options: ``contribution``, ``contribution_abs``,
                ``contribution_weighted``, ``contribution_weighted_abs``.
                Default: ``"contribution_abs"``.
              - ``descending`` (bool): Sort most- or least-impactful first.
                Default: ``True``.
              - ``missing`` (bool): Include missing-value bins.
                Default: ``True``.
              - ``remaining`` (bool): Include an aggregated "remaining" row
                for predictors outside the top-n. Default: ``True``.
              - ``include_numeric_single_bin`` (bool): Include numeric
                predictors that have only a single bin. Default: ``False``.


   .. py:method:: get_predictor_value_contributions(predictors: list[str], context: dict[str, str] | None = None, top_k: int = defaults.top_k, **filter_kwargs)

      Get the top-k predictor value contributions for a given context or overall.

      Args:
          predictors (list[str]):
              Required. List of predictors to get the value contributions for.
          context (Optional[dict[str, str]]):
              The context to filter contributions by.
              If None, contributions for all contexts will be returned.
          top_k (int):
              Number of unique categorical predictor values to return.
          **filter_kwargs: Optional filtering and sorting controls. Valid keys:

              - ``sort_by`` (str): Column to rank/select top predictors.
                Options: ``contribution``, ``contribution_abs``,
                ``contribution_weighted``, ``contribution_weighted_abs``.
                Default: ``"contribution_abs"``.
              - ``descending`` (bool): Sort most- or least-impactful first.
                Default: ``True``.
              - ``missing`` (bool): Include missing-value bins.
                Default: ``True``.
              - ``remaining`` (bool): Include an aggregated "remaining" row
                for values outside the top-k. Default: ``True``.
              - ``include_numeric_single_bin`` (bool): Include numeric
                predictors that have only a single bin. Default: ``False``.


   .. py:method:: validate_folder()

      Check that the aggregates folder exists and is not empty.

      Raises:
          FileNotFoundError: If the aggregates folder does not exist or is empty.


   .. py:method:: get_unique_contexts_list(context_infos: list[pdstools.explanations.ExplanationsUtils.ContextInfo] | None = None, with_partition_col: bool = False) -> list[pdstools.explanations.ExplanationsUtils.ContextInfo]


   .. py:method:: add_frequency_pct_to_df(df, group_by)

      Add a frequency percentage column to the dataframe based on the total frequency per group.
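
As a rough illustration of the calculation that ``add_frequency_pct_to_df`` describes, the sketch below computes each row's share of its group's total frequency. It uses plain Python dictionaries rather than a polars frame, so it shows the arithmetic only. The function name ``add_frequency_pct``, the column names ``frequency`` and ``frequency_pct``, the sample rows, and the 0-100 scale are all assumptions made for this example and are not taken from the pdstools source.

.. code-block:: python

   from collections import defaultdict


   def add_frequency_pct(rows, group_by, frequency_col="frequency"):
       """Return copies of ``rows`` with a ``frequency_pct`` key added.

       Hypothetical helper: a row's percentage is its frequency divided by
       the summed frequency of all rows that share the same ``group_by``
       values, multiplied by 100.
       """
       # First pass: total frequency per group.
       totals = defaultdict(float)
       for row in rows:
           key = tuple(row[col] for col in group_by)
           totals[key] += row[frequency_col]

       # Second pass: attach the percentage without mutating the input rows.
       result = []
       for row in rows:
           key = tuple(row[col] for col in group_by)
           total = totals[key]
           pct = 100.0 * row[frequency_col] / total if total else 0.0
           result.append({**row, "frequency_pct": pct})
       return result


   # Made-up sample data: two predictors with two bins each.
   rows = [
       {"predictor": "Age", "bin": "<30", "frequency": 30},
       {"predictor": "Age", "bin": ">=30", "frequency": 70},
       {"predictor": "Income", "bin": "low", "frequency": 50},
       {"predictor": "Income", "bin": "high", "frequency": 150},
   ]
   with_pct = add_frequency_pct(rows, group_by=["predictor"])

Grouping by ``predictor`` means the percentages within each predictor sum to 100, so bins of different predictors are not compared against a shared total.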