pdstools.explanations.Explanations

Classes

Explanations

Process and explore explanation data for Adaptive Gradient Boost models.

Module Contents

class Explanations(*, model_name: str | None = None, from_date: datetime.datetime | None = None, to_date: datetime.datetime | None = None)

Process and explore explanation data for Adaptive Gradient Boost models.

The class is a thin orchestrator over five sub-namespaces (preprocess, aggregate, plot, report, filter) that operate on the parquet files produced by Pega’s explanation file repository.

The constructor is pure configuration: it takes no filesystem paths and performs no I/O. Use the from_local_directory classmethod to load raw explanation parquet files from disk (or from a remote URL) and run the DuckDB aggregation step. Once that step has run, aggregate, plot and report can be used freely.

Parameters:
  • model_name (str, optional) – Name of the model rule. Used to identify and validate raw explanation parquet files when loading via from_local_directory().

  • from_date (datetime, optional) – Start date of the period over which aggregates are computed. Defaults to to_date - 7 days if only to_date is given, or to today() - 7 days if both are omitted.

  • to_date (datetime, optional) – End date of the period over which aggregates are computed. Defaults to today() whenever it is omitted, whether or not from_date is given.
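
The default rules for from_date and to_date can be restated as a small function. This is a minimal sketch of the documented behaviour, not the library's implementation; resolve_date_range and its now argument are hypothetical names introduced here for illustration.

```python
from datetime import datetime, timedelta


def resolve_date_range(from_date=None, to_date=None, *, now=None):
    """Hypothetical helper that applies the documented date defaults."""
    if now is None:
        now = datetime.today()
    # to_date falls back to today whenever it is omitted.
    if to_date is None:
        to_date = now
    # from_date falls back to seven days before the (possibly defaulted) to_date.
    if from_date is None:
        from_date = to_date - timedelta(days=7)
    return from_date, to_date
```

Passing a fixed now makes the result reproducible, which is convenient when checking which period a report will cover.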

See also

Explanations.from_local_directory

Load raw explanation parquet files from a local folder or remote URL and pre-aggregate them.

Notes

Environment variables that influence the (lazy) DuckDB aggregation step:

MODEL_CONTEXT_LIMIT

Maximum number of unique contexts processed in a single query. Default: 2500.

QUERY_BATCH_LIMIT

Number of contexts per DuckDB batch query. Default: 10.

FILE_BATCH_LIMIT

Number of files per DuckDB batch. Default: 10.

MEMORY_LIMIT

DuckDB buffer memory limit in GB. Default: 8.

THREAD_COUNT

Number of DuckDB worker threads. Default: 4.

PROGRESS_BAR

"1" to enable the DuckDB progress bar. Default: disabled.

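The variables above are ordinary process environment variables, so they can be set from Python before the aggregation step runs. The sketch below assumes they are read when aggregation starts (the Notes describe the step as lazy, but the exact read time is not stated here); the chosen values are illustrative only, and DOCUMENTED_DEFAULTS simply repeats the defaults listed above.

```python
import os

# Defaults as listed in the Notes above (environment values are strings).
DOCUMENTED_DEFAULTS = {
    "MODEL_CONTEXT_LIMIT": "2500",
    "QUERY_BATCH_LIMIT": "10",
    "FILE_BATCH_LIMIT": "10",
    "MEMORY_LIMIT": "8",
    "THREAD_COUNT": "4",
}

# Illustrative overrides for a smaller machine; set these before loading data.
os.environ["MEMORY_LIMIT"] = "4"   # GB
os.environ["THREAD_COUNT"] = "2"
os.environ["PROGRESS_BAR"] = "1"   # enable the DuckDB progress bar

# What the aggregation step would see, given the assumption above.
effective = {
    name: os.environ.get(name, default)
    for name, default in DOCUMENTED_DEFAULTS.items()
}
```
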
Examples

Load and explore a folder of raw explanation parquet files:

>>> from datetime import datetime
>>> exp = Explanations.from_local_directory(
...     data_folder="explanations_data",
...     model_name="AdaptiveBoostCT",
...     from_date=datetime(2025, 3, 28),
...     to_date=datetime(2025, 3, 28),
... )
>>> df = exp.aggregate.get_df_overall().collect()

Load a single remote parquet file:

>>> exp = Explanations.from_local_directory(
...     data_file="https://example.com/AdaptiveBoostCT_20250328.parquet",
...     model_name="AdaptiveBoostCT",
... )

Construct without I/O (e.g. inside a Quarto report that already points aggregate.data_folderpath at pre-aggregated parquet files):

>>> exp = Explanations()
>>> exp.aggregate.data_folderpath = "/path/to/aggregated_data"

classmethod from_local_directory(root_dir: str = _DEFAULT_ROOT_DIR, data_folder: str = _DEFAULT_DATA_FOLDER, data_file: str | None = None, *, model_name: str | None = None, from_date: datetime.datetime | None = None, to_date: datetime.datetime | None = None) → Explanations

Construct an Explanations from raw parquet files on disk or a URL.

This is the standard entry point: it wires up the path configuration, runs the DuckDB pre-aggregation step (writing the per-context and overall aggregates to <root_dir>/aggregated_data/) and returns a ready-to-query instance.
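
Once pre-aggregation has run, the output location can be inspected with the standard library. The sketch below assumes the aggregates carry a .parquet suffix and makes no assumption about the layout inside aggregated_data/, so it searches recursively; list_aggregated_files is a hypothetical helper, not part of pdstools.

```python
from pathlib import Path


def list_aggregated_files(root_dir=".tmp"):
    """Return parquet files found under <root_dir>/aggregated_data, sorted by path."""
    out_dir = Path(root_dir) / "aggregated_data"
    if not out_dir.is_dir():
        # Nothing has been aggregated under this root_dir yet.
        return []
    return sorted(out_dir.rglob("*.parquet"))
```

An empty list indicates that no aggregated parquet files exist under the given root_dir.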

Parameters:
  • root_dir (str, default ".tmp") – Working directory under which the pre-aggregated parquet files (and report scratch space) are written.

  • data_folder (str, default "explanations_data") – Folder containing the raw model-explanation parquet files downloaded from the Pega explanation file repository. Used when data_file is not provided.

  • data_file (str, optional) – Direct path or URL to a single explanation parquet file. When given, takes precedence over data_folder. http:// and https:// URLs are downloaded into root_dir before aggregation.

  • model_name (str, optional) – Name of the model rule. Used to filter files in data_folder and validate that the correct files are being processed.

  • from_date (datetime, optional) – Start date of the period over which aggregates are computed. See Explanations for default behaviour.

  • to_date (datetime, optional) – End date of the period over which aggregates are computed. See Explanations for default behaviour.
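
The precedence between data_file and data_folder described above can be summarised in a few lines. This is a sketch of the documented rule only; describe_source and its return labels are hypothetical and are not part of the library.

```python
def describe_source(data_folder="explanations_data", data_file=None):
    """Hypothetical helper: report which input the documented rules would select."""
    if data_file is None:
        # No single file given, so the folder of raw parquet files is used.
        return ("folder", data_folder)
    if data_file.startswith(("http://", "https://")):
        # Remote files are downloaded into root_dir before aggregation.
        return ("remote_file", data_file)
    # A local data_file takes precedence over data_folder.
    return ("local_file", data_file)
```

For example, passing both a folder and an https:// URL selects the URL, matching the rule that data_file takes precedence over data_folder.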

Returns:

A fully initialised instance with pre-aggregation completed.

Return type:

Explanations

Raises:

ValueError – If from_date > to_date, if no files match model_name within the date range, or if a remote data_file cannot be downloaded.