pdstools.explanations.Explanations
Classes
Explanations | Process and explore explanation data for Adaptive Gradient Boost models.
Module Contents
- class Explanations(*, model_name: str | None = None, from_date: datetime.datetime | None = None, to_date: datetime.datetime | None = None)
Process and explore explanation data for Adaptive Gradient Boost models.
The class is a thin orchestrator over the sub-namespaces preprocess, aggregate, plot, report and filter, which operate on the parquet files produced by Pega's explanation file repository.
The constructor is pure configuration: it takes no filesystem paths and performs no I/O. Use the from_local_directory classmethod to load raw explanation parquet files from disk (or a remote URL) and run the DuckDB aggregation step. After that, aggregate, plot and report can be used freely.
- Parameters:
model_name (str, optional) – Name of the model rule. Used to identify and validate raw explanation parquet files when loading via from_local_directory().
from_date (datetime, optional) – Start date of the period over which aggregates are computed. Defaults to to_date - 7 days if only to_date is given, or to today() - 7 days if both are omitted.
to_date (datetime, optional) – End date of the period over which aggregates are computed. Defaults to today() when omitted, whether or not from_date is given.
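The defaulting rule for from_date and to_date described above can be sketched in a few lines. This is an illustration of the documented behaviour only, not the library's actual implementation: resolve_date_range and its now argument are hypothetical names introduced here so the rule can be checked against a fixed date.

```python
from datetime import datetime, timedelta


def resolve_date_range(from_date=None, to_date=None, *, now=None):
    """Hypothetical helper mirroring the documented date defaults."""
    # 'now' is an assumption added for reproducibility; it stands in for today().
    now = now or datetime.today()
    if to_date is None:
        # to_date falls back to today() whenever it is omitted.
        to_date = now
    if from_date is None:
        # from_date falls back to seven days before the resolved to_date.
        from_date = to_date - timedelta(days=7)
    return from_date, to_date


# Both omitted: a seven-day window ending "today".
start, end = resolve_date_range(now=datetime(2025, 4, 4))
print(start.date(), end.date())
```

Passing only to_date yields the seven days leading up to it, and passing only from_date yields a window that ends today.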
See also
Explanations.from_local_directory
Load raw explanation parquet files from a local folder or remote URL and pre-aggregate them.
Notes
Environment variables that influence the (lazy) DuckDB aggregation step:
MODEL_CONTEXT_LIMIT – Maximum number of unique contexts processed in a single query. Default: 2500.
QUERY_BATCH_LIMIT – Number of contexts per DuckDB batch query. Default: 10.
FILE_BATCH_LIMIT – Number of files per DuckDB batch. Default: 10.
MEMORY_LIMIT – DuckDB buffer memory limit in GB. Default: 8.
THREAD_COUNT – Number of DuckDB worker threads. Default: 4.
PROGRESS_BAR – Set to "1" to enable the DuckDB progress bar. Default: disabled.
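As a rough sketch of how these settings might be applied from Python, the snippet below sets a few of them before any aggregation is triggered. The variable names come from the list above; the values are arbitrary examples. It assumes the variables are read from the process environment when the aggregation step runs, which the notes imply but do not state, and the commented import path is inferred from the module name on this page.

```python
import os

# Example values only; environment variables are always strings.
duckdb_settings = {
    "MEMORY_LIMIT": "4",   # GB of DuckDB buffer memory
    "THREAD_COUNT": "2",   # DuckDB worker threads
    "PROGRESS_BAR": "1",   # enable the DuckDB progress bar
}
os.environ.update(duckdb_settings)

# Assumption: the settings must be in place before aggregation starts,
# so they are set ahead of the loading call.
# from pdstools.explanations.Explanations import Explanations
# exp = Explanations.from_local_directory(
#     data_folder="explanations_data",
#     model_name="AdaptiveBoostCT",
# )
```

Setting the variables in the shell before starting Python should have the same effect under the same assumption.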
Examples
Load and explore a folder of raw explanation parquet files:
>>> from datetime import datetime
>>> exp = Explanations.from_local_directory(
...     data_folder="explanations_data",
...     model_name="AdaptiveBoostCT",
...     from_date=datetime(2025, 3, 28),
...     to_date=datetime(2025, 3, 28),
... )
>>> df = exp.aggregate.get_df_overall().collect()
Load a single remote parquet file:
>>> exp = Explanations.from_local_directory(
...     data_file="https://example.com/AdaptiveBoostCT_20250328.parquet",
...     model_name="AdaptiveBoostCT",
... )
Construct without I/O (e.g. inside a Quarto report that already points aggregate.data_folderpath at pre-aggregated parquet files):
>>> exp = Explanations()
>>> exp.aggregate.data_folderpath = "/path/to/aggregated_data"
- classmethod from_local_directory(root_dir: str = _DEFAULT_ROOT_DIR, data_folder: str = _DEFAULT_DATA_FOLDER, data_file: str | None = None, *, model_name: str | None = None, from_date: datetime.datetime | None = None, to_date: datetime.datetime | None = None) -> Explanations
Construct an Explanations instance from raw parquet files on disk or at a URL.
This is the standard entry point: it wires up the path configuration, runs the DuckDB pre-aggregation step (writing the per-context and overall aggregates to <root_dir>/aggregated_data/) and returns a ready-to-query instance.
- Parameters:
root_dir (str, default ".tmp") – Working directory under which the pre-aggregated parquet files (and report scratch space) are written.
data_folder (str, default "explanations_data") – Folder containing the raw model-explanation parquet files downloaded from the Pega explanation file repository. Used when data_file is not provided.
data_file (str, optional) – Direct path or URL to a single explanation parquet file. When given, takes precedence over data_folder. http:// and https:// URLs are downloaded into root_dir before aggregation.
model_name (str, optional) – Name of the model rule. Used to filter files in data_folder and validate that the correct files are being processed.
from_date (datetime, optional) – Start date of the period over which aggregates are collected. See Explanations for default behaviour.
to_date (datetime, optional) – End date of the period over which aggregates are collected. See Explanations for default behaviour.
- Returns:
A fully initialised instance with pre-aggregation completed.
- Return type:
Explanations
- Raises:
ValueError – If from_date > to_date, if no files match model_name within the date range, or if a remote data_file cannot be downloaded.
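The precedence between data_file and data_folder described in the parameter list can be pictured with the small sketch below. It is an illustration of the documented rule, not the library's code: describe_source is a hypothetical helper, and detecting URLs via urllib.parse is an assumption about how such a check might be written.

```python
from urllib.parse import urlparse


def describe_source(data_folder="explanations_data", data_file=None):
    """Hypothetical helper: report which input would be used and how."""
    if data_file is None:
        # No single file given, so the folder of raw parquet files is used.
        return ("folder", data_folder)
    # data_file takes precedence over data_folder when both are supplied.
    if urlparse(data_file).scheme in ("http", "https"):
        # Remote files are downloaded into root_dir before aggregation.
        return ("download", data_file)
    return ("file", data_file)


print(describe_source())
print(describe_source(data_file="https://example.com/AdaptiveBoostCT_20250328.parquet"))
```

Because a failed download is documented as raising ValueError, callers that pass a URL may want to wrap the call to from_local_directory in a try/except for that exception.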