pdstools.explanations.Explanations

Classes

Explanations

Process and explore explanation data for Adaptive Gradient Boost models.

Module Contents

class Explanations(root_dir: str = '.tmp', data_folder: str = 'explanations_data', data_file: str | None = None, model_name: str | None = '', from_date: datetime.datetime | None = None, to_date: datetime.datetime | None = None)

Process and explore explanation data for Adaptive Gradient Boost models.

The class is initialized with a data location, which should point to the model’s explanation parquet files downloaded from the explanations file repository. These parquet files can then be processed into aggregates that explain the contribution of different predictors at a global level.

Parameters:

root_dir (str) – Defaults to '.tmp'.
data_folder (str) – The path of the folder containing the model explanation parquet files for processing.
data_file (str, optional) – Direct path to a single explanation file (URL or local path). When provided, this takes precedence over data_folder. Useful for loading files from remote locations.
model_name (str, optional) – The name of the model rule. Used to identify files in the data folder and to validate that the correct files are being processed.
from_date (datetime, optional, default = to_date - timedelta(7)) – Defines the start date of the period over which aggregates will be collected.
to_date (datetime, optional, default = datetime.today()) – Defines the end date of the period over which aggregates will be collected.

Environment variables:

BATCH_LIMIT (int) – The maximum number of unique contexts to process in a single batch. Default is 10.
MEMORY_LIMIT (int) – Sets the memory limit for the DuckDB buffer manager. If no limit is set, 80% of RAM is used. Default is 2 (in GB).
THREAD_COUNT (int) – Sets the number of threads for DuckDB parallel query execution. Default is 4.
PROGRESS_BAR (int) – Shows a progress bar when running DuckDB queries. 0 = no progress bar, 1 = show progress bar. Default is 0.
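As a rough illustration of how the parameters and environment variables above might fit together, the sketch below prepares the environment and the constructor arguments. The model name 'MyModel' and the dates are placeholders. It is also an assumption that the environment variables need to be set before the class is constructed. The import path follows the module name shown on this page, and the constructor call is wrapped in a function that is not invoked here, since the side effects of instantiation are not described in this section.

```python
import os
from datetime import datetime, timedelta

# Environment variables from the list above, set to their documented defaults.
# Assumption: they are read when the class is constructed or used, so they are
# set up front. Environment values are always strings.
os.environ["BATCH_LIMIT"] = "10"
os.environ["MEMORY_LIMIT"] = "2"   # in GB
os.environ["THREAD_COUNT"] = "4"
os.environ["PROGRESS_BAR"] = "0"   # 0 = no progress bar

# Placeholder date range: the seven days ending on 30 June 2024.
to_date = datetime(2024, 6, 30)
from_date = to_date - timedelta(days=7)

# Keyword arguments matching the constructor signature shown above.
# 'MyModel' is a hypothetical model rule name.
explanations_kwargs = {
    "root_dir": ".tmp",
    "data_folder": "explanations_data",
    "model_name": "MyModel",
    "from_date": from_date,
    "to_date": to_date,
}


def build_explanations(**kwargs):
    """Construct Explanations lazily so the import only happens on use."""
    from pdstools.explanations.Explanations import Explanations

    return Explanations(**kwargs)
```

Calling `build_explanations(**explanations_kwargs)` would then create the object, provided pdstools is installed and the data folder contains the expected parquet files.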

root_dir = '.tmp'

data_folder = 'explanations_data'

data_file = None

model_name = ''

from_date = None

to_date = None

preprocess

aggregate

plot

report

filter

_set_date_range(from_date: datetime.datetime | None, to_date: datetime.datetime | None, days: int = 7)

Set the date range for processing explanation files.

Parameters:

from_date (datetime, optional) – The start date for the date range. If None, defaults to days (7 by default) before to_date.
to_date (datetime, optional) – The end date for the date range. If None, defaults to today.
days (int) – The length of the default range, in days, used when from_date is None. Defaults to 7.
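The defaulting rules described for this method can be sketched in plain Python. This is not the library's implementation: `resolve_date_range` and its `today` argument are hypothetical names introduced only to make the behaviour concrete and testable. The sketch assumes that a missing to_date falls back to today and that a missing from_date falls back to `days` days before the resolved to_date.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def resolve_date_range(
    from_date: Optional[datetime],
    to_date: Optional[datetime],
    days: int = 7,
    today: Optional[datetime] = None,  # hypothetical hook for deterministic runs
) -> Tuple[datetime, datetime]:
    """Return (from_date, to_date) with the documented defaults filled in."""
    # The end of the range defaults to today.
    if to_date is None:
        to_date = today if today is not None else datetime.today()
    # The start of the range defaults to `days` days before the end.
    if from_date is None:
        from_date = to_date - timedelta(days=days)
    return from_date, to_date


start, end = resolve_date_range(None, datetime(2024, 6, 30))
print(start.date(), end.date())  # 2024-06-23 2024-06-30
```

Resolving to_date first matters here, because the default for from_date is computed relative to it.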