pdstools.explanations.Preprocess

Classes

Preprocess

Preprocess.

Module Contents

class Preprocess(explanations: pdstools.explanations.Explanations.Explanations)

Bases: pdstools.utils.namespaces.LazyNamespace

Preprocess.

Parameters:

explanations (pdstools.explanations.Explanations.Explanations)

dependencies: ClassVar[list[str]] = ['duckdb', 'polars']
dependency_group = 'explanations'
SEP = ', '
LEFT_PREFIX = 'l'
RIGHT_PREFIX = 'r'
explanations
explanations_folder
data_file
data_foldername = 'aggregated_data'
data_folderpath
from_date
to_date
model_name
model_context_limit
query_batch_limit
file_batch_limit
memory_limit
thread_count
progress_bar
selected_files: list[str] = []
contexts: dict[str, dict[str, list[str]]] | None = None
unique_contexts_filename: pathlib.Path (a path ending in 'unique_contexts.json')
generate()

Process explanation parquet files and save calculated aggregates.

This method reads the explanation data from the provided location and creates aggregates for multiple contexts which are used to create global explanation plots.

The different context aggregates are as follows:

  1. Overall Numeric Predictor Contributions

The average contribution towards predicted model propensity for each numeric predictor value decile.

  2. Overall Symbolic Predictor Contributions

The average contribution towards predicted model propensity for each symbolic predictor value.

  3. Context-Specific Numeric Predictor Contributions

The average contribution towards predicted model propensity for each numeric predictor value decile, grouped by context key partition.

  4. Context-Specific Symbolic Predictor Contributions

The average contribution towards predicted model propensity for each symbolic predictor value, grouped by context key partition.

Each aggregate is written to a parquet file in a temporary output directory.
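To illustrate what aggregate (1) computes, the sketch below averages contributions per numeric predictor value decile using only the standard library. This is an illustrative reimplementation of the idea, not pdstools' actual code: `Preprocess.generate()` performs this kind of aggregation with duckdb/polars over explanation parquet files, and the function and variable names here are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def decile_contributions(rows):
    """Average contribution per numeric predictor value decile.

    rows: iterable of (predictor_value, contribution) pairs for one
    numeric predictor. Returns {decile: mean contribution} with
    deciles numbered 0-9, lowest predictor values first.
    """
    ordered = sorted(rows, key=lambda r: r[0])
    n = len(ordered)
    buckets = defaultdict(list)
    for i, (_, contribution) in enumerate(ordered):
        # Equal-count buckets: first tenth of rows -> decile 0, etc.
        decile = min(i * 10 // n, 9)
        buckets[decile].append(contribution)
    return {d: mean(c) for d, c in sorted(buckets.items())}

# Toy data: contribution grows linearly with the predictor value,
# so the per-decile averages should increase monotonically.
rows = [(v, v * 0.01) for v in range(100)]
print(decile_contributions(rows))
```

The context-specific variants (aggregates 3 and 4) follow the same pattern, with an additional grouping on the context key partition before averaging; the symbolic variants group by the raw predictor value instead of a decile.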