pdstools.adm.trees._model

The ADMTreesModel — load, score, plot, and analyse one AGB model.

Attributes

Classes

ADMTreesModel

Functions for ADM gradient-boosting models

Module Contents

logger
class ADMTreesModel(trees: dict, model: list[dict], *, raw_input: Any = None, properties: dict[str, Any] | None = None, learning_rate: float | None = None, context_keys: list | None = None)

Functions for ADM gradient-boosting models

ADM Gradient boosting models consist of multiple trees, which build upon each other in a ‘boosting’ fashion. This class provides functions to extract data from these trees: the features on which the trees split, important values for those features, statistics about the trees, and visualisations of each individual tree.

Construct via from_file(), from_url(), from_datamart_blob(), or from_dict().

Notes

The “save model” action in Prediction Studio exports a JSON file that this class can load directly. The Datamart’s pyModelData column also contains this information, but compressed and with encoded split values; the “save model” button decompresses and decodes that data.

Parameters:
  • trees (dict) – the full parsed model JSON

  • model (list[dict]) – the list of boosted trees (each a nested dict)

  • raw_input (Any) – the raw input used to construct this instance (path, bytes, or dict)

  • properties (dict[str, Any] | None) – optional additional model properties

  • learning_rate (float | None) – the boosting learning rate, if known

  • context_keys (list | None) – names of the context-key predictors, if known

trees: dict

The full parsed model JSON.

model: list[dict]

The list of boosted trees (each a nested dict).

raw_input: Any

The raw input used to construct this instance (path, bytes, or dict).

learning_rate: float | None = None
context_keys: list | None = None
_properties: dict[str, Any]
classmethod from_dict(data: dict, *, context_keys: list | None = None) ADMTreesModel

Build from an already-parsed model dict.

Parameters:
  • data (dict)

  • context_keys (list | None)

Return type:

ADMTreesModel

classmethod from_file(path: str | pathlib.Path, *, context_keys: list | None = None) ADMTreesModel

Load a model from a local JSON file (Prediction Studio “save model” output).

Parameters:
  • path (str | pathlib.Path)

  • context_keys (list | None)

Return type:

ADMTreesModel

classmethod from_url(url: str, *, timeout: float = 30.0, context_keys: list | None = None) ADMTreesModel

Load a model from a URL pointing at the JSON export.

timeout is the per-request timeout in seconds (default 30).

Parameters:
  • url (str)

  • timeout (float)

  • context_keys (list | None)

Return type:

ADMTreesModel

classmethod from_datamart_blob(blob: str | bytes, *, context_keys: list | None = None) ADMTreesModel

Load from a base64-encoded zlib-compressed datamart Modeldata blob.

Parameters:
  • blob (str | bytes)

  • context_keys (list | None)

Return type:

ADMTreesModel

_decode_trees()
_post_import_cleanup(decode: bool, *, context_keys: list | None = None)
Parameters:
  • decode (bool)

  • context_keys (list | None)

_locate_boosters() list[dict]

Find the boosters/trees list in the model JSON.

Different Pega versions place the boosters at different paths; try them in order.

Return type:

list[dict]
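The candidate paths below are illustrative; the exact locations the library probes are an internal detail. A sketch of the try-paths-in-order lookup:

```python
from typing import Any

# Illustrative candidate locations; the real method's path list may differ.
CANDIDATE_PATHS = (
    ("model", "boosters"),
    ("model", "model", "boosters"),
)

def locate_boosters(trees: dict) -> list[dict]:
    """Return the first boosters list found at any candidate path."""
    for path in CANDIDATE_PATHS:
        node: Any = trees
        for key in path:
            if not isinstance(node, dict) or key not in node:
                break  # this path does not exist; try the next one
            node = node[key]
        else:
            if isinstance(node, list):
                return node
    raise ValueError("no boosters list found at any known path")

assert locate_boosters({"model": {"boosters": [{"score": 0.1}]}}) == [{"score": 0.1}]
assert locate_boosters({"model": {"model": {"boosters": []}}}) == []
```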

static _safe_numeric_compare(left: float, operator: str, right: float) bool

Safely compare two numeric values without using eval().

Parameters:
  • left (float)

  • operator (str)

  • right (float)

Return type:

bool
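A sketch of eval()-free numeric comparison via the standard operator module; the supported operator tokens shown are assumptions, not the library's actual token set:

```python
import operator

# Map operator strings to functions instead of eval()'ing the expression.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}

def safe_numeric_compare(left: float, op: str, right: float) -> bool:
    """Compare two numbers via a lookup table rather than eval()."""
    try:
        return _OPS[op](left, right)
    except KeyError:
        raise ValueError(f"unsupported operator: {op!r}") from None

assert safe_numeric_compare(1.0, "<", 2.0)
assert not safe_numeric_compare(3.0, "==", 2.0)
```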

_safe_eval_seen_errors: set[tuple[str, str]]
_safe_eval_lock: threading.Lock
static _safe_condition_evaluate(value: Any, operator: str, comparison_set: set | float | str | frozenset) bool

Safely evaluate split conditions without using eval().

Returns False on type-conversion errors after logging the first occurrence per (operator, error-type) pair at INFO level. Subsequent matching failures log at DEBUG only — we don’t want per-row scoring to swamp the application logs, but the first failure for each error class is worth surfacing.

Parameters:
  • value (Any)

  • operator (str)

  • comparison_set (set | float | str | frozenset)

Return type:

bool

property metrics: dict[str, Any]

Compute CDH_ADM005-style diagnostic metrics for this model.

Returns a flat dictionary of key/value pairs aligned with the CDH_ADM005 telemetry event specification. Metrics that cannot be computed from an exported model (e.g. saturation counts that require bin-level data) are omitted.

See metric_descriptions() for human-readable descriptions of every key.

Return type:

dict[str, Any]

static metric_descriptions() dict[str, str]

Return a dictionary mapping metric names to human-readable descriptions.

Return type:

dict[str, str]

static _classify_predictor(name: str) str

Classify a predictor as ‘ih’, ‘context_key’, or ‘other’.

Parameters:

name (str)

Return type:

str

_compute_metrics() dict[str, Any]

Walk all trees once and assemble the metrics dictionary.

Return type:

dict[str, Any]

_ENCODER_PATHS: tuple[tuple[str, ...], ...] = (('model', 'inputsEncoder', 'encoders'), ('model', 'model', 'inputsEncoder', 'encoders'))
_get_encoder_info() dict[str, dict[str, Any]] | None

Extract predictor metadata from the inputsEncoder if present.

Returns None when no encoder metadata is available (e.g. for exported/decoded models).

Return type:

dict[str, dict[str, Any]] | None

property predictors: dict[str, str] | None
Return type:

dict[str, str] | None

property tree_stats: polars.DataFrame
Return type:

polars.DataFrame

property splits_per_tree: dict[int, list[str]]
Return type:

dict[int, list[str]]

property gains_per_tree: dict[int, list[float]]
Return type:

dict[int, list[float]]

property gains_per_split: polars.DataFrame
Return type:

polars.DataFrame

property grouped_gains_per_split: polars.DataFrame
Return type:

polars.DataFrame

property all_values_per_split: dict[str, set]
Return type:

dict[str, set]

property splits_per_variable_type: tuple[list[collections.Counter], list[float]]

Per-tree counts of splits grouped by predictor category.

Equivalent to calling compute_categorization_over_time() with no arguments.

Return type:

tuple[list[collections.Counter], list[float]]

get_predictors() dict[str, str] | None

Extract predictor names and types from model metadata.

Tries explicit metadata first (configuration.predictors then predictors); falls back to inferring from tree splits when neither is present.

Return type:

dict[str, str] | None

_infer_predictors_from_splits() dict[str, str] | None

Infer predictor names + types by walking all tree splits.

Return type:

dict[str, str] | None

property _splits_and_gains: tuple[dict[int, list[str]], dict[int, list[float]], polars.DataFrame]

Compute (splits_per_tree, gains_per_tree, gains_per_split) once.

Backs the public splits_per_tree / gains_per_tree / gains_per_split properties via a single tree-walk per tree. Implemented as a cached_property rather than @lru_cache because lru_cache holds a strong reference to self and would leak the entire ADMTreesModel instance for the lifetime of the cache.

Zero-gain splits are kept (with gains == 0.0) in gains_per_split so the per-split DataFrame is always aligned with splits_per_tree. gains_per_tree continues to keep only positive gains for backward compatibility.

Return type:

tuple[dict[int, list[str]], dict[int, list[float]], polars.DataFrame]
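A toy demonstration of the caching rationale above: functools.cached_property stores the computed value in the instance's own __dict__, so the cache is garbage-collected with the instance (unlike lru_cache on a method, which keeps a strong reference to self alive), and the expensive walk runs exactly once:

```python
from functools import cached_property

class TreeWalkDemo:
    """Sketch of the single-walk caching pattern; not the real class."""

    def __init__(self, trees: list[dict]):
        self.trees = trees
        self.walks = 0  # counts how often the expensive walk actually runs

    @cached_property
    def splits_and_gains(self):
        self.walks += 1  # a real implementation walks every tree here
        splits = {i: [t["split"]] for i, t in enumerate(self.trees)}
        gains = {i: [t["gain"]] for i, t in enumerate(self.trees)}
        return splits, gains

demo = TreeWalkDemo([{"split": "Age < 30", "gain": 0.4}])
_ = demo.splits_and_gains
_ = demo.splits_and_gains
assert demo.walks == 1  # computed once, then served from the instance dict
```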

get_grouped_gains_per_split() polars.DataFrame

Gains per split, grouped by split string with helpful aggregates.

Return type:

polars.DataFrame

plot_splits_per_variable(subset: set | None = None, show: bool = True)

Box-plot of gains per split for each variable.

Parameters:
  • subset (set | None)

  • show (bool)

get_tree_stats() polars.DataFrame

Generate a dataframe with useful stats for each tree.

Return type:

polars.DataFrame

get_all_values_per_split() dict[str, set]

All distinct split values seen for each predictor.

Return type:

dict[str, set]

get_tree_representation(tree_number: int) dict[int, dict]

Build a flat node-id-keyed representation of one tree.

Walks self.model[tree_number] in pre-order (left subtree before right) and returns a dict keyed by 1-based node id.

Each entry has score; internal nodes additionally carry split, gain, left_child and right_child; non-root nodes carry parent_node.

This replaces an earlier implementation that mutated three accumulator parameters and relied on a final del to drop a spurious trailing entry.

Parameters:

tree_number (int)

Return type:

dict[int, dict]

plot_tree(tree_number: int, highlighted: dict | list | None = None, show: bool = True) pydot.Graph

Plot the chosen decision tree.

Parameters:
  • tree_number (int)

  • highlighted (dict | list | None)

  • show (bool)

Return type:

pydot.Graph

get_visited_nodes(treeID: int, x: dict, save_all: bool = False) tuple[list, float, list]

Trace the path through one tree for the given feature values.

Parameters:
  • treeID (int)

  • x (dict)

  • save_all (bool)

Return type:

tuple[list, float, list]

get_all_visited_nodes(x: dict) polars.DataFrame

Score every tree against x and return per-tree visit info.

Parameters:

x (dict)

Return type:

polars.DataFrame

score(x: dict) float

Compute the (sigmoid-normalised) propensity score for x.

Calls get_visited_nodes() per tree and sums the resulting leaf scores; avoids building the full per-tree DataFrame that get_all_visited_nodes() would produce.

Parameters:

x (dict)

Return type:

float
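The summing-then-sigmoid step can be sketched as follows, assuming the per-tree leaf scores have already been collected:

```python
import math

def propensity(leaf_scores: list[float]) -> float:
    """Sum the per-tree leaf scores and squash through a sigmoid,
    as the score() description above outlines."""
    margin = sum(leaf_scores)
    return 1.0 / (1.0 + math.exp(-margin))

assert propensity([0.0, 0.0]) == 0.5          # zero margin -> 0.5
assert 0.0 < propensity([-2.0, -1.5]) < 0.5   # negative margin -> below 0.5
```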

plot_contribution_per_tree(x: dict, show: bool = True)

Plot the per-tree contribution toward the final propensity.

Parameters:
  • x (dict)

  • show (bool)

predictor_categorization(x: str, context_keys: list | None = None) str

Default predictor categorisation function.

Parameters:
  • x (str)

  • context_keys (list | None)

Return type:

str

compute_categorization_over_time(predictor_categorization: collections.abc.Callable | None = None, context_keys: list | None = None) tuple[list[collections.Counter], list[float]]

Per-tree categorisation counts plus per-tree absolute scores.

Parameters:
  • predictor_categorization (collections.abc.Callable | None)

  • context_keys (list | None)

Return type:

tuple[list[collections.Counter], list[float]]

plot_splits_per_variable_type(predictor_categorization: collections.abc.Callable | None = None, **kwargs)

Stacked-area chart of categorised split counts per tree.

Parameters:

predictor_categorization (collections.abc.Callable | None)