pdstools.adm.trees._model

The ADMTreesModel — load, score, plot, and analyse one AGB model.

Attributes

Classes

ADMTreesModel

Functions for ADM gradient-boosting models

Module Contents

logger
class ADMTreesModel(trees: dict, model: list[dict], *, raw_input: Any = None, properties: dict[str, Any] | None = None, learning_rate: float | None = None, context_keys: list | None = None)

Functions for ADM gradient-boosting models

ADM Gradient boosting models consist of multiple trees, which build upon each other in a ‘boosting’ fashion. This class provides functions to extract data from these trees: the features on which the trees split, important values for those features, statistics about the trees, and visualisations of each individual tree.

Construct via from_file(), from_url(), from_datamart_blob(), or from_dict().

Notes

The “save model” action in Prediction Studio exports a JSON file that this class can load directly. The Datamart’s pyModelData column also contains this information, but compressed and with encoded split values; the “save model” button decompresses and decodes that data.

Parameters:
  • trees (dict) – the full parsed model JSON

  • model (list[dict]) – the list of boosted trees (each a nested dict)

  • raw_input (Any) – the raw input used to construct this instance (path, bytes, or dict)

  • properties (dict[str, Any] | None) – optional additional model properties

  • learning_rate (float | None) – the boosting learning rate, if known

  • context_keys (list | None) – names of the context-key predictors, if known

trees: dict

The full parsed model JSON.

model: list[dict]

The list of boosted trees (each a nested dict).

raw_input: Any

The raw input used to construct this instance (path, bytes, or dict).

learning_rate: float | None = None
context_keys: list | None = None
_properties: dict[str, Any]
classmethod from_dict(data: dict, *, context_keys: list | None = None) ADMTreesModel

Build from an already-parsed model dict.

Parameters:
  • data (dict)

  • context_keys (list | None)

Return type:

ADMTreesModel

classmethod from_file(path: str | pathlib.Path, *, context_keys: list | None = None) ADMTreesModel

Load a model from a local JSON file (Prediction Studio “save model” output).

Parameters:
  • path (str | pathlib.Path)

  • context_keys (list | None)

Return type:

ADMTreesModel

classmethod from_url(url: str, *, timeout: float = 30.0, context_keys: list | None = None) ADMTreesModel

Load a model from a URL pointing at the JSON export.

timeout is the per-request timeout in seconds (default 30).

Parameters:
  • url (str)

  • timeout (float)

  • context_keys (list | None)

Return type:

ADMTreesModel

classmethod from_datamart_blob(blob: str | bytes, *, context_keys: list | None = None) ADMTreesModel

Load from a base64-encoded zlib-compressed datamart Modeldata blob.

Parameters:
  • blob (str | bytes)

  • context_keys (list | None)

Return type:

ADMTreesModel

_decode_trees()
_post_import_cleanup(decode: bool, *, context_keys: list | None = None)
Parameters:
  • decode (bool)

  • context_keys (list | None)

_locate_boosters() list[dict]

Find the boosters/trees list in the model JSON.

Different Pega versions place the boosters at different paths; try them in order.

Return type:

list[dict]
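The candidate paths below are illustrative; the exact locations the library probes are an internal detail. A sketch of the try-paths-in-order lookup:

```python
from typing import Any

# Illustrative candidate locations; the real method's path list may differ.
CANDIDATE_PATHS = (
    ("model", "boosters"),
    ("model", "model", "boosters"),
)

def locate_boosters(trees: dict) -> list[dict]:
    """Return the first boosters list found at any candidate path."""
    for path in CANDIDATE_PATHS:
        node: Any = trees
        for key in path:
            if not isinstance(node, dict) or key not in node:
                break  # this path does not exist; try the next one
            node = node[key]
        else:
            if isinstance(node, list):
                return node
    raise ValueError("no boosters list found at any known path")

assert locate_boosters({"model": {"boosters": [{"score": 0.1}]}}) == [{"score": 0.1}]
assert locate_boosters({"model": {"model": {"boosters": []}}}) == []
```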

static _safe_numeric_compare(left: float, operator: str, right: float) bool

Safely compare two numeric values without using eval().

Parameters:
  • left (float)

  • operator (str)

  • right (float)

Return type:

bool
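A sketch of eval()-free numeric comparison via the standard operator module; the supported operator tokens shown are assumptions, not the library's actual token set:

```python
import operator

# Map operator strings to functions instead of eval()'ing the expression.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}

def safe_numeric_compare(left: float, op: str, right: float) -> bool:
    """Compare two numbers via a lookup table rather than eval()."""
    try:
        return _OPS[op](left, right)
    except KeyError:
        raise ValueError(f"unsupported operator: {op!r}") from None

assert safe_numeric_compare(1.0, "<", 2.0)
assert not safe_numeric_compare(3.0, "==", 2.0)
```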

_safe_eval_seen_errors: set[tuple[str, str]]
_safe_eval_lock: threading.Lock
static _safe_condition_evaluate(value: Any, operator: str, comparison_set: set | float | str | frozenset) bool

Safely evaluate split conditions without using eval().

Returns False on type-conversion errors after logging the first occurrence per (operator, error-type) pair at INFO level. Subsequent matching failures log at DEBUG only — we don’t want per-row scoring to swamp the application logs, but the first failure for each error class is worth surfacing.

Parameters:
  • value (Any)

  • operator (str)

  • comparison_set (set | float | str | frozenset)

Return type:

bool

property metrics: dict[str, Any]

Compute CDH_ADM005-style diagnostic metrics for this model.

Returns a flat dictionary of key/value pairs aligned with the CDH_ADM005 telemetry event specification. Metrics that cannot be computed from an exported model (e.g. saturation counts that require bin-level data) are omitted.

See metric_descriptions() for human-readable descriptions of every key.

Return type:

dict[str, Any]

static metric_descriptions() dict[str, str]

Return a dictionary mapping metric names to human-readable descriptions.

Return type:

dict[str, str]

static _classify_predictor(name: str) str

Classify a predictor as ‘ih’, ‘context_key’, or ‘other’.

Parameters:

name (str)

Return type:

str

_compute_metrics() dict[str, Any]

Walk all trees once and assemble the metrics dictionary.

Return type:

dict[str, Any]

_ENCODER_PATHS: tuple[tuple[str, ...], ...] = (('model', 'inputsEncoder', 'encoders'), ('model', 'model', 'inputsEncoder', 'encoders'))
_get_encoder_info() dict[str, dict[str, Any]] | None

Extract predictor metadata from the inputsEncoder if present.

Returns None when no encoder metadata is available (e.g. for exported/decoded models).

Return type:

dict[str, dict[str, Any]] | None

property predictors: dict[str, str] | None
Return type:

dict[str, str] | None

property tree_stats: polars.DataFrame
Return type:

polars.DataFrame

property splits_per_tree: dict[int, list[str]]
Return type:

dict[int, list[str]]

property gains_per_tree: dict[int, list[float]]
Return type:

dict[int, list[float]]

property gains_per_split: polars.DataFrame
Return type:

polars.DataFrame

property grouped_gains_per_split: polars.DataFrame
Return type:

polars.DataFrame

property all_values_per_split: dict[str, set]
Return type:

dict[str, set]

property splits_per_variable_type: tuple[list[collections.Counter], list[float]]

Per-tree counts of splits grouped by predictor category.

Equivalent to calling compute_categorization_over_time() with no arguments.

Return type:

tuple[list[collections.Counter], list[float]]

get_predictors() dict[str, str] | None

Extract predictor names and types from model metadata.

Tries explicit metadata first (configuration.predictors then predictors); falls back to inferring from tree splits when neither is present.

Return type:

dict[str, str] | None

_infer_predictors_from_splits() dict[str, str] | None

Infer predictor names + types by walking all tree splits.

Return type:

dict[str, str] | None

property _splits_and_gains: tuple[dict[int, list[str]], dict[int, list[float]], polars.DataFrame]

Compute (splits_per_tree, gains_per_tree, gains_per_split) once.

Backs the public splits_per_tree / gains_per_tree / gains_per_split properties via a single tree-walk per tree. Implemented as a cached_property rather than @lru_cache because lru_cache holds a strong reference to self and would leak the entire ADMTreesModel instance for the lifetime of the cache.

Zero-gain splits are kept (with gains == 0.0) in gains_per_split so the per-split DataFrame is always aligned with splits_per_tree. gains_per_tree continues to keep only positive gains for backward compatibility.

Return type:

tuple[dict[int, list[str]], dict[int, list[float]], polars.DataFrame]
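A toy demonstration of the caching rationale above: functools.cached_property stores the computed value in the instance's own __dict__, so the cache is garbage-collected with the instance (unlike lru_cache on a method, which keeps a strong reference to self alive), and the expensive walk runs exactly once:

```python
from functools import cached_property

class TreeWalkDemo:
    """Sketch of the single-walk caching pattern; not the real class."""

    def __init__(self, trees: list[dict]):
        self.trees = trees
        self.walks = 0  # counts how often the expensive walk actually runs

    @cached_property
    def splits_and_gains(self):
        self.walks += 1  # a real implementation walks every tree here
        splits = {i: [t["split"]] for i, t in enumerate(self.trees)}
        gains = {i: [t["gain"]] for i, t in enumerate(self.trees)}
        return splits, gains

demo = TreeWalkDemo([{"split": "Age < 30", "gain": 0.4}])
_ = demo.splits_and_gains
_ = demo.splits_and_gains
assert demo.walks == 1  # computed once, then served from the instance dict
```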

get_grouped_gains_per_split() polars.DataFrame

Gains per split, grouped by split string with helpful aggregates.

Return type:

polars.DataFrame

plot_splits_per_variable(subset: set | None = None, show: bool = True)

Box-plot of gains per split for each variable.

Parameters:
  • subset (set | None)

  • show (bool)

get_tree_stats() polars.DataFrame

Generate a dataframe with useful stats for each tree.

Return type:

polars.DataFrame

get_all_values_per_split() dict[str, set]

All distinct split values seen for each predictor.

Return type:

dict[str, set]

get_tree_representation(tree_number: int) dict[int, dict]

Build a flat node-id-keyed representation of one tree.

Walks self.model[tree_number] in pre-order (left subtree before right) and returns a dict keyed by 1-based node id.

Each entry has score; internal nodes additionally carry split, gain, left_child and right_child; non-root nodes carry parent_node.

This replaces an earlier implementation that mutated three accumulator parameters and relied on a final del to drop a spurious trailing entry.

Parameters:

tree_number (int)

Return type:

dict[int, dict]

plot_tree(tree_number: int, highlighted: dict | list | None = None, show: bool = True) pydot.Graph

Plot the chosen decision tree.

Parameters:
  • tree_number (int)

  • highlighted (dict | list | None)

  • show (bool)

Return type:

pydot.Graph

get_visited_nodes(treeID: int, x: dict, save_all: bool = False) tuple[list, float, list]

Trace the path through one tree for the given feature values.

Parameters:
  • treeID (int)

  • x (dict)

  • save_all (bool)

Return type:

tuple[list, float, list]

get_all_visited_nodes(x: dict) polars.DataFrame

Score every tree against x and return per-tree visit info.

Parameters:

x (dict)

Return type:

polars.DataFrame

score(x: dict) float

Compute the (sigmoid-normalised) propensity score for x.

Calls get_visited_nodes() per tree and sums the resulting leaf scores; avoids building the full per-tree DataFrame that get_all_visited_nodes() would produce.

Parameters:

x (dict)

Return type:

float
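The summing-then-sigmoid step can be sketched as follows, assuming the per-tree leaf scores have already been collected:

```python
import math

def propensity(leaf_scores: list[float]) -> float:
    """Sum the per-tree leaf scores and squash through a sigmoid,
    as the score() description above outlines."""
    margin = sum(leaf_scores)
    return 1.0 / (1.0 + math.exp(-margin))

assert propensity([0.0, 0.0]) == 0.5          # zero margin -> 0.5
assert 0.0 < propensity([-2.0, -1.5]) < 0.5   # negative margin -> below 0.5
```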

plot_contribution_per_tree(x: dict, show: bool = True)

Plot the per-tree contribution toward the final propensity.

Parameters:
  • x (dict)

  • show (bool)

predictor_categorization(x: str, context_keys: list | None = None) str

Default predictor categorisation function.

Parameters:
  • x (str)

  • context_keys (list | None)

Return type:

str

compute_categorization_over_time(predictor_categorization: collections.abc.Callable | None = None, context_keys: list | None = None) tuple[list[collections.Counter], list[float]]

Per-tree categorisation counts plus per-tree absolute scores.

Parameters:
  • predictor_categorization (collections.abc.Callable | None)

  • context_keys (list | None)

Return type:

tuple[list[collections.Counter], list[float]]

plot_splits_per_variable_type(predictor_categorization: collections.abc.Callable | None = None, **kwargs)

Stacked-area chart of categorised split counts per tree.

Parameters:

predictor_categorization (collections.abc.Callable | None)