pdstools.adm.trees._model¶
The ADMTreesModel — load, score, plot, and analyse one AGB model.
Attributes¶
Classes¶
ADMTreesModel | Functions for ADM Gradient boosting
Module Contents¶
- logger¶
- class ADMTreesModel(trees: dict, model: list[dict], *, raw_input: Any = None, properties: dict[str, Any] | None = None, learning_rate: float | None = None, context_keys: list | None = None)¶
Functions for ADM Gradient boosting
ADM Gradient boosting models consist of multiple trees, which build upon each other in a ‘boosting’ fashion. This class provides functions to extract data from these trees (the features on which the trees split, important values for these features, and statistics about the trees) and to visualise each individual tree.
Construct via from_file(), from_url(), from_datamart_blob(), or from_dict().
Notes
The “save model” action in Prediction Studio exports a JSON file that this class can load directly. The Datamart’s pyModelData column also contains this information, but compressed and with encoded split values; the “save model” button decompresses and decodes that data.
- Parameters:
trees (dict)
model (list[dict])
raw_input (Any)
properties (dict[str, Any] | None)
learning_rate (float | None)
context_keys (list | None)
- raw_input: Any¶
The raw input used to construct this instance (path, bytes, or dict).
- classmethod from_dict(data: dict, *, context_keys: list | None = None) ADMTreesModel¶
Build from an already-parsed model dict.
- Parameters:
data (dict)
context_keys (list | None)
- Return type:
ADMTreesModel
- classmethod from_file(path: str | pathlib.Path, *, context_keys: list | None = None) ADMTreesModel¶
Load a model from a local JSON file (Prediction Studio “save model” output).
- Parameters:
path (str | pathlib.Path)
context_keys (list | None)
- Return type:
ADMTreesModel
- classmethod from_url(url: str, *, timeout: float = 30.0, context_keys: list | None = None) ADMTreesModel¶
Load a model from a URL pointing at the JSON export.
timeout is the per-request timeout in seconds (default 30).
- Parameters:
url (str)
timeout (float)
context_keys (list | None)
- Return type:
ADMTreesModel
- classmethod from_datamart_blob(blob: str | bytes, *, context_keys: list | None = None) ADMTreesModel¶
Load from a base64-encoded, zlib-compressed datamart Modeldata blob.
- Parameters:
blob (str | bytes)
context_keys (list | None)
- Return type:
ADMTreesModel
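The decode step this constructor performs can be sketched with the standard library. This is an illustrative round-trip, not pdstools internals; the helper name and toy payload are assumptions:

```python
import base64
import json
import zlib


def decode_datamart_blob(blob) -> dict:
    """Illustrative sketch: base64-decode, then zlib-decompress, then
    parse the JSON model dict from a datamart Modeldata blob."""
    if isinstance(blob, str):
        blob = blob.encode("ascii")
    raw = zlib.decompress(base64.b64decode(blob))
    return json.loads(raw)


# Round-trip demonstration with a toy payload:
payload = {"model": {"boosters": []}}
encoded = base64.b64encode(zlib.compress(json.dumps(payload).encode()))
assert decode_datamart_blob(encoded) == payload
```

Both str and bytes inputs are accepted, mirroring the `blob: str | bytes` signature above.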
- _decode_trees()¶
- _locate_boosters() list[dict]¶
Find the boosters/trees list in the model JSON.
Different Pega versions place the boosters at different paths; try them in order.
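The try-paths-in-order strategy can be sketched as a nested-dict walk. The candidate paths below are hypothetical; the real locations vary by Pega version:

```python
# Hypothetical candidate locations for the boosters list.
_BOOSTER_PATHS = (
    ("model", "boosters"),
    ("model", "model", "boosters"),
)


def locate_boosters(model_json: dict) -> list:
    """Walk each candidate path into the nested dict and return the
    first list found; raise if no path yields one."""
    for path in _BOOSTER_PATHS:
        node = model_json
        for key in path:
            if not isinstance(node, dict) or key not in node:
                break
            node = node[key]
        else:  # every key on this path resolved
            if isinstance(node, list):
                return node
    raise ValueError("no boosters found at any known path")


assert locate_boosters({"model": {"boosters": [{"split": "Age < 30"}]}})
```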
- static _safe_numeric_compare(left: float, operator: str, right: float) bool¶
Safely compare two numeric values without using eval().
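A minimal sketch of such a comparison, dispatching on the operator string through the `operator` module instead of evaluating a string; the table of supported operators is an assumption:

```python
import operator

# Map operator strings to their functional equivalents; no eval() needed.
_NUMERIC_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def safe_numeric_compare(left: float, op: str, right: float) -> bool:
    """Look the operator up in the dispatch table and apply it."""
    try:
        return _NUMERIC_OPS[op](left, right)
    except KeyError:
        raise ValueError(f"unsupported operator: {op!r}") from None


assert safe_numeric_compare(1.5, "<", 2.0)
```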
- _safe_eval_lock: threading.Lock¶
- static _safe_condition_evaluate(value: Any, operator: str, comparison_set: set | float | str | frozenset) bool¶
Safely evaluate split conditions without using eval().
Returns False on type-conversion errors, logging the first occurrence per (operator, error-type) pair at INFO level. Subsequent matching failures log at DEBUG only; per-row scoring should not swamp the application logs, but the first failure for each error class is worth surfacing.
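The eval()-free evaluation described above can be sketched as follows; the dispatch table, coercion rules, and function name are illustrative assumptions, not the actual implementation:

```python
import operator

_OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}


def safe_condition_evaluate(value, op, comparison):
    """Sketch: set membership for 'in' splits, numeric coercion for
    ordering operators, and False on conversion failure."""
    if op == "in":
        return str(value) in comparison
    try:
        return _OPS[op](float(value), float(comparison))
    except (TypeError, ValueError):
        # Real code logs the first (operator, error-type) pair at INFO,
        # later repeats at DEBUG; this sketch just returns False.
        return False
```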
- property metrics: dict[str, Any]¶
Compute CDH_ADM005-style diagnostic metrics for this model.
Returns a flat dictionary of key/value pairs aligned with the CDH_ADM005 telemetry event specification. Metrics that cannot be computed from an exported model (e.g. saturation counts that require bin-level data) are omitted.
See metric_descriptions() for human-readable descriptions of every key.
- static metric_descriptions() dict[str, str]¶
Return a dictionary mapping metric names to human-readable descriptions.
- _ENCODER_PATHS: tuple[tuple[str, Ellipsis], Ellipsis] = (('model', 'inputsEncoder', 'encoders'), ('model', 'model', 'inputsEncoder', 'encoders'))¶
- _get_encoder_info() dict[str, dict[str, Any]] | None¶
Extract predictor metadata from the inputsEncoder if present.
Returns None when no encoder metadata is available (e.g. for exported/decoded models).
- property tree_stats: polars.DataFrame¶
- Return type:
polars.DataFrame
- property gains_per_split: polars.DataFrame¶
- Return type:
polars.DataFrame
- property grouped_gains_per_split: polars.DataFrame¶
- Return type:
polars.DataFrame
- property splits_per_variable_type: tuple[list[collections.Counter], list[float]]¶
Per-tree counts of splits grouped by predictor category.
Equivalent to calling
compute_categorization_over_time()with no arguments.- Return type:
- get_predictors() dict[str, str] | None¶
Extract predictor names and types from model metadata.
Tries explicit metadata first (configuration.predictors, then predictors); falls back to inferring from tree splits when neither is present.
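The metadata-first, infer-from-splits fallback can be sketched like this. The metadata key layout and the split-string shape are simplifying assumptions for illustration:

```python
def get_predictors(metadata: dict, splits: list) -> dict:
    """Sketch: prefer explicit predictor metadata; otherwise infer each
    predictor's type from the operator used in its splits."""
    explicit = (metadata.get("configuration", {}).get("predictors")
                or metadata.get("predictors"))
    if explicit:
        return {p["name"]: p["type"] for p in explicit}
    # Fallback: an ordering operator implies a numeric predictor,
    # anything else (e.g. 'in') implies a symbolic one.
    inferred = {}
    for split in splits:
        name, op, _ = split.split(" ", 2)
        inferred[name] = "numeric" if op in {"<", "<=", ">", ">="} else "symbolic"
    return inferred
```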
- _infer_predictors_from_splits() dict[str, str] | None¶
Infer predictor names + types by walking all tree splits.
- property _splits_and_gains: tuple[dict[int, list[str]], dict[int, list[float]], polars.DataFrame]¶
Compute (splits_per_tree, gains_per_tree, gains_per_split) once.
Backs the public splits_per_tree / gains_per_tree / gains_per_split properties via a single tree-walk per tree. Implemented as a cached_property rather than @lru_cache because lru_cache holds a strong reference to self and would leak the entire ADMTreesModel instance for the lifetime of the cache.
Zero-gain splits are kept (with gains == 0.0) in gains_per_split so the per-split DataFrame is always aligned with splits_per_tree. gains_per_tree continues to keep only positive gains for backward compatibility.
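The cached_property-over-lru_cache trade-off can be demonstrated in isolation; the class and attribute names here are illustrative:

```python
from functools import cached_property


class Expensive:
    """Toy stand-in showing per-instance memoisation: the cached value
    lives in the instance's __dict__, so no module-level cache keeps
    a strong reference to self alive."""

    def __init__(self, trees):
        self.trees = trees
        self.walks = 0  # counts how often the expensive walk runs

    @cached_property
    def splits_and_gains(self):
        self.walks += 1
        return [len(t) for t in self.trees]


e = Expensive([{"a": 1}, {"b": 2, "c": 3}])
assert e.splits_and_gains == [1, 2]
assert e.splits_and_gains == [1, 2]
assert e.walks == 1  # second access hit the per-instance cache
```

When `e` is garbage-collected the cached result goes with it, which is exactly the behaviour `@lru_cache` on a method would prevent.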
- get_grouped_gains_per_split() polars.DataFrame¶
Gains per split, grouped by split string with helpful aggregates.
- Return type:
polars.DataFrame
- plot_splits_per_variable(subset: set | None = None, show: bool = True)¶
Box-plot of gains per split for each variable.
- get_tree_stats() polars.DataFrame¶
Generate a dataframe with useful stats for each tree.
- Return type:
polars.DataFrame
- get_tree_representation(tree_number: int) dict[int, dict]¶
Build a flat node-id-keyed representation of one tree.
Walks self.model[tree_number] in pre-order (left subtree before right) and returns a dict keyed by 1-based node id.
Each entry has score; internal nodes additionally carry split, gain, left_child and right_child; non-root nodes carry parent_node.
This replaces an earlier implementation that mutated three accumulator parameters and relied on a final del to drop a spurious trailing entry.
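The pre-order walk with 1-based node ids can be sketched on a toy node layout; the `left`/`right` key names below are assumptions, not the actual tree JSON schema:

```python
def tree_representation(tree: dict) -> dict:
    """Sketch of a pre-order walk assigning 1-based node ids.
    Toy node layout: {'score': .., 'split': .., 'left': .., 'right': ..}."""
    nodes = {}

    def walk(node, parent):
        node_id = len(nodes) + 1  # ids follow pre-order visit order
        entry = {"score": node["score"]}
        if parent is not None:
            entry["parent_node"] = parent
        nodes[node_id] = entry
        if "split" in node:  # internal node: recurse left, then right
            entry["split"] = node["split"]
            entry["left_child"] = walk(node["left"], node_id)
            entry["right_child"] = walk(node["right"], node_id)
        return node_id

    walk(tree, None)
    return nodes


toy = {"score": 0.0, "split": "Age < 30",
       "left": {"score": 0.4}, "right": {"score": -0.2}}
rep = tree_representation(toy)
assert rep[1]["left_child"] == 2 and rep[1]["right_child"] == 3
```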
- plot_tree(tree_number: int, highlighted: dict | list | None = None, show: bool = True) pydot.Graph¶
Plot the chosen decision tree.
- get_visited_nodes(treeID: int, x: dict, save_all: bool = False) tuple[list, float, list]¶
Trace the path through one tree for the given feature values.
- get_all_visited_nodes(x: dict) polars.DataFrame¶
Score every tree against
xand return per-tree visit info.- Parameters:
x (dict)
- Return type:
polars.DataFrame
- score(x: dict) float¶
Compute the (sigmoid-normalised) propensity score for
x.Calls
get_visited_nodes()per tree and sums the resulting leaf scores; avoids building the full per-tree DataFrame thatget_all_visited_nodes()would produce.
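The final normalisation step, summing leaf scores and squashing through the logistic sigmoid, can be sketched in a few lines; the helper name is illustrative:

```python
import math


def propensity(leaf_scores: list) -> float:
    """Sketch: sum the per-tree leaf scores, then map the raw boosted
    score into (0, 1) with the logistic sigmoid."""
    raw = sum(leaf_scores)
    return 1.0 / (1.0 + math.exp(-raw))


assert propensity([0.0]) == 0.5  # zero raw score maps to 0.5
```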
- plot_contribution_per_tree(x: dict, show: bool = True)¶
Plot the per-tree contribution toward the final propensity.
- predictor_categorization(x: str, context_keys: list | None = None) str¶
Default predictor categorisation function.
- compute_categorization_over_time(predictor_categorization: collections.abc.Callable | None = None, context_keys: list | None = None) tuple[list[collections.Counter], list[float]]¶
Per-tree categorisation counts plus per-tree absolute scores.
- Parameters:
predictor_categorization (collections.abc.Callable | None)
context_keys (list | None)
- Return type:
tuple[list[collections.Counter], list[float]]
- plot_splits_per_variable_type(predictor_categorization: collections.abc.Callable | None = None, **kwargs)¶
Stacked-area chart of categorised split counts per tree.
- Parameters:
predictor_categorization (collections.abc.Callable | None)