pdstools.adm.trees._model
=========================

.. py:module:: pdstools.adm.trees._model

.. autoapi-nested-parse::

   Load, score, plot, and analyse one AGB model via :class:`ADMTreesModel`.


Attributes
----------

.. autoapisummary::

   pdstools.adm.trees._model.logger


Classes
-------

.. autoapisummary::

   pdstools.adm.trees._model.ADMTreesModel


Module Contents
---------------

.. py:data:: logger

.. py:class:: ADMTreesModel(trees: dict, model: list[dict], *, raw_input: Any = None, properties: dict[str, Any] | None = None, learning_rate: float | None = None, context_keys: list | None = None)

   Functions for ADM gradient boosting.

   ADM gradient boosting models consist of multiple trees, which build upon
   each other in a 'boosting' fashion. This class provides functions to
   extract data from these trees: the features on which the trees split,
   important values for these features, statistics about the trees, and
   visualisations of each individual tree.

   Construct via :meth:`from_file`, :meth:`from_url`,
   :meth:`from_datamart_blob`, or :meth:`from_dict`.

   .. rubric:: Notes

   The "save model" action in Prediction Studio exports a JSON file that
   this class can load directly. The Datamart's ``pyModelData`` column also
   contains this information, but compressed and with encoded split values;
   the "save model" button decompresses and decodes that data.

   .. py:attribute:: trees
      :type: dict

      The full parsed model JSON.

   .. py:attribute:: model
      :type: list[dict]

      The list of boosted trees (each a nested dict).

   .. py:attribute:: raw_input
      :type: Any

      The raw input used to construct this instance (path, bytes, or dict).

   .. py:attribute:: learning_rate
      :type: float | None
      :value: None

   .. py:attribute:: context_keys
      :type: list | None
      :value: None

   .. py:attribute:: _properties
      :type: dict[str, Any]

   .. py:method:: from_dict(data: dict, *, context_keys: list | None = None) -> ADMTreesModel
      :classmethod:

      Build from an already-parsed model dict.

   .. py:method:: from_file(path: str | pathlib.Path, *, context_keys: list | None = None) -> ADMTreesModel
      :classmethod:

      Load a model from a local JSON file (Prediction Studio "save model"
      output).

   .. py:method:: from_url(url: str, *, timeout: float = 30.0, context_keys: list | None = None) -> ADMTreesModel
      :classmethod:

      Load a model from a URL pointing at the JSON export. ``timeout`` is
      the per-request timeout in seconds (default 30).

   .. py:method:: from_datamart_blob(blob: str | bytes, *, context_keys: list | None = None) -> ADMTreesModel
      :classmethod:

      Load from a base64-encoded, zlib-compressed datamart ``Modeldata``
      blob.

   .. py:method:: _decode_trees()

   .. py:method:: _post_import_cleanup(decode: bool, *, context_keys: list | None = None)

   .. py:method:: _locate_boosters() -> list[dict]

      Find the boosters/trees list in the model JSON. Different Pega
      versions place the boosters at different paths; try them in order.

   .. py:method:: _safe_numeric_compare(left: float, operator: str, right: float) -> bool
      :staticmethod:

      Safely compare two numeric values without using ``eval()``.

   .. py:attribute:: _safe_eval_seen_errors
      :type: set[tuple[str, str]]

   .. py:attribute:: _safe_eval_lock
      :type: threading.Lock

   .. py:method:: _safe_condition_evaluate(value: Any, operator: str, comparison_set: set | float | str | frozenset) -> bool
      :staticmethod:

      Safely evaluate split conditions without using ``eval()``.

      Returns ``False`` on type-conversion errors, logging the first
      occurrence per (operator, error-type) pair at INFO level. Subsequent
      matching failures log at DEBUG only: per-row scoring should not swamp
      the application logs, but the first failure for each error class is
      worth surfacing.

   .. py:property:: metrics
      :type: dict[str, Any]

      Compute CDH_ADM005-style diagnostic metrics for this model.

      Returns a flat dictionary of key/value pairs aligned with the
      CDH_ADM005 telemetry event specification. Metrics that cannot be
      computed from an exported model (e.g. saturation counts that require
      bin-level data) are omitted. See :meth:`metric_descriptions` for
      human-readable descriptions of every key.

   .. py:method:: metric_descriptions() -> dict[str, str]
      :staticmethod:

      Return a dictionary mapping metric names to human-readable
      descriptions.

   .. py:method:: _classify_predictor(name: str) -> str
      :staticmethod:

      Classify a predictor as 'ih', 'context_key', or 'other'.

   .. py:method:: _compute_metrics() -> dict[str, Any]

      Walk all trees once and assemble the metrics dictionary.

   .. py:attribute:: _ENCODER_PATHS
      :type: tuple[tuple[str, Ellipsis], Ellipsis]
      :value: (('model', 'inputsEncoder', 'encoders'), ('model', 'model', 'inputsEncoder', 'encoders'))

   .. py:method:: _get_encoder_info() -> dict[str, dict[str, Any]] | None

      Extract predictor metadata from the inputsEncoder if present. Returns
      ``None`` when no encoder metadata is available (e.g. for
      exported/decoded models).

   .. py:property:: predictors
      :type: dict[str, str] | None

   .. py:property:: tree_stats
      :type: polars.DataFrame

   .. py:property:: splits_per_tree
      :type: dict[int, list[str]]

   .. py:property:: gains_per_tree
      :type: dict[int, list[float]]

   .. py:property:: gains_per_split
      :type: polars.DataFrame

   .. py:property:: grouped_gains_per_split
      :type: polars.DataFrame

   .. py:property:: all_values_per_split
      :type: dict[str, set]

   .. py:property:: splits_per_variable_type
      :type: tuple[list[collections.Counter], list[float]]

      Per-tree counts of splits grouped by predictor category. Equivalent
      to calling :meth:`compute_categorization_over_time` with no
      arguments.

   .. py:method:: get_predictors() -> dict[str, str] | None

      Extract predictor names and types from model metadata. Tries explicit
      metadata first (``configuration.predictors``, then ``predictors``);
      falls back to inferring from tree splits when neither is present.

   .. py:method:: _infer_predictors_from_splits() -> dict[str, str] | None

      Infer predictor names and types by walking all tree splits.

   .. py:property:: _splits_and_gains
      :type: tuple[dict[int, list[str]], dict[int, list[float]], polars.DataFrame]

      Compute (splits_per_tree, gains_per_tree, gains_per_split) once.

      Backs the public ``splits_per_tree`` / ``gains_per_tree`` /
      ``gains_per_split`` properties via a single walk per tree.
      Implemented as a ``cached_property`` rather than ``@lru_cache``
      because ``lru_cache`` holds a strong reference to ``self`` and would
      leak the entire ADMTreesModel instance for the lifetime of the cache.

      Zero-gain splits are kept (with ``gains == 0.0``) in
      ``gains_per_split`` so the per-split DataFrame is always aligned with
      ``splits_per_tree``. ``gains_per_tree`` continues to keep only
      positive gains for backward compatibility.

   .. py:method:: get_grouped_gains_per_split() -> polars.DataFrame

      Gains per split, grouped by split string, with helpful aggregates.

   .. py:method:: plot_splits_per_variable(subset: set | None = None, show: bool = True)

      Box plot of gains per split for each variable.

   .. py:method:: get_tree_stats() -> polars.DataFrame

      Generate a dataframe with useful stats for each tree.

   .. py:method:: get_all_values_per_split() -> dict[str, set]

      All distinct split values seen for each predictor.

   .. py:method:: get_tree_representation(tree_number: int) -> dict[int, dict]

      Build a flat, node-id-keyed representation of one tree.

      Walks ``self.model[tree_number]`` in pre-order (left subtree before
      right) and returns a dict keyed by 1-based node id. Each entry has
      ``score``; internal nodes additionally carry ``split``, ``gain``,
      ``left_child`` and ``right_child``; non-root nodes carry
      ``parent_node``. This replaces an earlier implementation that mutated
      three accumulator parameters and relied on a final ``del`` to drop a
      spurious trailing entry.

   .. py:method:: plot_tree(tree_number: int, highlighted: dict | list | None = None, show: bool = True) -> pydot.Graph

      Plot the chosen decision tree.

   .. py:method:: get_visited_nodes(treeID: int, x: dict, save_all: bool = False) -> tuple[list, float, list]

      Trace the path through one tree for the given feature values.

   .. py:method:: get_all_visited_nodes(x: dict) -> polars.DataFrame

      Score every tree against ``x`` and return per-tree visit info.

   .. py:method:: score(x: dict) -> float

      Compute the (sigmoid-normalised) propensity score for ``x``.

      Calls :meth:`get_visited_nodes` per tree and sums the resulting leaf
      scores; avoids building the full per-tree DataFrame that
      :meth:`get_all_visited_nodes` would produce.

   .. py:method:: plot_contribution_per_tree(x: dict, show: bool = True)

      Plot the per-tree contribution toward the final propensity.

   .. py:method:: predictor_categorization(x: str, context_keys: list | None = None) -> str

      Default predictor categorisation function.

   .. py:method:: compute_categorization_over_time(predictor_categorization: collections.abc.Callable | None = None, context_keys: list | None = None) -> tuple[list[collections.Counter], list[float]]

      Per-tree categorisation counts plus per-tree absolute scores.

   .. py:method:: plot_splits_per_variable_type(predictor_categorization: collections.abc.Callable | None = None, **kwargs)

      Stacked-area chart of categorised split counts per tree.
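The ``from_datamart_blob`` constructor is documented as accepting the model JSON zlib-compressed and then base64-encoded. A minimal, self-contained round-trip sketch of that framing, where the payload and the ``decode_datamart_blob`` name are illustrative stand-ins rather than pdstools API:

```python
import base64
import json
import zlib

# Build a stand-in "Modeldata" blob: JSON -> zlib -> base64.
# The payload here is a placeholder, not a real AGB model.
payload = {"model": {"boosters": []}}
blob = base64.b64encode(zlib.compress(json.dumps(payload).encode("utf-8")))


def decode_datamart_blob(blob: str | bytes) -> dict:
    """Reverse the framing: base64-decode, zlib-decompress, parse JSON."""
    raw = base64.b64decode(blob)
    return json.loads(zlib.decompress(raw).decode("utf-8"))
```

Passing such a blob (as ``str`` or ``bytes``) is what the real constructor expects; the decoded dict is what ``from_dict`` would take directly.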
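``_safe_numeric_compare`` is documented only as comparing two numbers without ``eval()``. A common way to achieve that is to dispatch operator strings through the standard :mod:`operator` module; the sketch below illustrates the idea and is not the actual pdstools implementation:

```python
import operator

# Map operator strings to functions instead of building and eval()-ing
# a Python expression. Unknown operators raise rather than silently pass.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def safe_numeric_compare(left: float, op: str, right: float) -> bool:
    """Compare two numbers; raise ValueError on an unknown operator string."""
    try:
        fn = _OPS[op]
    except KeyError:
        raise ValueError(f"unsupported operator: {op!r}") from None
    return fn(left, right)
```

Because the lookup table is closed, malformed split strings can never execute arbitrary code, which is the point of avoiding ``eval()``.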
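The pre-order flattening described under ``get_tree_representation`` (1-based ids, left subtree before right, parent/child links) can be sketched as below. The nested ``split``/``gain``/``left``/``right``/``score`` layout is a hypothetical stand-in for the real AGB tree schema, and ``flatten_tree`` is an illustrative name:

```python
def flatten_tree(root: dict) -> dict[int, dict]:
    """Flatten a nested binary tree into a node-id-keyed dict, pre-order."""
    nodes: dict[int, dict] = {}

    def visit(node: dict, parent: int | None) -> int:
        node_id = len(nodes) + 1          # 1-based pre-order id
        entry = {"score": node.get("score")}
        if parent is not None:
            entry["parent_node"] = parent
        nodes[node_id] = entry            # register before recursing
        if "split" in node:               # internal node: record split info
            entry["split"] = node["split"]
            entry["gain"] = node.get("gain")
            entry["left_child"] = visit(node["left"], node_id)
            entry["right_child"] = visit(node["right"], node_id)
        return node_id

    visit(root, None)
    return nodes
```

Registering each entry before recursing is what makes the ids pre-order, with no accumulator parameters or trailing cleanup needed.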
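The scoring procedure described under ``score`` (trace each tree to a leaf, sum the leaf contributions, sigmoid-normalise the total) can be illustrated with toy trees. The tree layout and function names here are simplifications, not the actual model JSON or pdstools API:

```python
import math


def trace_to_leaf(node: dict, x: dict) -> float:
    """Follow one tree's splits down to a leaf and return its score."""
    while "split" in node:                    # internal node
        feature, threshold = node["split"]
        node = node["left"] if x[feature] < threshold else node["right"]
    return node["score"]                      # leaf contribution


def score(trees: list[dict], x: dict) -> float:
    """Sum per-tree leaf scores, then squash into a (0, 1) propensity."""
    raw = sum(trace_to_leaf(t, x) for t in trees)
    return 1.0 / (1.0 + math.exp(-raw))       # sigmoid normalisation


# Two toy boosted trees: each contributes one leaf score per input.
trees = [
    {"split": ("age", 30), "left": {"score": -0.4}, "right": {"score": 0.7}},
    {"split": ("income", 50000), "left": {"score": 0.1}, "right": {"score": 0.5}},
]
```

Summing leaf scores directly, as the documented method does, avoids materialising a per-tree DataFrame when only the final propensity is needed.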