pdstools.adm.ADMTrees¶
Classes¶
Functions for ADM Gradient boosting
Module Contents¶
- class AGB(datamart: pdstools.adm.ADMDatamart.ADMDatamart)¶
- Parameters:
datamart (pdstools.adm.ADMDatamart.ADMDatamart)
- datamart¶
- discover_model_types(df: polars.LazyFrame, by: str = 'Configuration') dict¶
Discovers the type of model embedded in the pyModelData column.
By default, models are grouped by Configuration, because a model rule can contain only one type of model. For each configuration, the pyModelData blob is inspected and its _serialClass is returned in a dict.
- Parameters:
df (polars.LazyFrame)
by (str)
- Return type:
dict
- get_agb_models(last: bool = False, by: str = 'Configuration', n_threads: int = 1, query: pdstools.utils.types.QUERY | None = None, verbose: bool = True, **kwargs) ADMTrees¶
Method to automatically extract AGB models.
Subsetting with the query functionality is recommended to cut down on execution time, because the method checks each model ID. If only AGB models remain after the query, only proper AGB models are returned.
- Parameters:
last (bool, default = False) – Whether to only look at the last snapshot for each model
by (str, default = 'Configuration') – Which column to determine unique models with
n_threads (int, default = 1) – The number of threads to use for extracting the models. Since we use multithreading, setting this to a reasonable value helps speed up the import.
query (Optional[Union[pl.Expr, list[pl.Expr], str, dict[str, list]]]) – Please refer to _apply_query()
verbose (bool, default = True) – Whether to print out information while importing
- Return type:
ADMTrees
- class ADMTrees¶
- static get_multi_trees(file: polars.DataFrame, n_threads=1, verbose=True, **kwargs)¶
- Parameters:
file (polars.DataFrame)
- class ADMTreesModel(file: str, **kwargs)¶
Functions for ADM Gradient boosting
ADM Gradient boosting models consist of multiple trees, which build upon each other in a ‘boosting’ fashion. This class provides some functions to extract data from these trees, such as the features on which the trees split, important values for these features, statistics about the trees, or visualising each individual tree.
- Parameters:
file (str) – The input file as a json (see notes)
- gainsPerSplit¶
- Type:
pl.DataFrame
Notes
The input file is the JSON file extracted by the ‘save model’ action in Prediction Studio. The Datamart column ‘pyModelData’ also contains this information, but it is compressed and the values for each split are encoded. The ‘save model’ button exports that same data decompressed and decoded.
- nospaces = True¶
- property metrics: dict[str, Any]¶
Compute CDH_ADM005-style diagnostic metrics for this model.
Returns a flat dictionary of key/value pairs aligned with the CDH_ADM005 telemetry event specification. Metrics that cannot be computed from an exported model (e.g. saturation counts that require bin-level data) are omitted.
See also
Pega
- static metric_descriptions() dict[str, str]¶
Return a dictionary mapping metric names to human-readable descriptions.
These descriptions document every metric returned by the metrics property. They can be used programmatically to annotate reports or plots.
- property predictors¶
- property tree_stats¶
- property splits_per_tree¶
- property gains_per_tree¶
- property gains_per_split¶
- property grouped_gains_per_split¶
- property all_values_per_split¶
- property splits_per_variable_type¶
- parse_split_values(value) tuple[str, str, str]¶
Parses the raw ‘split’ string into its three components.
Once the split is parsed, Python can use it to evaluate.
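As a rough illustration of parsing a split and then evaluating it, the sketch below breaks a raw split string into a (variable, sign, value) triple and applies it to a dict of feature values. The string formats shown (`Age < 42.5`, `Color in { red, blue }`), the supported signs, and the helper names `parse_split` and `evaluate_split` are assumptions for illustration, not the library's actual implementation.

```python
def parse_split(raw: str) -> tuple:
    """Split 'variable sign value' into three parts (hypothetical helper)."""
    # Split on the first two spaces only, so a set-valued right-hand side
    # such as '{ red, blue }' stays in one piece.
    variable, sign, value = raw.strip().split(" ", 2)
    return variable, sign, value


def evaluate_split(raw: str, x: dict) -> bool:
    """Evaluate a split string against a dict of feature values."""
    variable, sign, value = parse_split(raw)
    observed = x[variable]
    if sign == "<":
        return float(observed) < float(value)
    if sign == ">":
        return float(observed) > float(value)
    if sign == "in":
        # Strip the surrounding braces and spaces, then split on commas.
        members = {v.strip() for v in value.strip("{} ").split(",")}
        return str(observed) in members
    if sign == "==":
        return str(observed) == value
    raise ValueError(f"Unsupported sign: {sign!r}")
```

With these helpers, `evaluate_split("Age < 42.5", {"Age": 30})` compares the observed value numerically, while the `in` form tests set membership on strings.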
- get_predictors() dict | None¶
Extract predictor names and types from model metadata.
Tries to find predictor metadata from the configuration section of the JSON. Models exported via the Prediction Studio “Save Model” button include a configuration key with an explicit predictor list. However, models exported in the newer format (e.g. via automated pipelines or newer Pega versions) may omit the configuration section entirely, containing only type, modelVersion, algorithm, trainingStats, auc, etc. at the top level. In that case, predictor names and types are inferred from the tree split nodes instead.
- Return type:
dict | None
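The fallback described above could look roughly like the following sketch. The JSON layout used here is hypothetical: a `configuration` section holding a `predictors` list with `name` and `type` keys, a top-level `trees` list, and nodes with `split`, `left` and `right` keys. The rule that a `<` or `>` sign implies a numeric predictor is likewise an assumption, not the documented behaviour.

```python
from typing import Optional


def infer_predictors(model: dict) -> Optional[dict]:
    """Return {predictor name: type}, or None if nothing can be found."""
    # Preferred path: explicit predictor list (hypothetical layout).
    config = model.get("configuration")
    if config and config.get("predictors"):
        return {p["name"]: p["type"] for p in config["predictors"]}

    # Fallback path: infer names and types from the split nodes.
    predictors: dict = {}

    def walk(node: dict) -> None:
        split = node.get("split")
        if split is None:
            return  # leaf node, nothing to record
        variable, sign, _ = split.split(" ", 2)
        kind = "numeric" if sign in ("<", ">") else "symbolic"
        predictors.setdefault(variable, kind)
        walk(node["left"])
        walk(node["right"])

    for tree in model.get("trees", []):
        walk(tree)
    return predictors or None


example_tree = {
    "split": "Age < 30", "gain": 5.0,
    "left": {"score": 0.1},
    "right": {
        "split": "Color in { red }", "gain": 2.0,
        "left": {"score": 0.3},
        "right": {"score": -0.2},
    },
}
```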
- get_gains_per_split() tuple[dict, dict, polars.DataFrame]¶
Function to compute the gains of each split in each tree.
- get_grouped_gains_per_split() polars.DataFrame¶
Function to get the gains per split, grouped by split.
It adds some additional information, such as the possible values, the mean gains, and the number of times the split is performed.
- Return type:
polars.DataFrame
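To make the grouping concrete, here is a minimal pure-Python sketch of aggregating gains by split. It assumes the splits and gains are available as two parallel lists; the helper name `group_gains` and the output keys are hypothetical, and the real method returns a polars.DataFrame rather than a dict.

```python
from collections import defaultdict
from statistics import mean


def group_gains(splits: list, gains: list) -> dict:
    """Group gains by split and summarise each group."""
    grouped = defaultdict(list)
    for split, gain in zip(splits, gains):
        grouped[split].append(gain)
    # One summary per distinct split: how often it occurs and its mean gain.
    return {
        split: {"n": len(values), "mean_gain": mean(values), "gains": values}
        for split, values in grouped.items()
    }


summary = group_gains(
    ["Age < 30", "Age < 30", "Color in { red }"],
    [2.0, 4.0, 1.0],
)
```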
- get_splits_recursively(tree: dict, splits: list, gains: list) tuple[list, list]¶
Recursively finds splits and their gains for each node.
Because Python lists are mutable, the easiest way to achieve this is to explicitly supply the function with empty lists. Therefore, the ‘splits’ and ‘gains’ parameters expect empty lists when initially called.
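The accumulator pattern described here can be sketched as follows. The node layout (`split`, `gain`, `left`, `right` and `score` keys) and the function name `collect_splits` are assumptions for illustration only.

```python
def collect_splits(node: dict, splits: list, gains: list) -> tuple:
    """Append every split and its gain, visiting parent before children."""
    if "split" in node:  # leaves carry a score but no split
        splits.append(node["split"])
        gains.append(node.get("gain", 0.0))
        collect_splits(node["left"], splits, gains)
        collect_splits(node["right"], splits, gains)
    return splits, gains


example_tree = {
    "split": "Age < 30", "gain": 5.0,
    "left": {"score": 0.1},
    "right": {
        "split": "Color in { red }", "gain": 2.0,
        "left": {"score": 0.3},
        "right": {"score": -0.2},
    },
}

# The caller passes fresh, empty lists; every recursive call mutates the same two.
splits, gains = collect_splits(example_tree, [], [])
```

Passing the lists explicitly avoids a mutable default argument, which would be shared between unrelated top-level calls.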
- plot_splits_per_variable(subset: set | None = None, show=True)¶
Plots the splits for each variable in the tree.
- get_tree_stats() polars.DataFrame¶
Generate a dataframe with useful stats for each tree
- Return type:
polars.DataFrame
- get_all_values_per_split() dict¶
Generate a dictionary with the possible values for each split
- Return type:
dict
- get_nodes_recursively(tree: dict, nodelist: dict, counter: list, childs: dict) tuple[dict, dict]¶
Recursively walks through each node, used for tree representation.
Again, nodelist, counter and childs expect an empty dict, list and dict respectively when initially called.
- get_tree_representation(tree_number: int) dict¶
Generates a more usable tree representation.
In this tree representation, each node is keyed by an ID and keeps its own attributes, with its parent and child nodes added as well.
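One way to picture such a representation is the sketch below, which flattens a nested tree into a dict keyed by node ID. The nested input layout (`split`, `gain`, `score`, `left`, `right`), the output key names (`parent_node`, `left_child`, `right_child`) and the numbering order are assumptions, not necessarily what the library produces.

```python
def flatten_tree(tree: dict) -> dict:
    """Flatten a nested tree into {node_id: attributes plus parent/child IDs}."""
    nodes: dict = {}
    counter = [0]  # one-element list so all nested calls share a single counter

    def visit(node: dict, parent_id) -> int:
        counter[0] += 1
        node_id = counter[0]
        # Keep the node's own attributes, but not the nested child dicts.
        entry = {k: v for k, v in node.items() if k not in ("left", "right")}
        if parent_id is not None:
            entry["parent_node"] = parent_id
        nodes[node_id] = entry
        if "left" in node:
            entry["left_child"] = visit(node["left"], node_id)
        if "right" in node:
            entry["right_child"] = visit(node["right"], node_id)
        return node_id

    visit(tree, None)
    return nodes


example_tree = {
    "split": "Age < 30", "gain": 5.0,
    "left": {"score": 0.1},
    "right": {
        "split": "Color in { red }", "gain": 2.0,
        "left": {"score": 0.3},
        "right": {"score": -0.2},
    },
}
flat = flatten_tree(example_tree)
```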
- plot_tree(tree_number: int, highlighted: dict | list | None = None, show=True) pydot.Graph¶
Plots the chosen decision tree.
- Parameters:
tree_number (int)
highlighted (dict | list | None)
- Return type:
pydot.Graph
- get_visited_nodes(treeID: int, x: dict, save_all: bool = False) tuple[list, float, list]¶
Finds all nodes visited in a given tree when scoring an input x.
- Parameters:
treeID (int)
x (dict)
save_all (bool)
- Returns:
The list of visited nodes, the score of the final leaf node, and the gains for each split in the visited nodes
- Return type:
tuple[list, float, list]
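The traversal can be sketched as below: start at the root, evaluate each split against x, and descend until a leaf is reached. The node layout, the two supported signs, and the convention that a true condition descends to the left child are all assumptions made for this illustration; `visit_tree` is a hypothetical name.

```python
def visit_tree(tree: dict, x: dict) -> tuple:
    """Return (visited splits, leaf score, gains of the visited splits)."""
    visited, gains = [], []
    node = tree
    while "split" in node:  # stop once a leaf (no split) is reached
        variable, sign, value = node["split"].split(" ", 2)
        visited.append(node["split"])
        gains.append(node.get("gain", 0.0))
        if sign == "<":
            condition = float(x[variable]) < float(value)
        elif sign == "in":
            members = {v.strip() for v in value.strip("{} ").split(",")}
            condition = str(x[variable]) in members
        else:
            raise ValueError(f"Unsupported sign: {sign!r}")
        # Assumed convention: a true condition goes left, otherwise right.
        node = node["left"] if condition else node["right"]
    return visited, node["score"], gains


example_tree = {
    "split": "Age < 30", "gain": 5.0,
    "left": {"score": 0.1},
    "right": {
        "split": "Color in { red }", "gain": 2.0,
        "left": {"score": 0.3},
        "right": {"score": -0.2},
    },
}
```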
- get_all_visited_nodes(x: dict) polars.DataFrame¶
Loops through each tree, and records the scoring info
- Parameters:
x (dict) – Features to split on, with their values
- Return type:
pl.DataFrame
- plot_contribution_per_tree(x: dict, show=True)¶
Plots the contribution of each tree towards the final propensity.
- Parameters:
x (dict)
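The idea of a per-tree contribution can be illustrated with a running sum of leaf scores. This sketch assumes the scores are additive on a log-odds scale and are mapped to a propensity through the logistic function, a common gradient boosting convention that is assumed here rather than confirmed by this page. The function name `cumulative_propensity` is hypothetical.

```python
import math


def cumulative_propensity(scores: list) -> list:
    """Propensity after each tree, given the per-tree leaf scores in order."""
    running = 0.0
    propensities = []
    for score in scores:
        running += score  # each tree adds its leaf score to the running total
        propensities.append(1.0 / (1.0 + math.exp(-running)))
    return propensities


steps = cumulative_propensity([0.5, 0.5, -1.0])
```

Plotting `steps` against the tree index shows how each tree moves the propensity up or down.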
- compute_categorization_over_time(predictorCategorization=None, context_keys=None)¶
- plot_splits_per_variable_type(predictor_categorization=None, **kwargs)¶