`pdstools.adm.ADMTrees`¶

Module Contents¶

Classes¶

`ADMTrees`
`ADMTreesModel`	Functions for ADM Gradient boosting
`MultiTrees`

class ADMTrees¶

static getMultiTrees(file: polars.DataFrame, n_threads=1, verbose=True, **kwargs)¶

Parameters:: file (polars.DataFrame)

class ADMTreesModel(file: str, **kwargs)¶

Functions for ADM Gradient boosting

ADM Gradient boosting models consist of multiple trees, which build upon each other in a ‘boosting’ fashion. This class provides functions to extract data from these trees, such as the features on which the trees split, important values for these features, and statistics about the trees, and to visualise each individual tree.

Parameters:: file (str) – The input file, in JSON format (see Notes)

trees¶

Type:: Dict

properties¶

Type:: Dict

learning_rate¶

Type:: float

model¶

Type:: Dict

treeStats¶

Type:: Dict

splitsPerTree¶

Type:: Dict

gainsPerTree¶

Type:: Dict

gainsPerSplit¶

Type:: pl.DataFrame

groupedGainsPerSplit¶

Type:: Dict

predictors¶

Type:: Set

allValuesPerSplit¶

Type:: Dict

Notes

The input file is the extracted JSON file produced by the ‘save model’ action in Prediction Studio. The Datamart column ‘pyModelData’ also contains this information, but there it is compressed and the values for each split are encoded. With the ‘save model’ button, the only difference is that the data is decompressed and decoded.

_read_model(file, **kwargs)¶

_decodeTrees()¶

_post_import_cleanup(decode, **kwargs)¶

_depth(d: Dict) → Dict¶

Calculates the depth of the tree, used in TreeStats.

Parameters:: d (Dict)
Return type:: Dict
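As a rough illustration of what a depth calculation over a nested tree could look like, here is a minimal sketch. The node layout (the keys `split`, `gain`, `score`, `left` and `right`), the function name `tree_depth` and the convention that a leaf has depth 0 are all assumptions made for this example; they are not taken from the library's implementation.

```python
from typing import Dict


def tree_depth(node: Dict) -> int:
    """Return the number of split levels below (and including) this node.

    Hypothetical helper: assumes child nodes live under 'left' and 'right'
    and that a leaf (a node without children) has depth 0.
    """
    children = [node[side] for side in ("left", "right") if side in node]
    if not children:
        return 0
    return 1 + max(tree_depth(child) for child in children)


# Assumed example tree, for illustration only.
example_tree = {
    "split": "Age < 42.5",
    "gain": 3.2,
    "left": {"score": -0.1},
    "right": {
        "split": "Channel in { Web, Mobile }",
        "gain": 1.1,
        "left": {"score": 0.3},
        "right": {"score": 0.05},
    },
}

depth_of_example = tree_depth(example_tree)
```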

predictors()¶

treeStats()¶

splitsPerTree()¶

gainsPerTree()¶

gainsPerSplit()¶

groupedGainsPerSplit()¶

allValuesPerSplit()¶

splitsPerVariableType(**kwargs)¶

parseSplitValues(value) → Tuple[str, str, str]¶

Parses the raw ‘split’ string into its three components.

Once the split is parsed, Python can use its components to evaluate the split condition.

Parameters:

value (str) – The raw ‘split’ string
Returns:

Tuple[str, str, str]
The variable on which the split is done, the direction of the split (< or ‘in’), and the value on which to split

Return type:

Tuple[str, str, str]
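To make the three components concrete, the following sketch parses and evaluates a split string. The raw formats `"Age < 42.5"` and `"Channel in { Web, Mobile }"` are assumptions for illustration, as are the helper names `parse_split` and `evaluate_split`; the library's own parsing may handle more cases (for example variable names containing spaces, which is presumably what `parseSplitValuesWithSpaces` is for).

```python
from typing import Dict, Tuple


def parse_split(raw: str) -> Tuple[str, str, str]:
    """Split a raw string into (variable, sign, split value).

    Hypothetical helper: assumes the variable name contains no spaces.
    """
    variable, sign, split_value = raw.split(" ", 2)
    return variable, sign, split_value.strip()


def evaluate_split(raw: str, x: Dict) -> bool:
    """Evaluate the parsed split condition against the features in x."""
    variable, sign, split_value = parse_split(raw)
    if sign == "<":
        return float(x[variable]) < float(split_value)
    if sign == "in":
        # Assumed set notation: "{ a, b, c }"
        members = {v.strip() for v in split_value.strip("{}").split(",")}
        return str(x[variable]) in members
    raise ValueError(f"Unsupported split direction: {sign!r}")


numeric = parse_split("Age < 42.5")
symbolic = parse_split("Channel in { Web, Mobile }")
```

With these assumptions, `evaluate_split("Age < 42.5", {"Age": 30})` evaluates the numeric comparison and `evaluate_split("Channel in { Web, Mobile }", {"Channel": "Email"})` checks set membership.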

static parseSplitValuesWithSpaces(value) → Tuple[str, str, str]¶

Return type:: Tuple[str, str, str]

getPredictors() → Dict¶

Return type:: Dict

getGainsPerSplit() → Tuple[Dict, polars.DataFrame, dict]¶

Function to compute the gains of each split in each tree.

Return type:: Tuple[Dict, polars.DataFrame, dict]

getGroupedGainsPerSplit() → polars.DataFrame¶

Function to get the gains per split, grouped by split.

It adds some additional information, such as the possible values, the mean gains, and the number of times the split is performed.

Return type:: polars.DataFrame
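The grouping described above can be sketched in plain Python. This is only an illustration of the idea (collect the gains per distinct split, then add the mean gain and the number of occurrences); the function name `group_gains` and the dictionary output are assumptions, whereas the method itself returns a polars.DataFrame.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def group_gains(splits: List[str], gains: List[float]) -> Dict[str, Dict]:
    """Group gains by split string and summarise each group."""
    grouped = defaultdict(list)
    for split, gain in zip(splits, gains):
        grouped[split].append(gain)
    return {
        split: {"gains": values, "mean": mean(values), "n": len(values)}
        for split, values in grouped.items()
    }


# Illustrative input: the same split occurs in two different trees.
summary = group_gains(
    ["Age < 42.5", "Channel in { Web, Mobile }", "Age < 42.5"],
    [1.0, 2.0, 3.0],
)
```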

getSplitsRecursively(tree: Dict, splits: List, gains: List) → Tuple[List, List]¶

Recursively finds splits and their gains for each node.

Because Python lists are mutable, the easiest way to achieve this is to explicitly supply the function with empty lists. Therefore, the ‘splits’ and ‘gains’ parameters expect empty lists when initially called.

Parameters:

tree (Dict)
splits (List)
gains (List)

Returns:

Tuple[List, List]
Each split, and its corresponding gain

Return type:

Tuple[List, List]
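The pattern of passing empty lists into a recursive call can be sketched as follows. The node layout (keys `split`, `gain`, `left`, `right`, `score`) and the function name `collect_splits` are assumptions for this example, not a copy of the library's code.

```python
from typing import Dict, List, Tuple


def collect_splits(tree: Dict, splits: List, gains: List) -> Tuple[List, List]:
    """Append every split and its gain to the caller-supplied lists."""
    if "split" in tree:
        splits.append(tree["split"])
        gains.append(tree.get("gain", 0.0))
    for side in ("left", "right"):
        if side in tree:
            # The same list objects are passed down, so each recursive call
            # keeps appending to the lists created by the initial caller.
            collect_splits(tree[side], splits, gains)
    return splits, gains


# Assumed example tree, for illustration only.
example_tree = {
    "split": "Age < 42.5",
    "gain": 3.2,
    "left": {"score": -0.1},
    "right": {
        "split": "Channel in { Web, Mobile }",
        "gain": 1.1,
        "left": {"score": 0.3},
        "right": {"score": 0.05},
    },
}

# Fresh, empty lists are supplied on the initial call.
splits, gains = collect_splits(example_tree, [], [])
```

Supplying the lists explicitly avoids the pitfall of a mutable default argument, which would be shared between separate top-level calls.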

plotSplitsPerVariable(subset: Set | None = None, show=True)¶

Plots the splits for each variable in the tree.

Parameters:

subset (Optional[Set]) – Optional parameter to subset the variables to plot
show (bool) – Whether to display each plot

Return type:

plt.figure

getTreeStats() → polars.DataFrame¶

Generate a dataframe with useful stats for each tree

Return type:: polars.DataFrame

getAllValuesPerSplit() → Dict¶

Generate a dictionary with the possible values for each split

Return type:: Dict

getNodesRecursively(tree: Dict, nodelist: Dict, counter: Dict, childs: List) → Tuple[Dict, List]¶

Recursively walks through each node, used for tree representation.

As with getSplitsRecursively, the nodelist, counter and childs parameters expect an empty dict, an empty dict and an empty list, respectively, when initially called.

Parameters:

tree (Dict)
nodelist (Dict)
counter (Dict)
childs (List)

Returns:

Tuple[Dict, List]
The dictionary of nodes and the list of child nodes

Return type:

Tuple[Dict, List]

static _fillChildNodeIDs(nodeinfo: Dict, childs: Dict) → Dict¶

Utility function to add child info to nodes

Parameters:

nodeinfo (Dict)
childs (Dict)

Return type:

Dict

getTreeRepresentation(tree_number: int) → Dict¶

Generates a more usable tree representation.

In this tree representation, each node has an ID, and its attributes are the attributes of the original node, with its parent and child nodes added as well.

Parameters:

tree_number (int) – The number of the tree, in order of the original json

Return type:

Dict

plotTree(tree_number: int, highlighted: Dict | List | None = None, show=True) → pydot.Graph¶

Plots the chosen decision tree.

Parameters:

tree_number (int) – The number of the tree to visualise
highlighted (Optional[Dict, List]) – Optional parameter to highlight nodes in green. If a dictionary, it expects an ‘x’, i.e., features with their corresponding values. If a list, it expects a list of node IDs for that tree.

Return type:

pydot.Graph

getVisitedNodes(treeID: int, x: Dict, save_all: bool = False) → Tuple[List, float, List]¶

Finds all visited nodes for a given tree, given an x

Parameters:

treeID (int) – The ID of the tree
x (Dict) – Features to split on, with their values
save_all (bool, default = False) – Whether to save all gains for each individual split

Returns:

Tuple[List, float, List]
The list of visited nodes, the score of the final leaf node, and the gains for each split in the visited nodes

Return type:

Tuple[List, float, List]
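A tree walk of this kind can be sketched as below. Everything about the layout is an assumption for illustration: the node keys, the split string formats, the rule that the left branch is taken when the split condition holds, and the function name `walk_tree`. The sketch returns the visited splits rather than node IDs.

```python
from typing import Dict, List, Tuple


def walk_tree(tree: Dict, x: Dict) -> Tuple[List[str], float, List[float]]:
    """Follow the splits that apply to x down to a leaf.

    Returns the visited split strings, the leaf score and the gains of
    the visited splits. Assumes 'left' is taken when the condition is true.
    """
    node = tree
    visited, gains = [], []
    while "split" in node:
        variable, sign, split_value = node["split"].split(" ", 2)
        if sign == "<":
            condition = float(x[variable]) < float(split_value)
        else:  # assumed to be 'in' with "{ a, b }" set notation
            members = {v.strip() for v in split_value.strip("{} ").split(",")}
            condition = str(x[variable]) in members
        visited.append(node["split"])
        gains.append(node.get("gain", 0.0))
        node = node["left"] if condition else node["right"]
    return visited, node["score"], gains


# Assumed example tree, for illustration only.
example_tree = {
    "split": "Age < 42.5",
    "gain": 3.2,
    "left": {"score": -0.1},
    "right": {
        "split": "Channel in { Web, Mobile }",
        "gain": 1.1,
        "left": {"score": 0.3},
        "right": {"score": 0.05},
    },
}

visited, leaf_score, visited_gains = walk_tree(
    example_tree, {"Age": 50, "Channel": "Web"}
)
```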

getAllVisitedNodes(x: Dict) → polars.DataFrame¶

Loops through each tree, and records the scoring info

Parameters:: x (Dict) – Features to split on, with their values
Return type:: pl.DataFrame

score(x: Dict) → float¶

Computes the score for a given x

Parameters:: x (Dict)
Return type:: float
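For intuition, a common way to turn the leaf scores of a boosted ensemble into a single propensity is to sum them and apply the logistic function. The sketch below assumes that convention; whether this method uses exactly this transform is not stated on this page, so treat it as an assumption. The name `propensity_from_scores` is hypothetical.

```python
import math
from typing import Iterable


def propensity_from_scores(leaf_scores: Iterable[float]) -> float:
    """Sum the per-tree leaf scores and squash the total into (0, 1).

    Assumes a logistic link, which is typical for gradient boosting on a
    binary outcome but is not confirmed by this reference page.
    """
    total = sum(leaf_scores)
    return 1.0 / (1.0 + math.exp(-total))


# One leaf score per tree, e.g. as found by walking each tree for the same x.
example_propensity = propensity_from_scores([0.3, -0.1])
```

Under this assumption, a total score of 0 maps to a propensity of 0.5, and each tree shifts the total up or down by its leaf score.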

plotContributionPerTree(x: Dict, show=True)¶

Plots the contribution of each tree towards the final propensity.

Parameters:: x (Dict)

predictorCategorization(x: str, context_keys=None)¶

Parameters:: x (str)

computeCategorizationOverTime(predictorCategorization=None, context_keys=None)¶

plotSplitsPerVariableType(predictorCategorization=None, **kwargs)¶

class MultiTrees¶

property first¶

property last¶

trees: dict¶

model_name: str¶

context_keys: list¶

__repr__()¶: Return repr(self).

__getitem__(index)¶

__len__()¶

__add__(other)¶

computeOverTime(predictorCategorization=None)¶

plotSplitsPerVariableType(predictorCategorization=None, **kwargs)¶
