pdstools.adm.ADMTrees
=====================

.. py:module:: pdstools.adm.ADMTrees


Classes
-------

.. autoapisummary::

   pdstools.adm.ADMTrees.AGB
   pdstools.adm.ADMTrees.ADMTrees
   pdstools.adm.ADMTrees.ADMTreesModel
   pdstools.adm.ADMTrees.MultiTrees


Module Contents
---------------

.. py:class:: AGB(datamart: pdstools.adm.ADMDatamart.ADMDatamart)

   .. py:attribute:: datamart


   .. py:method:: discover_model_types(df: polars.LazyFrame, by: str = 'Configuration') -> Dict

      Discovers the type of model embedded in the pyModelData column.

      By default, we do a group_by Configuration, because a model rule can
      only contain one type of model. Then, for each configuration, we look
      into the pyModelData blob and find the _serialClass, returning it in
      a dict.

      :param df: The dataframe to search for model types
      :type df: pl.LazyFrame
      :param by: The column to look for types in. Configuration is recommended.
      :type by: str
      :param allow_collect: Set to True to allow discovering modelTypes, even if
          in lazy strategy. It will fetch one modelData string per configuration.
      :type allow_collect: bool, default = False


   .. py:method:: get_agb_models(last: bool = False, by: str = 'Configuration', n_threads: int = 1, query: Optional[pdstools.utils.types.QUERY] = None, verbose: bool = True, **kwargs) -> ADMTrees

      Method to automatically extract AGB models.

      Recommended to subset using the querying functionality to cut down on
      execution time, because it checks for each model ID. If you only have
      AGB models remaining after the query, it will only return proper AGB
      models.

      :param last: Whether to only look at the last snapshot for each model
      :type last: bool, default = False
      :param by: Which column to determine unique models with
      :type by: str, default = 'Configuration'
      :param n_threads: The number of threads to use for extracting the models.
          Since we use multithreading, setting this to a reasonable value
          helps speed up the import.
      :type n_threads: int, default = 1
      :param query: Please refer to :meth:`._apply_query`
      :type query: Optional[Union[pl.Expr, List[pl.Expr], str, Dict[str, list]]]
      :param verbose: Whether to print out information while importing
      :type verbose: bool, default = True


.. py:class:: ADMTrees

   .. py:method:: get_multi_trees(file: polars.DataFrame, n_threads=1, verbose=True, **kwargs)
      :staticmethod:


.. py:class:: ADMTreesModel(file: str, **kwargs)

   Functions for ADM Gradient boosting

   ADM Gradient boosting models consist of multiple trees, which build upon
   each other in a 'boosting' fashion. This class provides some functions
   to extract data from these trees, such as the features on which the
   trees split, important values for these features, statistics about the
   trees, or visualising each individual tree.

   :param file: The input file as a json (see notes)
   :type file: str

   .. attribute:: trees
      :type: Dict

   .. attribute:: properties
      :type: Dict

   .. attribute:: learning_rate
      :type: float

   .. attribute:: model
      :type: Dict

   .. attribute:: treeStats
      :type: Dict

   .. attribute:: splitsPerTree
      :type: Dict

   .. attribute:: gainsPerTree
      :type: Dict

   .. attribute:: gainsPerSplit
      :type: pl.DataFrame

   .. attribute:: groupedGainsPerSplit
      :type: Dict

   .. attribute:: predictors
      :type: Set

   .. attribute:: allValuesPerSplit
      :type: Dict

   .. rubric:: Notes

   The input file is the extracted json file of the 'save model' action in
   Prediction Studio. The Datamart column 'pyModelData' also contains this
   information, but it is compressed and the values for each split are
   encoded. Using the 'save model' button, only that data is decompressed
   and decoded.

   .. py:attribute:: nospaces
      :value: True


   .. py:method:: _read_model(file, **kwargs)


   .. py:method:: _decode_trees()


   .. py:method:: _post_import_cleanup(decode, **kwargs)


   .. py:method:: _depth(d: Dict) -> int

      Calculates the depth of the tree, used in TreeStats.


   .. py:property:: predictors


   .. py:property:: tree_stats


   .. py:property:: splits_per_tree
   .. py:property:: gains_per_tree


   .. py:property:: gains_per_split


   .. py:property:: grouped_gains_per_split


   .. py:property:: all_values_per_split


   .. py:property:: splits_per_variable_type


   .. py:method:: parse_split_values(value) -> Tuple[str, str, str]

      Parses the raw 'split' string into its three components.

      Once the split is parsed, Python can use it to evaluate.

      :param value: The raw 'split' string
      :type value: str
      :returns: The variable on which the split is done,
          the direction of the split (< or 'in'),
          and the value on which to split
      :rtype: Tuple[str, str, str]


   .. py:method:: parse_split_values_with_spaces(value) -> Tuple[str, str, str]
      :staticmethod:


   .. py:method:: get_predictors() -> Optional[Dict]


   .. py:method:: get_gains_per_split() -> Tuple[Dict, Dict, polars.DataFrame]

      Function to compute the gains of each split in each tree.


   .. py:method:: get_grouped_gains_per_split() -> polars.DataFrame

      Function to get the gains per split, grouped by split.

      It adds some additional information, such as the possible values,
      the mean gains, and the number of times the split is performed.


   .. py:method:: get_splits_recursively(tree: Dict, splits: List, gains: List) -> Tuple[List, List]

      Recursively finds splits and their gains for each node.

      Because Python lists are mutable, the easiest way to achieve this is
      to explicitly supply the function with empty lists. Therefore, the
      'splits' and 'gains' parameters expect empty lists when initially
      called.

      :param tree:
      :type tree: Dict
      :param splits:
      :type splits: List
      :param gains:
      :type gains: List
      :returns: Each split, and its corresponding gain
      :rtype: Tuple[List, List]


   .. py:method:: plot_splits_per_variable(subset: Optional[Set] = None, show=True)

      Plots the splits for each variable in the tree.

      :param subset: Optional parameter to subset the variables to plot
      :type subset: Optional[Set]
      :param show: Whether to display each plot
      :type show: bool
      :rtype: plt.figure
   .. py:method:: get_tree_stats() -> polars.DataFrame

      Generate a dataframe with useful stats for each tree.


   .. py:method:: get_all_values_per_split() -> Dict

      Generate a dictionary with the possible values for each split.


   .. py:method:: get_nodes_recursively(tree: Dict, nodelist: Dict, counter: List, childs: Dict) -> Tuple[Dict, Dict]

      Recursively walks through each node, used for tree representation.

      Again, nodelist, counter and childs expect empty dict, list and dict
      parameters.

      :param tree:
      :type tree: Dict
      :param nodelist:
      :type nodelist: Dict
      :param counter:
      :type counter: List
      :param childs:
      :type childs: Dict
      :returns: The dictionary of nodes and the dictionary of child nodes
      :rtype: Tuple[Dict, Dict]


   .. py:method:: _fill_child_node_ids(nodeinfo: Dict, childs: Dict) -> Dict
      :staticmethod:

      Utility function to add child info to nodes.


   .. py:method:: get_tree_representation(tree_number: int) -> Dict

      Generates a more usable tree representation.

      In this tree representation, each node has an ID, and its attributes
      are the attributes, with parent and child nodes added as well.

      :param tree_number: The number of the tree, in order of the original json
      :type tree_number: int
      :returns: The tree representation
      :rtype: Dict


   .. py:method:: plot_tree(tree_number: int, highlighted: Optional[Union[Dict, List]] = None, show=True) -> pydot.Graph

      Plots the chosen decision tree.

      :param tree_number: The number of the tree to visualise
      :type tree_number: int
      :param highlighted: Optional parameter to highlight nodes in green.
          If a dictionary, it expects an 'x': i.e., features with their
          corresponding values. If a list, it expects a list of node IDs
          for that tree.
      :type highlighted: Optional[Union[Dict, List]]
      :rtype: pydot.Graph
   .. py:method:: get_visited_nodes(treeID: int, x: Dict, save_all: bool = False) -> Tuple[List, float, List]

      Finds all visited nodes for a given tree, given a feature dict x.

      :param treeID: The ID of the tree
      :type treeID: int
      :param x: Features to split on, with their values
      :type x: Dict
      :param save_all: Whether to save all gains for each individual split
      :type save_all: bool, default = False
      :returns: The list of visited nodes,
          the score of the final leaf node,
          and the gains for each split in the visited nodes
      :rtype: Tuple[List, float, List]


   .. py:method:: get_all_visited_nodes(x: Dict) -> polars.DataFrame

      Loops through each tree and records the scoring info.

      :param x: Features to split on, with their values
      :type x: Dict
      :rtype: pl.DataFrame


   .. py:method:: score(x: Dict) -> float

      Computes the score for a given x.


   .. py:method:: plot_contribution_per_tree(x: Dict, show=True)

      Plots the contribution of each tree towards the final propensity.


   .. py:method:: predictor_categorization(x: str, context_keys=None)


   .. py:method:: compute_categorization_over_time(predictorCategorization=None, context_keys=None)


   .. py:method:: plot_splits_per_variable_type(predictor_categorization=None, **kwargs)


.. py:class:: MultiTrees

   .. py:attribute:: trees
      :type: dict


   .. py:attribute:: model_name
      :type: Optional[str]
      :value: None


   .. py:attribute:: context_keys
      :type: Optional[list]
      :value: None


   .. py:method:: __repr__()


   .. py:method:: __getitem__(index)


   .. py:method:: __len__()


   .. py:method:: __add__(other)


   .. py:property:: first


   .. py:property:: last


   .. py:method:: compute_over_time(predictor_categorization=None)


   .. py:method:: plot_splits_per_variable_type(predictor_categorization=None, **kwargs)
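The ``_depth`` helper above recursively measures how deep a tree goes, which feeds the per-tree statistics. A minimal sketch of that idea, assuming nodes are plain dicts with ``left``/``right`` child keys (the real ADM json layout differs in detail):

```python
from typing import Dict, Optional


def tree_depth(node: Optional[Dict]) -> int:
    """Depth of a nested tree dict; a non-dict (missing child) contributes 0."""
    if not isinstance(node, dict):
        return 0
    # A node counts itself, plus the deeper of its two subtrees.
    return 1 + max(tree_depth(node.get("left")), tree_depth(node.get("right")))


toy_tree = {
    "split": "Age < 30",
    "left": {"score": 0.2},
    "right": {
        "split": "Income < 50000",
        "left": {"score": 0.1},
        "right": {"score": 0.4},
    },
}
print(tree_depth(toy_tree))  # → 3
```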
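``parse_split_values`` breaks a raw split string into a variable, a direction, and a value. A simplified sketch covering only the two directions named in the docstring, ``<`` and ``in``; the split-string formats are an assumption, and the real parser also handles values containing spaces (see ``parse_split_values_with_spaces``):

```python
from typing import Tuple


def parse_split(value: str) -> Tuple[str, str, str]:
    """Parse a split like 'Age < 30' or 'Color in { red, blue }'."""
    if " in " in value:
        # Categorical split: variable membership in a set of values.
        variable, values = value.split(" in ", 1)
        return variable.strip(), "in", values.strip(" {}")
    # Numeric split: 'variable sign threshold'.
    variable, sign, threshold = value.split(" ", 2)
    return variable.strip(), sign, threshold.strip()


print(parse_split("Age < 30"))                # → ('Age', '<', '30')
print(parse_split("Color in { red, blue }"))  # → ('Color', 'in', 'red, blue')
```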
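``get_nodes_recursively`` and ``get_tree_representation`` turn the nested json into a flat, ID-keyed map where every node carries its parent and child IDs. A hypothetical equivalent over the same toy ``left``/``right`` node layout, not the library's actual traversal:

```python
from typing import Dict, Optional


def flatten_tree(node: Dict, nodes: Optional[Dict] = None, parent: Optional[int] = None) -> Dict[int, Dict]:
    """Flatten a nested tree into {id: node attributes + parent/child links}."""
    if nodes is None:
        nodes = {}
    node_id = len(nodes)  # IDs are assigned in visit (pre-)order
    info = {k: v for k, v in node.items() if k not in ("left", "right")}
    info["parent_id"] = parent
    info["child_ids"] = []
    nodes[node_id] = info
    for child in ("left", "right"):
        if child in node:
            # The child's ID is whatever the counter says next.
            info["child_ids"].append(len(nodes))
            flatten_tree(node[child], nodes, node_id)
    return nodes


nodes = flatten_tree(
    {"split": "Age < 30", "left": {"score": 0.2}, "right": {"score": 0.4}}
)
print(nodes[0]["child_ids"])  # → [1, 2]
```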
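``get_visited_nodes`` and ``score`` walk every tree for a feature dict ``x`` and combine the leaf scores into a propensity. A toy sketch of that boosting-style scoring, which only evaluates numeric ``<`` splits and a plain logistic squash; the real implementation also applies the learning rate and handles ``in`` splits:

```python
import math
from typing import Dict, List


def traverse(node: Dict, x: Dict) -> float:
    """Follow one tree's splits for x and return the leaf score."""
    while "score" not in node:
        variable, _, threshold = node["split"].split(" ", 2)
        # Go left when the split condition holds, right otherwise.
        node = node["left"] if x[variable] < float(threshold) else node["right"]
    return node["score"]


def score(trees: List[Dict], x: Dict) -> float:
    """Sum the per-tree leaf contributions, then squash to a propensity."""
    margin = sum(traverse(tree, x) for tree in trees)
    return 1 / (1 + math.exp(-margin))


stump = {"split": "Age < 30", "left": {"score": 0.2}, "right": {"score": -0.1}}
print(score([stump, stump], {"Age": 25}))  # → ≈ 0.599
```

This mirrors how ``plot_contribution_per_tree`` can attribute the final propensity back to individual trees: each tree contributes one leaf score to the margin.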