pdstools.adm.BinAggregator

Classes

BinAggregator

A class to generate rolled up insights from ADM predictor binning.

Module Contents

class BinAggregator(dm: pdstools.adm.ADMDatamart.ADMDatamart)

Bases: pdstools.utils.namespaces.LazyNamespace

A class to generate rolled up insights from ADM predictor binning.

Parameters:

dm (pdstools.adm.ADMDatamart.ADMDatamart)

dependencies = ['plotly', 'numpy']
roll_up(predictors: str | list, *, n: int = 10, distribution: Literal['lin', 'log'] = 'lin', boundaries: float | list | None = None, symbols: str | list | None = None, minimum: float | None = None, maximum: float | None = None, aggregation: str | None = None, as_numeric: bool | None = None, return_df: bool = False, verbose: bool = False) → polars.DataFrame | Figure

Roll up a predictor across all the models defined when creating the class.

Predictors can be either numeric or symbolic (also called ‘categorical’). You can aggregate the same predictor across different sets of models by specifying a column name in the aggregation argument.

Parameters:
  • predictors (str | list) – Name of the predictor to roll up. Multiple predictors can be passed in as a list.

  • n (int, optional) – Number of bins (intervals or symbols) to generate, by default 10. Any custom intervals or symbols specified with the ‘boundaries’ or ‘symbols’ arguments count towards this number as well. For symbolic predictors, n can be None, which means unlimited.

  • distribution (str, optional) – For numeric predictors: the way the intervals are constructed. By default “lin” for evenly spaced intervals; can be set to “log” for a long-tailed distribution (for fields like income).

  • boundaries (float | list, optional) – For numeric predictors: one value, or a list of the numeric values to include as interval boundaries. They will be used at the front of the automatically created intervals. By default None, all intervals are created automatically.

  • symbols (str | list, optional) – For symbolic predictors, any symbol(s) that must be included in the symbol list in the generated binning. By default None.

  • minimum (float, optional) – Minimum value for numeric predictors, by default None. When None the minimum is taken from the binning data of the models.

  • maximum (float, optional) – Maximum value for numeric predictors, by default None. When None the maximum is taken from the binning data of the models.

  • aggregation (str, optional) – Optional column name in the data to aggregate over, creating separate aggregations for each of the different values. By default None.

  • as_numeric (bool, optional) – Optional override for the type of the predictor, for the exceptional situation in which a predictor with the same name is numeric in some models and symbolic in others. By default None, which means the type is taken from the first predictor in the data.

  • return_df (bool, optional) – Return the underlying binning instead of a plot, by default False.

  • verbose (bool, optional) – Show detailed debug information while executing, by default False.

Returns:

By default, returns a formatted plot. When ‘return_df’ is set to True, returns the binning itself, with the lift aggregated over all the models, optionally per predictor and per set of models.

Return type:

polars.DataFrame | Figure
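A minimal sketch of the call shape documented above. To stay runnable without a datamart, it uses a hypothetical stand-in class, `FakeAggregator`, that only echoes its arguments; in practice the object would be a `BinAggregator` created from an `ADMDatamart`. The predictor name `Customer.Age` and the column name `Channel` are illustrative assumptions, not names taken from the library.

```python
class FakeAggregator:
    """Hypothetical stand-in for BinAggregator; it only echoes its arguments."""

    def roll_up(self, predictors, *, n=10, distribution="lin", boundaries=None,
                symbols=None, minimum=None, maximum=None, aggregation=None,
                as_numeric=None, return_df=False, verbose=False):
        return {
            "predictors": predictors,
            "n": n,
            "distribution": distribution,
            "boundaries": boundaries,
            "aggregation": aggregation,
            "return_df": return_df,
        }


def age_binning_per_channel(aggregator):
    # Everything after `predictors` is keyword-only in the documented signature.
    return aggregator.roll_up(
        "Customer.Age",         # assumed predictor name
        n=8,                    # total number of bins to generate
        boundaries=[18, 65],    # interval boundaries that must be present
        aggregation="Channel",  # assumed column name; one aggregation per value
        return_df=True,         # return the binning rather than a plot
    )


result = age_binning_per_channel(FakeAggregator())
```

With a real aggregator, `return_df=True` would yield a `polars.DataFrame` and the default would yield a figure, as described under Returns.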

accumulate_num_binnings(predictor, modelids, target_binning, verbose=False) → polars.DataFrame
Return type:

polars.DataFrame

create_symbol_list(predictor, n_symbols, musthave_symbols) → list
Return type:

list
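The reference gives no description for this method, so the following is only a sketch of one plausible way to build such a list: must-have symbols go first and count towards the limit, and the remaining slots are filled with other observed symbols. Filling by descending frequency is an assumption, as are the function name `sketch_symbol_list` and the sample data; this is not the library's implementation.

```python
from collections import Counter


def sketch_symbol_list(observed, n_symbols=None, musthave_symbols=None):
    """Illustrative only: must-have symbols first, then the most frequent others."""
    # De-duplicate the must-have symbols while keeping their order.
    result = list(dict.fromkeys(musthave_symbols or []))
    # Assumption: remaining slots are filled by descending frequency.
    for symbol, _count in Counter(observed).most_common():
        if n_symbols is not None and len(result) >= n_symbols:
            break
        if symbol not in result:
            result.append(symbol)
    # n_symbols=None means unlimited, mirroring the `n` parameter of roll_up.
    return result if n_symbols is None else result[:n_symbols]


observed = ["Web", "Web", "Email", "SMS", "Web", "Email", "Push"]
top3 = sketch_symbol_list(observed, n_symbols=3, musthave_symbols=["SMS"])
everything = sketch_symbol_list(observed)
```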

accumulate_sym_binnings(predictor, modelids, symbollist, verbose=False) → polars.DataFrame
Return type:

polars.DataFrame

normalize_all_binnings(combined_dm: polars.LazyFrame) → polars.LazyFrame

Prepare all predictor binnings.

Fix up the boundaries for numeric bins and parse the bin labels into clean lists for symbolics.

Parameters:

combined_dm (polars.LazyFrame)

Return type:

polars.LazyFrame
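To illustrate the symbolic half of this step, here is a small sketch that turns a combined bin label into a clean list of symbols. It assumes the symbols of a bin are stored as a single comma-separated string; the function name `parse_symbol_label` is hypothetical, and the real method works on a `polars.LazyFrame` rather than on single strings. The numeric boundary fix-up is not shown.

```python
def parse_symbol_label(label):
    """Split a combined bin label into a list of trimmed, non-empty symbols."""
    # Assumption: symbols within one bin are separated by commas.
    return [part.strip() for part in label.split(",") if part.strip()]


symbols = parse_symbol_label("Web, Email ,SMS")
single = parse_symbol_label("Web")
```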

create_empty_numbinning(predictor: str, n: int, distribution: str = 'lin', boundaries: list | None = None, minimum: float | None = None, maximum: float | None = None) → polars.DataFrame
Parameters:
  • predictor (str)

  • n (int)

  • distribution (str)

  • boundaries (Optional[list])

  • minimum (Optional[float])

  • maximum (Optional[float])

Return type:

polars.DataFrame
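The difference between the “lin” and “log” distributions can be sketched as follows. This is an illustration of evenly spaced versus logarithmically spaced interval boundaries under stated assumptions, not the library's implementation: the function name `sketch_boundaries` is hypothetical, custom `boundaries` are left out, and the sketch requires a strictly positive minimum for log spacing.

```python
import math


def sketch_boundaries(n, minimum, maximum, distribution="lin"):
    """Illustrative only: n intervals need n + 1 boundaries."""
    if distribution == "lin":
        # Evenly spaced boundaries between minimum and maximum.
        step = (maximum - minimum) / n
        return [minimum + i * step for i in range(n + 1)]
    if distribution == "log":
        # Assumption: log spacing needs a strictly positive minimum.
        if minimum <= 0:
            raise ValueError("log spacing needs minimum > 0 in this sketch")
        lo, hi = math.log10(minimum), math.log10(maximum)
        step = (hi - lo) / n
        # Boundaries are evenly spaced in log10 space, so intervals widen
        # towards the long tail.
        return [10 ** (lo + i * step) for i in range(n + 1)]
    raise ValueError(f"unknown distribution: {distribution}")


lin_edges = sketch_boundaries(4, 0, 100, "lin")
log_edges = sketch_boundaries(3, 1, 1000, "log")
```

With logarithmic spacing, each interval covers the same ratio rather than the same width, which suits long-tailed fields such as income.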

get_source_numbinning(predictor: str, modelid: str) → polars.DataFrame
Parameters:
  • predictor (str)

  • modelid (str)

Return type:

polars.DataFrame

combine_two_numbinnings(source: polars.DataFrame, target: polars.DataFrame, verbose=False) → polars.DataFrame
Parameters:
  • source (polars.DataFrame)

  • target (polars.DataFrame)

Return type:

polars.DataFrame

plot_binning_attribution(source: polars.DataFrame, target: polars.DataFrame) → Figure
Parameters:
  • source (polars.DataFrame)

  • target (polars.DataFrame)

Return type:

Figure

plot_binning_lift(binning, col_facet=None, row_facet=None, custom_data=['PredictorName', 'BinSymbol'], return_df=False) → polars.DataFrame | Figure
Return type:

polars.DataFrame | Figure

plot_lift_binning(binning: polars.DataFrame) → Figure
Parameters:

binning (polars.DataFrame)

Return type:

Figure