pdstools.adm.BinAggregator
Classes
BinAggregator | A class to generate rolled-up insights from ADM predictor binning.
Module Contents
- class BinAggregator(dm: pdstools.adm.ADMDatamart.ADMDatamart)
Bases:
pdstools.utils.namespaces.LazyNamespace
A class to generate rolled-up insights from ADM predictor binning.
- Parameters:
dm (pdstools.adm.ADMDatamart.ADMDatamart)
- dependencies = ['plotly', 'numpy']
- roll_up(predictors: str | list, *, n: int = 10, distribution: Literal['lin', 'log'] = 'lin', boundaries: float | list | None = None, symbols: str | list | None = None, minimum: float | None = None, maximum: float | None = None, aggregation: str | None = None, as_numeric: bool | None = None, return_df: bool = False, verbose: bool = False) → polars.DataFrame | Figure
Roll up a predictor across all the models defined when creating the class.
Predictors can be either numeric or symbolic (also called ‘categorical’). You can aggregate the same predictor across different sets of models by specifying a column name in the aggregation argument.
- Parameters:
predictors (str | list) – Name of the predictor to roll up. Multiple predictors can be passed in as a list.
n (int, optional) – Number of bins (intervals or symbols) to generate, by default 10. Any custom intervals or symbols specified with the boundaries or symbols arguments count towards this number as well. For symbolic predictors, n can be None, which means unlimited.
distribution (str, optional) – For numeric predictors: the way the intervals are constructed. By default “lin” for an evenly spaced distribution; can be set to “log” for a long-tailed distribution (for fields like income).
boundaries (float | list, optional) – For numeric predictors: one value, or a list of the numeric values to include as interval boundaries. They will be used at the front of the automatically created intervals. By default None, all intervals are created automatically.
symbols (str | list, optional) – For symbolic predictors, any symbol(s) that must be included in the symbol list in the generated binning. By default None.
minimum (float, optional) – Minimum value for numeric predictors, by default None. When None the minimum is taken from the binning data of the models.
maximum (float, optional) – Maximum value for numeric predictors, by default None. When None the maximum is taken from the binning data of the models.
aggregation (str, optional) – Optional column name in the data to aggregate over, creating separate aggregations for each of the different values. By default None.
as_numeric (bool, optional) – Optional override for the predictor type, for the (exceptional) situation where a predictor with the same name is numeric in some models and symbolic in others. By default None, which means the type is taken from the first predictor in the data.
return_df (bool, optional) – Return the underlying binning instead of a plot, by default False.
verbose (bool, optional) – Show detailed debug information while executing, by default False.
- Returns:
By default returns a nicely formatted plot. When ‘return_df’ is set to True, it returns the actual binning with the lift aggregated over all the models, optionally per predictor and per set of models.
- Return type:
polars.DataFrame | Figure
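The call patterns described above can be sketched as follows. This is a minimal illustration, assuming `aggregator` is an already constructed BinAggregator; the predictor names and the `Channel` column are hypothetical and need to be replaced with names from your own data.

```python
def roll_up_examples(aggregator):
    """Call roll_up in a few of the documented ways.

    `aggregator` is assumed to be an existing BinAggregator instance.
    All predictor and column names below are hypothetical.
    """
    # Default settings: a plot with 10 evenly spaced intervals.
    age_plot = aggregator.roll_up("Customer.Age")

    # Long-tailed numeric field: log-spaced intervals, two boundaries
    # that must be included, and the binning data instead of a plot.
    income_df = aggregator.roll_up(
        "Customer.Income",
        n=8,
        distribution="log",
        boundaries=[10000, 50000],
        return_df=True,
    )

    # Symbolic predictor: two symbols that must be included, aggregated
    # separately for each value of a (hypothetical) column.
    channel_df = aggregator.roll_up(
        "Customer.PreferredChannel",
        symbols=["Web", "Mobile"],
        aggregation="Channel",
        return_df=True,
    )
    return age_plot, income_df, channel_df
```

With return_df=True, the second and third calls return the binning as a polars.DataFrame rather than a Figure.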
- accumulate_num_binnings(predictor, modelids, target_binning, verbose=False) → polars.DataFrame
- Return type:
polars.DataFrame
- accumulate_sym_binnings(predictor, modelids, symbollist, verbose=False) → polars.DataFrame
- Return type:
polars.DataFrame
- normalize_all_binnings(combined_dm: polars.LazyFrame) → polars.LazyFrame
Prepare all predictor binnings.
Fix up the boundaries for numeric bins and parse the bin labels into clean lists for symbolics.
- Parameters:
combined_dm (polars.LazyFrame)
- Return type:
polars.LazyFrame
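To illustrate what "parsing bin labels into clean lists" can mean for symbolic predictors, here is a small pure-Python sketch. The helper `parse_symbol_label` is hypothetical and not part of pdstools, and the comma-separated label format is an assumption; the format actually handled by normalize_all_binnings is not documented on this page.

```python
def parse_symbol_label(label):
    """Split a comma-separated bin label into a clean list of symbols.

    Hypothetical helper for illustration only. It assumes symbols are
    separated by commas, trims whitespace, and drops empty entries and
    duplicates while keeping the original order.
    """
    symbols = []
    for part in label.split(","):
        symbol = part.strip()
        if symbol and symbol not in symbols:
            symbols.append(symbol)
    return symbols
```

A label such as " Web, Mobile ,,Web" would come out as a two-element list under these assumptions.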
- create_empty_numbinning(predictor: str, n: int, distribution: str = 'lin', boundaries: list | None = None, minimum: float | None = None, maximum: float | None = None) → polars.DataFrame
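The difference between the 'lin' and 'log' distribution options can be made concrete with a small sketch. The helper `sketch_edges` below is hypothetical and is not the pdstools implementation: it ignores custom boundaries and, as an assumption of this sketch, requires a positive minimum for log spacing.

```python
import math


def sketch_edges(n, minimum, maximum, distribution="lin"):
    """Return the n + 1 edges of n intervals between minimum and maximum.

    Hypothetical helper for illustration; not part of pdstools.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if distribution == "lin":
        # Evenly spaced edges: every interval has the same width.
        step = (maximum - minimum) / n
        return [minimum + i * step for i in range(n + 1)]
    if distribution == "log":
        # Edges evenly spaced in log10: interval width grows with the
        # value, which suits long-tailed fields such as income.
        if minimum <= 0:
            raise ValueError("log spacing needs a positive minimum")
        low, high = math.log10(minimum), math.log10(maximum)
        step = (high - low) / n
        return [10 ** (low + i * step) for i in range(n + 1)]
    raise ValueError("distribution must be 'lin' or 'log'")
```

With linear spacing, most of a long-tailed predictor's values would fall into the first few intervals; log spacing gives the low end finer resolution and the tail coarser intervals.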
- combine_two_numbinnings(source: polars.DataFrame, target: polars.DataFrame, verbose=False) → polars.DataFrame
- Parameters:
source (polars.DataFrame)
target (polars.DataFrame)
- Return type:
polars.DataFrame
- plot_binning_attribution(source: polars.DataFrame, target: polars.DataFrame) → Figure
- Parameters:
source (polars.DataFrame)
target (polars.DataFrame)
- Return type:
Figure
- plot_binning_lift(binning, col_facet=None, row_facet=None, custom_data=['PredictorName', 'BinSymbol'], return_df=False) → polars.DataFrame | Figure
- Return type:
polars.DataFrame | Figure
- plot_lift_binning(binning: polars.DataFrame) → Figure
- Parameters:
binning (polars.DataFrame)
- Return type:
Figure