pdstools.adm.Plots._predictors

Predictor-level performance, contribution, heatmap, and count plots.

Classes

_PredictorPlotsMixin

Common attribute surface used by every plot mixin.

Module Contents

class _PredictorPlotsMixin

Bases: pdstools.adm.Plots._base._PlotsBase

Common attribute surface used by every plot mixin.

_boxplot_pre_aggregated(df: polars.LazyFrame, *, y_col: str, metric_col: str, metric_weight_col: str | None = None, legend_col: str | None = None, color_discrete_map: dict[str, str] | None = None, return_df: bool = False)
Parameters:
  • df (polars.LazyFrame)

  • y_col (str)

  • metric_col (str)

  • metric_weight_col (str | None)

  • legend_col (str | None)

  • color_discrete_map (dict[str, str] | None)

  • return_df (bool)

predictor_performance(*, metric: str = 'Performance', top_n: int | None = None, active_only: bool = False, query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)

Plot a box plot of predictor performance.

Use the query argument to drill down to a more specific subset. If top_n is given, chooses the top predictors based on the weighted average performance across models, ordered by their median performance.

Parameters:
  • metric (str, optional) – The metric to plot, by default “Performance”. This exists mainly for future-proofing, for when FeatureImportance becomes more widely used.

  • top_n (Optional[int], optional) – The top n predictors to plot, by default None

  • active_only (bool, optional) – Whether to only consider active predictor performance, by default False

  • query (Optional[QUERY], optional) – The query to apply to the data, by default None

  • return_df (bool, optional) – Whether to return a dataframe instead of a plot, by default False

Returns:

Plotly box plot figure, LazyFrame if return_df=True, or None if no data

Return type:

Figure | pl.LazyFrame | None

See also

pdstools.adm.ADMDatamart.apply_predictor_categorization

how to override the out-of-the-box predictor categorization

Examples

>>> # Default: all predictors ranked by performance
>>> fig = dm.plot.predictor_performance()
>>> # Top-15 active predictors only
>>> fig = dm.plot.predictor_performance(top_n=15, active_only=True)
>>> # Filter to a specific channel and return the raw data
>>> df = dm.plot.predictor_performance(
...     query={"Channel": "Web"},
...     return_df=True,
... )
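The top_n selection described above can be pictured with a small standard-library sketch. It is not the pdstools implementation: the row layout, the use of a response count as the weight, and the helper name `top_predictors` are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (predictor, model_id, performance, response_count).
# Using response_count as the weight is an assumption, not taken from pdstools.
rows = [
    ("Age", "m1", 0.70, 100),
    ("Age", "m2", 0.60, 300),
    ("Income", "m1", 0.58, 300),
    ("Income", "m2", 0.74, 100),
    ("Region", "m1", 0.52, 50),
    ("Region", "m2", 0.51, 50),
]


def top_predictors(rows, top_n=None):
    """Keep the top_n predictors by weighted mean, then order them by median."""
    performances = defaultdict(list)   # predictor -> performance per model
    weighted_sum = defaultdict(float)  # predictor -> sum(performance * weight)
    weight_total = defaultdict(float)  # predictor -> sum(weight)
    for predictor, _model, performance, count in rows:
        performances[predictor].append(performance)
        weighted_sum[predictor] += performance * count
        weight_total[predictor] += count

    # Step 1: rank by weighted average performance and keep the first top_n
    # (top_n=None keeps every predictor).
    ranked = sorted(
        performances,
        key=lambda p: weighted_sum[p] / weight_total[p],
        reverse=True,
    )[:top_n]

    # Step 2: order the selected predictors by their median performance.
    return sorted(ranked, key=lambda p: median(performances[p]), reverse=True)


selected = top_predictors(rows, top_n=2)
```

In this toy data, "Age" has the higher weighted average while "Income" has the higher median, so the two steps produce different orderings: selection follows the weighted average, display order follows the median.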
predictor_category_performance(*, metric: str = 'Performance', active_only: bool = False, query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)

Plot the predictor category performance

Parameters:
  • metric (str, optional) – The metric to plot, by default “Performance”

  • active_only (bool, optional) – Whether to only analyze active predictors, by default False

  • query (Optional[QUERY], optional) – An optional query to apply, by default None

  • return_df (bool, optional) – An optional flag to get the dataframe instead, by default False

Returns:

A Plotly figure, or the dataframe if return_df=True

Return type:

px.Figure

See also

pdstools.adm.ADMDatamart.apply_predictor_categorization

how to override the out-of-the-box predictor categorization

Examples

>>> # Default: performance box plot per predictor category
>>> fig = dm.plot.predictor_category_performance()
>>> # Active predictors only, filtered to a specific channel
>>> fig = dm.plot.predictor_category_performance(
...     active_only=True,
...     query={"Channel": "Web"},
... )
>>> # Return underlying data for further analysis
>>> df = dm.plot.predictor_category_performance(return_df=True)
predictor_contribution(*, by: str = 'Configuration', query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)

Plots the predictor contribution for each configuration

Parameters:
  • by (str, optional) – By which column to plot the contribution, by default “Configuration”

  • query (Optional[QUERY], optional) – An optional query to apply to the data, by default None

  • return_df (bool, optional) – An optional flag to get a dataframe instead, by default False

Returns:

A Plotly figure, or the dataframe if return_df=True

Return type:

px.Figure

See also

pdstools.adm.ADMDatamart.apply_predictor_categorization

how to override the out-of-the-box predictor categorization

Examples

>>> # Default: contribution per Configuration
>>> fig = dm.plot.predictor_contribution()
>>> # Contribution grouped by Channel
>>> fig = dm.plot.predictor_contribution(by="Channel")
>>> # Return the contribution data for further processing
>>> df = dm.plot.predictor_contribution(return_df=True)
predictor_performance_heatmap(*, top_predictors: int = 20, top_groups: int | None = None, by: str = 'Name', active_only: bool = False, query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)

Generate a heatmap showing predictor performance across different groups.

Parameters:
  • top_predictors (int, optional) – Number of top-performing predictors to include, by default 20

  • top_groups (Optional[int], optional) – Number of top groups to include, by default None (all groups)

  • by (str, optional) – Column to group by for the heatmap, by default “Name”

  • active_only (bool, optional) – Whether to only include active predictors, by default False

  • query (Optional[QUERY], optional) – Optional query to filter the data, by default None

  • return_df (bool, optional) – Whether to return a dataframe instead of a plot, by default False

Returns:

Plotly heatmap figure, or LazyFrame if return_df=True

Return type:

Union[Figure, pl.LazyFrame]

Examples

>>> # Default: top-20 predictors vs proposition (Name)
>>> fig = dm.plot.predictor_performance_heatmap()
>>> # Top-10 predictors across top-5 configurations, active only
>>> fig = dm.plot.predictor_performance_heatmap(
...     top_predictors=10,
...     top_groups=5,
...     by="Configuration",
...     active_only=True,
... )
>>> # Return the pivot data for further processing
>>> df = dm.plot.predictor_performance_heatmap(return_df=True)
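To make the shape of the heatmap data more concrete, the following standard-library sketch builds a predictor-by-group grid from long-format rows. It is only an illustration: the row layout, ranking predictors by their mean performance, and the helper name `performance_pivot` are assumptions, and the actual method may rank and aggregate differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (predictor, group, performance).
rows = [
    ("Age", "OfferA", 0.70),
    ("Age", "OfferB", 0.62),
    ("Income", "OfferA", 0.58),
    ("Income", "OfferB", 0.66),
    ("Region", "OfferA", 0.51),
    ("Region", "OfferB", 0.53),
]


def performance_pivot(rows, top_predictors):
    """Return ({predictor: [performance per group]}, groups) for the top predictors."""
    cells = defaultdict(dict)  # predictor -> {group: performance}
    for predictor, group, performance in rows:
        cells[predictor][group] = performance

    # Assumption: "top-performing" means highest mean performance across groups.
    ranked = sorted(cells, key=lambda p: mean(cells[p].values()), reverse=True)
    keep = ranked[:top_predictors]

    # Column order of the grid; a missing predictor/group pair becomes None.
    groups = sorted({g for p in keep for g in cells[p]})
    pivot = {p: [cells[p].get(g) for g in groups] for p in keep}
    return pivot, groups


pivot, groups = performance_pivot(rows, top_predictors=2)
```

Each key of `pivot` corresponds to one heatmap row and each entry of `groups` to one column, which mirrors the roles of top_predictors and by in the signature above.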
predictor_count(*, by: str | list[str] = ['EntryType', 'Type'], query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)

Generate a box plot showing the distribution of predictor counts by type.

Parameters:
  • by (Union[str, list[str]], optional) – Column(s) to group predictors by, by default [“EntryType”, “Type”]

  • query (Optional[QUERY], optional) – Optional query to filter the data, by default None

  • return_df (bool, optional) – Whether to return a dataframe instead of a plot, by default False

Returns:

Plotly box plot figure, or LazyFrame if return_df=True

Return type:

Union[Figure, pl.LazyFrame]

Examples

>>> # Default: predictor count distribution by EntryType and Type
>>> fig = dm.plot.predictor_count()
>>> # Distribution by EntryType only
>>> fig = dm.plot.predictor_count(by="EntryType")
>>> # Return the raw counts
>>> df = dm.plot.predictor_count(return_df=True)
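One way to read "distribution of predictor counts by type" is: count the predictors of each (EntryType, Type) combination per model, then look at how those counts vary across models. The standard-library sketch below follows that reading. The per-model interpretation, the row layout, the sample values, and the helper name `count_distribution` are assumptions rather than documented behaviour.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (model_id, predictor, entry_type, predictor_type).
rows = [
    ("m1", "Age", "Active", "numeric"),
    ("m1", "Income", "Active", "numeric"),
    ("m1", "Region", "Inactive", "symbolic"),
    ("m2", "Age", "Active", "numeric"),
    ("m2", "Region", "Active", "symbolic"),
    ("m2", "Segment", "Inactive", "symbolic"),
]


def count_distribution(rows):
    """Map each (entry_type, predictor_type) to its list of per-model counts."""
    # Count predictors per model and per (entry_type, predictor_type) pair.
    per_model = Counter(
        (model, entry_type, predictor_type)
        for model, _predictor, entry_type, predictor_type in rows
    )

    # Collect the per-model counts for each pair; one list feeds one box.
    distribution = defaultdict(list)
    for (_model, entry_type, predictor_type), n in sorted(per_model.items()):
        distribution[(entry_type, predictor_type)].append(n)
    return dict(distribution)


distribution = count_distribution(rows)
```

Each list in `distribution` holds the values that a single box in the box plot would summarise under this interpretation.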