pdstools.adm.Plots._predictors

Predictor-level performance, contribution, heatmap, and count plots.

Classes

_PredictorPlotsMixin

Common attribute surface used by every plot mixin.

Module Contents

class _PredictorPlotsMixin

Bases: pdstools.adm.Plots._base._PlotsBase

Common attribute surface used by every plot mixin.

_boxplot_pre_aggregated(df: polars.LazyFrame, *, y_col: str, metric_col: str, metric_weight_col: str | None = None, legend_col: str | None = None, color_discrete_map: dict[str, str] | None = None, return_df: bool = False)

Parameters:

df (polars.LazyFrame)
y_col (str)
metric_col (str)
metric_weight_col (str | None)
legend_col (str | None)
color_discrete_map (dict[str, str] | None)
return_df (bool)

predictor_performance(*, metric: str = 'Performance', top_n: int | None = None, active_only: bool = False, query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)

Plots a box plot of predictor performance.

Use the query argument to drill down to a more specific subset. If top_n is given, the top predictors are chosen by their weighted average performance across models and then ordered by their median performance.

Parameters:

metric (str, optional) – The metric to plot, by default “Performance”. This mainly future-proofs the method for when FeatureImportance becomes more widely used.
top_n (Optional[int], optional) – The top n predictors to plot, by default None
active_only (bool, optional) – Whether to only consider active predictor performance, by default False
query (Optional[QUERY], optional) – The query to apply to the data, by default None
return_df (bool, optional) – Whether to return a dataframe instead of a plot, by default False

Returns:

Plotly box plot figure, LazyFrame if return_df=True, or None if no data

Return type:

Figure | pl.LazyFrame | None
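
The top_n behaviour described above selects on one statistic and orders on another. The following is a minimal standard-library sketch of that idea, not the pdstools implementation: the row layout, the use of a response count as the weight, the descending sort direction, and the helper names weighted_mean and top_predictors are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (predictor name, performance in one model, weight).
# The weight stands in for something like a response count (an assumption).
rows = [
    ("Age", 0.72, 100),
    ("Age", 0.60, 300),
    ("Income", 0.55, 200),
    ("Income", 0.75, 200),
    ("Region", 0.52, 50),
    ("Region", 0.54, 50),
]


def weighted_mean(pairs):
    """Weighted average of (value, weight) pairs."""
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


def top_predictors(rows, top_n):
    """Pick top_n predictors by weighted mean, then order them by median."""
    by_predictor = defaultdict(list)
    for name, perf, weight in rows:
        by_predictor[name].append((perf, weight))
    # Step 1: rank by weighted average performance and keep the best top_n.
    ranked = sorted(
        by_predictor,
        key=lambda n: weighted_mean(by_predictor[n]),
        reverse=True,
    )
    selected = ranked[:top_n]
    # Step 2: order the selected predictors by median performance (descending).
    return sorted(
        selected,
        key=lambda n: median(v for v, _ in by_predictor[n]),
        reverse=True,
    )


result = top_predictors(rows, top_n=2)
print(result)
```

In this toy data Income has the higher weighted average, but Age has the higher median, so Age is listed first even though both are selected.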

See also

pdstools.adm.ADMDatamart.apply_predictor_categorization: how to override the out of the box predictor categorization

Examples

>>> # Default: contribution per Configuration
>>> fig = dm.plot.predictor_contribution()

>>> # Contribution grouped by Channel
>>> fig = dm.plot.predictor_contribution(by="Channel")

>>> # Return the contribution data for further processing
>>> df = dm.plot.predictor_contribution(return_df=True)

predictor_performance_heatmap(*, top_predictors: int = 20, top_groups: int | None = None, by: str = 'Name', active_only: bool = False, query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)

Generate a heatmap showing predictor performance across different groups.

Parameters:

top_predictors (int, optional) – Number of top-performing predictors to include, by default 20
top_groups (Optional[int], optional) – Number of top groups to include, by default None (all groups)
by (str, optional) – Column to group by for the heatmap, by default “Name”
active_only (bool, optional) – Whether to only include active predictors, by default False
query (Optional[QUERY], optional) – Optional query to filter the data, by default None
return_df (bool, optional) – Whether to return a dataframe instead of a plot, by default False

Returns:

Plotly heatmap figure, or a LazyFrame if return_df=True

Return type:

Union[Figure, pl.LazyFrame]
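
A heatmap of this kind is a predictor-by-group grid of performance values. The sketch below shows one way such a grid could be built from long-format rows using only the standard library. It is illustrative rather than a description of how pdstools computes the pivot: the row layout, ranking by unweighted mean, and the helper name pivot_top are assumptions.

```python
from collections import defaultdict

# Hypothetical long-format rows: (predictor, group, performance).
rows = [
    ("Age", "Web", 0.62),
    ("Age", "Email", 0.58),
    ("Income", "Web", 0.70),
    ("Income", "Email", 0.66),
    ("Region", "Web", 0.51),
    ("Region", "Email", 0.53),
]


def pivot_top(rows, top_predictors):
    """Keep the best predictors by mean performance and pivot to a grid."""
    per_predictor = defaultdict(dict)
    for predictor, group, perf in rows:
        per_predictor[predictor][group] = perf
    # Rank predictors by their unweighted mean across groups (an assumption).
    ranked = sorted(
        per_predictor,
        key=lambda p: sum(per_predictor[p].values()) / len(per_predictor[p]),
        reverse=True,
    )
    keep = ranked[:top_predictors]
    groups = sorted({g for _, g, _ in rows})
    # One row per kept predictor, one column per group; None marks a gap.
    grid = [[per_predictor[p].get(g) for g in groups] for p in keep]
    return keep, groups, grid


predictors, groups, grid = pivot_top(rows, top_predictors=2)
print(predictors, groups, grid)
```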

Examples

>>> # Default: top-20 predictors vs proposition (Name)
>>> fig = dm.plot.predictor_performance_heatmap()

>>> # Top-10 predictors across top-5 configurations, active only
>>> fig = dm.plot.predictor_performance_heatmap(
...     top_predictors=10,
...     top_groups=5,
...     by="Configuration",
...     active_only=True,
... )

>>> # Return the pivot data for further processing
>>> df = dm.plot.predictor_performance_heatmap(return_df=True)

predictor_count(*, by: str | list[str] = ['EntryType', 'Type'], query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)

Generate a box plot showing the distribution of predictor counts by type.

Parameters:

by (Union[str, list[str]], optional) – Column(s) to group predictors by, by default [“EntryType”, “Type”]
query (Optional[QUERY], optional) – Optional query to filter the data, by default None
return_df (bool, optional) – Whether to return a dataframe instead of a plot, by default False

Returns:

Plotly box plot figure, or a LazyFrame if return_df=True

Return type:

Union[Figure, pl.LazyFrame]
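
To make "distribution of predictor counts" concrete, the sketch below counts predictors per model within each (EntryType, Type) group and collects those counts per group, which is the kind of data a box plot would summarize. It uses only the standard library. The row layout, the per-model counting, and the sample values are assumptions for illustration and may differ from what pdstools does internally.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (model id, predictor name, EntryType, Type).
rows = [
    ("m1", "Age", "Active", "numeric"),
    ("m1", "Income", "Active", "numeric"),
    ("m1", "Region", "Inactive", "symbolic"),
    ("m2", "Age", "Active", "numeric"),
    ("m2", "Region", "Active", "symbolic"),
]

# Count predictors per (model, EntryType, Type) combination.
per_model = Counter((model, entry, typ) for model, _, entry, typ in rows)

# Collect the per-model counts for each group; each list is one box.
collected = defaultdict(list)
for (model, entry, typ), n in sorted(per_model.items()):
    collected[(entry, typ)].append(n)
distribution = dict(collected)

print(distribution)
```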

Examples

>>> # Default: predictor count distribution by EntryType and Type
>>> fig = dm.plot.predictor_count()

>>> # Distribution by EntryType only
>>> fig = dm.plot.predictor_count(by="EntryType")

>>> # Return the raw counts
>>> df = dm.plot.predictor_count(return_df=True)