pdstools.adm.Plots._performance¶
Performance / volume time-series plots and proposition success rates.
Attributes¶
Classes¶
_PerformancePlotsMixin | Common attribute surface used by every plot mixin.
Module Contents¶
- logger¶
- class _PerformancePlotsMixin¶
Bases: pdstools.adm.Plots._base._PlotsBase
Common attribute surface used by every plot mixin.
- over_time(metric: str = 'Performance', by: polars.Expr | str | list[str] = 'ModelID', *, every: str | datetime.timedelta = '1d', cumulative: bool = True, query: pdstools.utils.types.QUERY | None = None, facet: str | None = None, show_metric_limits: bool = False, return_df: bool = False)¶
Statistics over time
- Parameters:
metric (str, optional) – The metric to plot, by default “Performance”
by (Union[pl.Expr, str, list[str]], optional) – The column(s) to group by, by default “ModelID”. When a list of column names is passed, the values are concatenated with “ / ” into a single combined dimension that is encoded as colour. To keep the chart readable, the top 10 combinations by total metric are kept and the rest are dropped with a warning.
every (Union[str, timedelta], optional) – The time period to group by, by default “1d”. See https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.truncate.html for the supported period strings.
cumulative (bool, optional) – Whether to show cumulative values or period-over-period changes, by default True
query (Optional[QUERY], optional) – The query to apply to the data, by default None
facet (Optional[str], optional) – The column to facet the plot into subplots by, by default None (no faceting)
show_metric_limits (bool, optional) – Whether to show dashed horizontal lines at the metric limit thresholds (from MetricLimits.csv), by default False. Only applies when metric is “Performance”.
return_df (bool, optional) – Whether to return a dataframe instead of a plot, by default False
- Returns:
Plotly line chart or LazyFrame if return_df=True
- Return type:
Figure | pl.LazyFrame
Examples
>>> # Default: performance over time, one line per model
>>> fig = dm.plot.over_time()

>>> # SuccessRate over time, grouped by Channel
>>> fig = dm.plot.over_time(metric="SuccessRate", by="Channel")

>>> # Group by multiple dimensions at once (combined into a single
>>> # colour-encoded series)
>>> fig = dm.plot.over_time(by=["Channel", "Direction"])

>>> # Period-over-period response-count changes, faceted by Direction
>>> fig = dm.plot.over_time(
...     metric="ResponseCount",
...     cumulative=False,
...     facet="Direction",
... )

>>> # Return the data instead of the figure
>>> df = dm.plot.over_time(return_df=True)
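The handling of a list-valued by and of cumulative=False described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the pdstools implementation: the helper names combine_and_keep_top and period_changes, the sample rows, and the reading of "period-over-period change" as the difference between consecutive cumulative values are all hypothetical.

```python
import warnings
from collections import defaultdict


def combine_and_keep_top(rows, by, metric, top_n=10):
    """Join the `by` columns with " / " and keep the top_n combinations by total metric.

    Hypothetical helper: mirrors the documented behaviour, not the library code.
    """
    totals = defaultdict(float)
    for row in rows:
        key = " / ".join(str(row[col]) for col in by)
        totals[key] += row[metric]
    # Rank the combined keys by their total metric, largest first.
    ranked = sorted(totals, key=totals.get, reverse=True)
    dropped = ranked[top_n:]
    if dropped:
        warnings.warn(f"Dropping {len(dropped)} combination(s) beyond the top {top_n}")
    return {key: totals[key] for key in ranked[:top_n]}


def period_changes(cumulative_values):
    """Turn a cumulative series into changes between consecutive periods.

    Assumption: a change is simply later minus earlier; the first period has
    no predecessor and therefore produces no value here.
    """
    return [later - earlier for earlier, later in zip(cumulative_values, cumulative_values[1:])]


# Made-up sample data for illustration only.
rows = [
    {"Channel": "Web", "Direction": "Inbound", "ResponseCount": 120},
    {"Channel": "Web", "Direction": "Inbound", "ResponseCount": 80},
    {"Channel": "Email", "Direction": "Outbound", "ResponseCount": 50},
    {"Channel": "SMS", "Direction": "Outbound", "ResponseCount": 10},
]
top = combine_and_keep_top(rows, by=["Channel", "Direction"], metric="ResponseCount", top_n=2)
changes = period_changes([100, 150, 225])
```

With top_n=2 the "SMS / Outbound" combination is dropped and a warning is raised, which is the same trade-off the method makes at 10 combinations to keep the colour legend readable.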
- proposition_success_rates(metric: str = 'SuccessRate', by: str = 'Name', *, top_n: int = 0, query: pdstools.utils.types.QUERY | None = None, facet: str | None = None, return_df: bool = False)¶
Proposition Success Rates
- Parameters:
metric (str, optional) – The metric to plot, by default “SuccessRate”
by (str, optional) – The column to group by, by default “Name”
top_n (int, optional) – The number of top entries of the by column to keep, by default 0
query (Optional[QUERY], optional) – A query to apply to the data, by default None
facet (Optional[str], optional) – The column to facet the graph by, by default None
return_df (bool, optional) – Whether to return a DataFrame instead of the graph, by default False
- Returns:
Plotly histogram figure or LazyFrame if return_df=True
- Return type:
Figure | pl.LazyFrame
Examples
>>> # Default: average success rate per proposition
>>> fig = dm.plot.proposition_success_rates()

>>> # Top-10 propositions by success rate, faceted by Channel
>>> fig = dm.plot.proposition_success_rates(top_n=10, facet="Channel")

>>> # Use ResponseCount as the metric grouped by Configuration
>>> fig = dm.plot.proposition_success_rates(
...     metric="ResponseCount",
...     by="Configuration",
... )
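As a rough illustration of what a per-proposition success rate with a top_n cut could look like, here is a minimal plain-Python sketch. It rests on assumptions that the documentation above does not confirm: the success rate is taken as pooled Positives divided by pooled ResponseCount, top_n=0 is read as "no limit", and the helper name success_rates and the sample rows are hypothetical.

```python
from collections import defaultdict


def success_rates(rows, by="Name", top_n=0):
    """Return (group, success rate) pairs sorted from highest to lowest rate.

    Assumptions: rate = sum(Positives) / sum(ResponseCount) per group, and
    top_n <= 0 means every group is kept. Not the pdstools implementation.
    """
    positives = defaultdict(int)
    responses = defaultdict(int)
    for row in rows:
        positives[row[by]] += row["Positives"]
        responses[row[by]] += row["ResponseCount"]
    # Skip groups without responses to avoid dividing by zero.
    rates = {
        name: positives[name] / responses[name]
        for name in responses
        if responses[name] > 0
    }
    ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n] if top_n > 0 else ranked


# Made-up sample data for illustration only.
rows = [
    {"Name": "A", "Positives": 10, "ResponseCount": 100},
    {"Name": "A", "Positives": 30, "ResponseCount": 100},
    {"Name": "B", "Positives": 5, "ResponseCount": 50},
    {"Name": "C", "Positives": 45, "ResponseCount": 100},
]
all_rates = success_rates(rows)
top_two = success_rates(rows, top_n=2)
```

Calling the real method with return_df=True is the reliable way to see which aggregation the library actually applies.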
- gains_chart(value: str, *, index: str | None = None, by: str | list[str] | None = None, query: pdstools.utils.types.QUERY | None = None, title: str | None = None, return_df: bool = False) pdstools.utils.plot_utils.Figure | polars.LazyFrame¶
Generate a gains chart showing cumulative distribution of a metric.
Creates a gains/lift chart to visualize model response skewness. Shows what percentage of the total value (e.g., responses, positives) is driven by what percentage of models. Useful for identifying if a small number of models drive most of the volume.
- Parameters:
value (str) – Column name containing the metric to compute gains for (e.g., “ResponseCount”, “Positives”)
index (str, optional) – Column name to normalize by (e.g., population size). If None, uses model count.
by (str | list[str], optional) – Column(s) to group by for separate gain curves (e.g., “Channel” or [“Channel”, “Direction”])
query (QUERY, optional) – Optional query to filter the data before computing gains
title (str, optional) – Chart title. If None, uses “Gains Chart”
return_df (bool, default False) – If True, return the gains data instead of the figure
- Returns:
Plotly figure showing the gains chart, or LazyFrame if return_df=True
- Return type:
Figure | pl.LazyFrame
Examples
>>> # Single gains curve for response count
>>> fig = datamart.plot.gains_chart(value="ResponseCount")

>>> # Gains curves by channel for positives
>>> fig = datamart.plot.gains_chart(
...     value="Positives",
...     by=["Channel", "Direction"],
...     title="Cumulative Positives by Channel"
... )
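The idea behind a gains curve, namely what share of the total value is driven by what share of models, can be sketched as follows. This is a simplified illustration and not the library's code: it assumes models are ordered from largest to smallest value, normalizes by model count only (the index=None case), and uses a hypothetical helper name gains_curve with made-up numbers.

```python
def gains_curve(values):
    """Return (cumulative share of models, cumulative share of value) points.

    Assumption: models are sorted by value in descending order, so the curve
    shows how much of the total the largest models account for.
    """
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    points = [(0.0, 0.0)]  # the curve starts at the origin
    if total == 0:
        return points
    running = 0
    for position, value in enumerate(ordered, start=1):
        running += value
        points.append((position / len(ordered), running / total))
    return points


# Made-up response counts for four models.
curve = gains_curve([50, 700, 200, 50])
```

In this example the largest quarter of the models accounts for 70% of the responses, which is the kind of skew the chart is meant to expose.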
- performance_volume_distribution(*, by: str | list[str] | None = None, query: pdstools.utils.types.QUERY | None = None, bin_width: int = 3, title: str | None = None, return_df: bool = False) pdstools.utils.plot_utils.Figure | polars.LazyFrame¶
Generate a performance vs volume distribution chart.
Shows how response volume is distributed across different model performance ranges. Helps identify if volume is driven by high-performing or low-performing models. Ideally, most volume should be in the 60-80 AUC range.
- Parameters:
by (str | list[str], optional) – Column(s) to group by for separate curves (e.g., “Channel” or [“Channel”, “Direction”]). If None, creates a single curve for all data
query (QUERY, optional) – Optional query to filter the data before analysis
bin_width (int, default 3) – Width of performance bins in AUC points (default creates bins of 3: 50-53, 53-56, etc.)
title (str, optional) – Chart title. If None, uses “Performance vs Volume”
return_df (bool, default False) – If True, return the binned data instead of the figure
- Returns:
Plotly figure showing performance distribution, or LazyFrame if return_df=True
- Return type:
Figure | pl.LazyFrame
Notes
Performance is binned from 50-100 using the specified bin_width. The chart shows what percentage of responses fall into each performance bin, grouped by the by parameter if provided.
Examples
>>> # Single curve for all channels
>>> fig = datamart.plot.performance_volume_distribution()

>>> # Separate curves per channel
>>> fig = datamart.plot.performance_volume_distribution(
...     by=["Channel", "Direction"],
...     title="Performance Distribution by Channel"
... )
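The binning described in the Notes can be sketched in plain Python. This is a minimal illustration under assumptions, not the pdstools implementation: performance is taken to be on the 50-100 AUC scale, bins are left-closed and start at 50, and the helper name volume_by_performance_bin and the sample models are hypothetical.

```python
def volume_by_performance_bin(models, bin_width=3):
    """Return {bin label: percentage of all responses} for (performance, responses) pairs.

    Assumptions: performance is in AUC points between 50 and 100, and a bin
    covers [start, start + bin_width).
    """
    totals = {}
    grand_total = 0
    for performance, responses in models:
        # Floor the offset from 50 to the nearest multiple of bin_width.
        start = 50 + int((performance - 50) // bin_width) * bin_width
        totals[start] = totals.get(start, 0) + responses
        grand_total += responses
    if grand_total == 0:
        return {}
    return {
        f"{start}-{start + bin_width}": 100 * count / grand_total
        for start, count in sorted(totals.items())
    }


# Made-up (performance, response count) pairs for four models.
models = [(52.0, 100), (61.5, 300), (60.0, 100), (75.0, 500)]
distribution = volume_by_performance_bin(models)
```

Here half of the volume sits in the 74-77 bin, which falls inside the 60-80 AUC range the description calls ideal.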