pdstools.adm.Plots
Classes
Module Contents
- class Plots(datamart: pdstools.adm.ADMDatamart.ADMDatamart)
Bases: pdstools.utils.namespaces.LazyNamespace
- Parameters:
datamart (pdstools.adm.ADMDatamart.ADMDatamart)
- dependencies = ['plotly']
- dependency_group = 'adm'
- datamart
- bubble_chart(*, last: bool = True, rounding: int = 5, query: pdstools.utils.types.QUERY | None = None, facet: str | polars.Expr | None = None, color: str | None = 'Performance', return_df: bool = False)
The Bubble Chart, as seen in Prediction Studio
- Parameters:
last (bool, optional) – Whether to only include the latest snapshot, by default True
rounding (int, optional) – Number of digits to round the performance number to, by default 5
query (Optional[QUERY], optional) – The query to apply to the data, by default None
facet (Optional[Union[str, pl.Expr]], optional) – Column name or Polars expression to facet the plot into subplots, by default None
return_df (bool, optional) – Whether to return a dataframe instead of a plot, by default False
color (str | None)
- over_time(metric: str = 'Performance', by: polars.Expr | str = 'ModelID', *, every: str | datetime.timedelta = '1d', cumulative: bool = True, query: pdstools.utils.types.QUERY | None = None, facet: str | None = None, return_df: bool = False)
Statistics over time
- Parameters:
metric (str, optional) – The metric to plot, by default “Performance”
by (Union[pl.Expr, str], optional) – The column to group by, by default “ModelID”
every (Union[str, timedelta], optional) – By what time period to group, by default “1d”, see https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.truncate.html for periods.
cumulative (bool, optional) – Whether to show cumulative values or period-over-period changes, by default True
query (Optional[QUERY], optional) – The query to apply to the data, by default None
facet (Optional[str], optional) – Column name to facet the plot into subplots, by default None
return_df (bool, optional) – Whether to return a dataframe instead of a plot, by default False
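The difference between cumulative values and period-over-period changes can be illustrated with a minimal sketch in plain Python. The helper name `period_changes` and the sample numbers are hypothetical, and this is not how pdstools computes the plot; it only shows the relationship between the two views.

```python
def period_changes(cumulative):
    """Convert cumulative totals per period into period-over-period changes."""
    # Each change is the difference between two consecutive cumulative totals
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Hypothetical cumulative response counts for four consecutive days
cumulative_counts = [100, 250, 400, 420]
daily_changes = period_changes(cumulative_counts)
print(daily_changes)
```

The first period has no predecessor, so the list of changes is one element shorter than the cumulative input.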
- proposition_success_rates(metric: str = 'SuccessRate', by: str = 'Name', *, top_n: int = 0, query: pdstools.utils.types.QUERY | None = None, facet: str | None = None, return_df: bool = False)
Proposition Success Rates
- Parameters:
metric (str, optional) – The metric to plot, by default “SuccessRate”
by (str, optional) – The column to group by, by default “Name”
top_n (int, optional) – How many of the top entries in the by column to show, by default 0
query (Optional[QUERY], optional) – A query to apply to the data, by default None
facet (Optional[str], optional) – The faceting column to apply to the graph, by default None
return_df (bool, optional) – Whether to return a DataFrame instead of the graph, by default False
- score_distribution(model_id: str, *, active_range: bool = True, return_df: bool = False)
Generate a score distribution plot for a specific model.
- Parameters:
model_id (str)
active_range (bool)
return_df (bool)
- Returns:
Plotly figure showing score distribution or DataFrame if return_df=True
- Return type:
Union[Figure, pl.LazyFrame]
- Raises:
ValueError – If no data is available for the provided model ID
- multiple_score_distributions(query: pdstools.utils.types.QUERY | None = None, show_all: bool = True) → list[Figure]
Generate the score distribution plot for all models in the query
- predictor_binning(model_id: str, predictor_name: str, return_df: bool = False)
Generate a predictor binning plot for a specific model and predictor.
- Parameters:
model_id (str)
predictor_name (str)
return_df (bool)
- Returns:
Plotly figure showing predictor binning or DataFrame if return_df=True
- Return type:
Union[Figure, pl.LazyFrame]
- Raises:
ValueError – If no data is available for the provided model ID and predictor name
- multiple_predictor_binning(model_id: str, query: pdstools.utils.types.QUERY | None = None, show_all=True) → list[Figure]
Generate predictor binning plots for all predictors in a model.
- Parameters:
model_id (str)
query (pdstools.utils.types.QUERY | None)
show_all
- Returns:
A list of Plotly figures, one for each predictor in the model
- Return type:
list[Figure]
- _boxplot_pre_aggregated(df: polars.LazyFrame, *, y_col: str, metric_col: str, metric_weight_col: str | None = None, legend_col: str | None = None, return_df: bool = False)
- predictor_performance(*, metric: str = 'Performance', top_n: int | None = None, active_only: bool = False, query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)
Plots a box plot of the performance of the predictors
Use the query argument to drill down to a more specific subset. If top_n is given, the top predictors are chosen based on their weighted average performance across models and ordered by their median performance.
- Parameters:
metric (str, optional) – The metric to plot, by default “Performance”. This is mainly for future-proofing, for when FeatureImportance becomes more widely used.
top_n (Optional[int], optional) – The top n predictors to plot, by default None
active_only (bool, optional) – Whether to only consider active predictor performance, by default False
query (Optional[QUERY], optional) – The query to apply to the data, by default None
return_df (bool, optional) – Whether to return a dataframe instead of a plot, by default False
See also
pdstools.adm.ADMDatamart.apply_predictor_categorization: how to override the out-of-the-box predictor categorization
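The top_n selection described for predictor_performance (choose by weighted average, order by median) can be sketched in plain Python. This is an illustration under stated assumptions, not the pdstools implementation: the function name `top_predictors`, the use of response counts as weights, and the sample rows are all hypothetical.

```python
from statistics import median


def top_predictors(rows, top_n):
    """rows: (predictor, performance, weight) tuples, one per model."""
    performances, weighted_sum, weight_sum = {}, {}, {}
    for name, performance, weight in rows:
        performances.setdefault(name, []).append(performance)
        weighted_sum[name] = weighted_sum.get(name, 0.0) + performance * weight
        weight_sum[name] = weight_sum.get(name, 0) + weight

    # Step 1: keep the top_n predictors by weighted average performance
    by_weighted_avg = sorted(
        performances,
        key=lambda n: weighted_sum[n] / weight_sum[n],
        reverse=True,
    )[:top_n]
    # Step 2: order the survivors by their median performance
    return sorted(by_weighted_avg, key=lambda n: median(performances[n]), reverse=True)


# Hypothetical data: weights are assumed to be response counts
rows = [
    ("Age", 72, 100),
    ("Age", 60, 900),
    ("Income", 65, 500),
    ("Income", 66, 500),
    ("Region", 52, 1000),
]
selected = top_predictors(rows, top_n=2)
print(selected)
```

In this sample, "Income" has the highest weighted average, yet "Age" is listed first because its median is higher, which shows that selection and ordering use different statistics.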
- predictor_category_performance(*, metric: str = 'Performance', active_only: bool = False, query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)
Plot the predictor category performance
- Parameters:
metric (str, optional) – The metric to plot, by default “Performance”
active_only (bool, optional) – Whether to only analyze active predictors, by default False
query (Optional[QUERY], optional) – An optional query to apply, by default None
return_df (bool, optional) – An optional flag to get the dataframe instead, by default False
- Returns:
A Plotly figure
- Return type:
Figure
See also
pdstools.adm.ADMDatamart.apply_predictor_categorization: how to override the out-of-the-box predictor categorization
- predictor_contribution(*, by: str = 'Configuration', query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)
Plots the predictor contribution for each configuration
- Parameters:
by (str)
query (pdstools.utils.types.QUERY | None)
return_df (bool)
- Returns:
A Plotly figure
- Return type:
Figure
See also
pdstools.adm.ADMDatamart.apply_predictor_categorization: how to override the out-of-the-box predictor categorization
- predictor_performance_heatmap(*, top_predictors: int = 20, top_groups: int | None = None, by: str = 'Name', active_only: bool = False, query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)
Generate a heatmap showing predictor performance across different groups.
- Parameters:
top_predictors (int, optional) – Number of top-performing predictors to include, by default 20
top_groups (int, optional) – Number of top groups to include, by default None (all groups)
by (str, optional) – Column to group by for the heatmap, by default “Name”
active_only (bool, optional) – Whether to only include active predictors, by default False
query (Optional[QUERY], optional) – Optional query to filter the data, by default None
return_df (bool, optional) – Whether to return a dataframe instead of a plot, by default False
- Returns:
Plotly heatmap figure or DataFrame if return_df=True
- Return type:
Union[Figure, pl.LazyFrame]
- gains_chart(value: str, *, index: str | None = None, by: str | list[str] | None = None, query: pdstools.utils.types.QUERY | None = None, title: str | None = None, return_df: bool = False) → Figure | polars.LazyFrame
Generate a gains chart showing cumulative distribution of a metric.
Creates a gains/lift chart to visualize model response skewness. Shows what percentage of the total value (e.g., responses, positives) is driven by what percentage of models. Useful for identifying if a small number of models drive most of the volume.
- Parameters:
value (str) – Column name containing the metric to compute gains for (e.g., “ResponseCount”, “Positives”)
index (str, optional) – Column name to normalize by (e.g., population size). If None, uses model count.
by (str | list[str], optional) – Column(s) to group by for separate gain curves (e.g., “Channel” or [“Channel”, “Direction”])
query (QUERY, optional) – Optional query to filter the data before computing gains
title (str, optional) – Chart title. If None, uses “Gains Chart”
return_df (bool, default False) – If True, return the gains data instead of the figure
- Returns:
Plotly figure showing the gains chart, or LazyFrame if return_df=True
- Return type:
Figure | pl.LazyFrame
Examples
>>> # Single gains curve for response count
>>> fig = datamart.plot.gains_chart(value="ResponseCount")
>>> # Gains curves by channel for positives
>>> fig = datamart.plot.gains_chart(
...     value="Positives",
...     by=["Channel", "Direction"],
...     title="Cumulative Positives by Channel"
... )
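To make the idea behind a gains curve concrete, here is a minimal plain-Python sketch of the underlying calculation: what share of the total value is covered by the top share of models. It is an illustration only, not the pdstools implementation; the function name `gains_curve` and the sample counts are hypothetical, and it assumes a non-empty list with a positive total.

```python
def gains_curve(values):
    """Return (share_of_models, share_of_value) points, starting at (0.0, 0.0)."""
    ordered = sorted(values, reverse=True)  # largest contributors first
    total = sum(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for position, value in enumerate(ordered, start=1):
        running += value
        points.append((position / len(ordered), running / total))
    return points


# Hypothetical response counts for four models
curve = gains_curve([700, 200, 50, 50])
for share_of_models, share_of_value in curve:
    print(f"{share_of_models:.0%} of models -> {share_of_value:.0%} of responses")
```

A curve that rises steeply at the start, as in this sample where one model out of four accounts for 70% of the responses, indicates that a small number of models drives most of the volume.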
- performance_volume_distribution(*, by: str | list[str] | None = None, query: pdstools.utils.types.QUERY | None = None, bin_width: int = 3, title: str | None = None, return_df: bool = False) → Figure | polars.LazyFrame
Generate a performance vs volume distribution chart.
Shows how response volume is distributed across different model performance ranges. Helps identify if volume is driven by high-performing or low-performing models. Ideally, most volume should be in the 60-80 AUC range.
- Parameters:
by (str | list[str], optional) – Column(s) to group by for separate curves (e.g., “Channel” or [“Channel”, “Direction”]). If None, creates a single curve for all data
query (QUERY, optional) – Optional query to filter the data before analysis
bin_width (int, default 3) – Width of performance bins in AUC points (default creates bins of 3: 50-53, 53-56, etc.)
title (str, optional) – Chart title. If None, uses “Performance vs Volume”
return_df (bool, default False) – If True, return the binned data instead of the figure
- Returns:
Plotly figure showing performance distribution, or LazyFrame if return_df=True
- Return type:
Figure | pl.LazyFrame
Notes
Performance is binned from 50-100 using the specified bin_width. The chart shows what percentage of responses fall into each performance bin, grouped by the by parameter if provided.
Examples
>>> # Single curve for all channels
>>> fig = datamart.plot.performance_volume_distribution()
>>> # Separate curves per channel
>>> fig = datamart.plot.performance_volume_distribution(
...     by=["Channel", "Direction"],
...     title="Performance Distribution by Channel"
... )
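The binning described in the Notes can be sketched in plain Python as follows. This is a simplified illustration, not the pdstools implementation: the function name `volume_by_performance_bin`, the handling of bin edges, and the sample data are assumptions.

```python
from collections import defaultdict


def volume_by_performance_bin(models, bin_width=3):
    """models: iterable of (auc, response_count), with AUC on a 50-100 scale.

    Returns {lower_bin_edge: percentage_of_responses}.
    """
    totals = defaultdict(int)
    for auc, responses in models:
        # Bins start at 50: 50-53, 53-56, ... for the default bin_width of 3
        lower_edge = 50 + bin_width * int((auc - 50) // bin_width)
        totals[lower_edge] += responses
    grand_total = sum(totals.values())
    return {edge: 100 * count / grand_total for edge, count in sorted(totals.items())}


# Hypothetical (AUC, response count) pairs for four models
models = [(51.0, 100), (62.5, 300), (63.9, 500), (77.2, 100)]
shares = volume_by_performance_bin(models)
print(shares)
```

In this sample, 80% of the responses come from models in the 62-65 bin, so most of the volume falls inside the 60-80 range mentioned above.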
- tree_map(metric: Literal['ResponseCount', 'Positives', 'Performance', 'SuccessRate', 'percentage_without_responses'] = 'Performance', *, by: str = 'Name', query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)
Generate a tree map visualization showing hierarchical model metrics.
- Parameters:
metric (Literal["ResponseCount", "Positives", "Performance", "SuccessRate", "percentage_without_responses"], optional) – The metric to visualize in the tree map, by default “Performance”
by (str, optional) – Column to group by for the tree map hierarchy, by default “Name”
query (Optional[QUERY], optional) – Optional query to filter the data, by default None
return_df (bool, optional) – Whether to return a dataframe instead of a plot, by default False
- Returns:
Plotly treemap figure or DataFrame if return_df=True
- Return type:
Union[Figure, pl.LazyFrame]
- predictor_count(*, by: str | list[str] = ['EntryType', 'Type'], query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)
Generate a box plot showing the distribution of predictor counts by type.
- Parameters:
by (str | list[str])
query (pdstools.utils.types.QUERY | None)
return_df (bool)
- Returns:
Plotly box plot figure or DataFrame if return_df=True
- Return type:
Union[Figure, pl.LazyFrame]
- binning_lift(model_id: str, predictor_name: str, *, query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)
Generate a binning lift plot for a specific predictor showing propensity lift per bin.
- Parameters:
model_id (str) – The ID of the model containing the predictor
predictor_name (str) – Name of the predictor to analyze for lift
query (Optional[QUERY], optional) – Optional query to filter the predictor data, by default None
return_df (bool, optional) – Whether to return a dataframe instead of a plot, by default False
- Returns:
Plotly bar chart showing binning lift or DataFrame if return_df=True
- Return type:
Union[Figure, pl.LazyFrame]
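As a rough illustration of what "propensity lift per bin" means, the sketch below computes lift as the ratio of a bin's positive rate to the overall positive rate. That definition is an assumption made for this example, as are the function name `lift_per_bin` and the sample counts; the exact formula and scaling used by pdstools may differ.

```python
def lift_per_bin(bins):
    """bins: list of (positives, negatives) per predictor bin.

    Assumed definition: lift = bin positive rate / overall positive rate.
    """
    total_positives = sum(pos for pos, _ in bins)
    total_responses = sum(pos + neg for pos, neg in bins)
    base_rate = total_positives / total_responses
    return [(pos / (pos + neg)) / base_rate for pos, neg in bins]


# Hypothetical bins: the second bin converts three times as often as the first
lifts = lift_per_bin([(25, 75), (75, 25)])
print(lifts)
```

Under this definition, a lift above 1 means the bin performs better than the predictor's overall average, and a lift below 1 means it performs worse.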
- action_overlap(group_col: str | list[str] | polars.Expr = 'Channel', overlap_col='Name', *, show_fraction=True, query: pdstools.utils.types.QUERY | None = None, return_df: bool = False)
Generate an overlap matrix heatmap showing shared actions across different groups.
- Parameters:
group_col (Union[str, list[str], pl.Expr], optional) – Column(s) to group by for overlap analysis, by default “Channel”
overlap_col (str, optional) – Column containing values to analyze for overlap, by default “Name”
show_fraction (bool, optional) – Whether to show overlap as fraction or absolute count, by default True
query (Optional[QUERY], optional) – Optional query to filter the data, by default None
return_df (bool, optional) – Whether to return a dataframe instead of a plot, by default False
- Returns:
Plotly heatmap showing action overlap or DataFrame if return_df=True
- Return type:
Union[Figure, pl.LazyFrame]
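The idea of an overlap matrix can be sketched with plain Python sets. This is an illustration, not the pdstools implementation: the function name `overlap_matrix` and the sample actions are hypothetical, and expressing the fraction relative to the row group's number of actions is an assumption.

```python
def overlap_matrix(actions_by_group, show_fraction=True):
    """actions_by_group: {group: set of action names}.

    Returns {(row_group, col_group): overlap}.
    """
    groups = sorted(actions_by_group)
    matrix = {}
    for row in groups:
        for col in groups:
            shared = len(actions_by_group[row] & actions_by_group[col])
            if show_fraction:
                # Assumption: the fraction is relative to the row group's action count
                matrix[(row, col)] = shared / len(actions_by_group[row])
            else:
                matrix[(row, col)] = shared
    return matrix


# Hypothetical actions per channel
actions = {"Web": {"A", "B", "C", "D"}, "Email": {"C", "D"}}
fractions = overlap_matrix(actions)
counts = overlap_matrix(actions, show_fraction=False)
print(fractions[("Web", "Email")], fractions[("Email", "Web")])
```

With this convention the matrix is not symmetric: half of the Web actions also appear in Email, while all Email actions also appear in Web.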
- partitioned_plot(func: collections.abc.Callable, facets: list[dict[str, str | None]], show_plots: bool = True, *args, **kwargs)
Execute a plotting function across multiple faceted subsets of data.
This method applies a given plotting function to multiple filtered subsets of data, where each subset is defined by the facet conditions. It’s useful for generating multiple plots with different filter conditions applied.
- Parameters:
func (Callable) – The plotting function to execute for each facet
facets (list[dict[str, Optional[str]]]) – list of dictionaries defining filter conditions for each facet
show_plots (bool, optional) – Whether to display the plots as they are generated, by default True
*args (tuple) – Additional positional arguments to pass to the plotting function
**kwargs (dict) – Additional keyword arguments to pass to the plotting function
- Returns:
A list of Plotly figures, one for each facet condition
- Return type:
list[Figure]
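The general pattern of running one plotting function over several facet conditions can be sketched as below. This is a hypothetical stand-in, not the pdstools implementation: the helper `partitioned`, the stub `describe`, the choice to skip None values, and passing the conditions through a `query` keyword are all assumptions made for illustration.

```python
def partitioned(func, facets, *args, **kwargs):
    """Call func once per facet and collect the results in order."""
    results = []
    for facet in facets:
        # Assumption: a None value places no constraint on that column
        conditions = {column: value for column, value in facet.items() if value is not None}
        results.append(func(*args, query=conditions, **kwargs))
    return results


def describe(metric, *, query):
    """Stub standing in for a plotting function; returns a label instead of a figure."""
    return f"{metric} for {query}"


facets = [
    {"Channel": "Web", "Direction": None},
    {"Channel": "Email", "Direction": "Outbound"},
]
outputs = partitioned(describe, facets, "Performance")
for line in outputs:
    print(line)
```

The result has one entry per facet dictionary, in the same order as the facets list, which mirrors the documented return value of one figure per facet condition.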