pdstools.utils.report_utils

Attributes

logger

Functions

`get_output_filename`(→ str)	Generate the output filename based on the report parameters.
`copy_quarto_file`(→ None)	Copy the report quarto file to the temporary directory.
`run_quarto`(→ int)	Run the Quarto command to generate the report.
`copy_report_resources`(resource_dict)	Copy report resources from the reports directory to specified destinations.
`generate_zipped_report`(output_filename, folder_to_zip)	Generate a zipped archive of a directory.
`get_quarto_with_version`(→ tuple[pathlib.Path, str])	Get Quarto executable path and version.
`get_pandoc_with_version`(→ tuple[pathlib.Path, str])	Get Pandoc executable path and version.
`quarto_print`(text)
`quarto_callout_info`(info)
`quarto_callout_important`(info)
`quarto_plot_exception`(plot_name, e)
`quarto_callout_no_prediction_data_warning`([extra])
`quarto_callout_no_predictor_data_warning`([extra])
`polars_col_exists`(df, col)
`polars_subset_to_existing_cols`(all_columns, cols)
`create_metric_itable`(source_table[, column_to_metric, ...])	Create an interactive table with RAG coloring for metric columns.
`create_metric_gttable`(source_table[, title, subtitle, ...])	Create a great_tables table with RAG coloring for metric columns.
`n_unique_values`(dm, all_dm_cols, fld)
`max_by_hierarchy`(dm, all_dm_cols, fld, grouping)
`avg_by_hierarchy`(dm, all_dm_cols, fld, grouping)
`sample_values`(dm, all_dm_cols, fld[, n])
`show_credits`([quarto_source])	Display a credits section with build metadata at the end of a report.
`serialize_query`(→ dict | None)
`deserialize_query`(→ pdstools.utils.types.QUERY | None)	Deserialize a query that was previously serialized with serialize_query.
`gains_table`(→ polars.DataFrame)	Calculate cumulative gains for visualization.
`check_report_for_errors`(→ list[str])	Check generated report HTML for error indicators.

Module Contents

logger

get_output_filename(name: str | None, report_type: str, model_id: str | None = None, output_type: str = 'html') → str

Generate the output filename based on the report parameters.

Parameters:

name (str | None)
report_type (str)
model_id (str | None)
output_type (str)

Return type:

str

copy_quarto_file(qmd_file: str, temp_dir: pathlib.Path) → None

Copy the report quarto file to the temporary directory.

Parameters:

qmd_file (str) – Name of the Quarto markdown file to copy
temp_dir (Path) – Destination directory to copy files to

Return type:

None

run_quarto(qmd_file: str | None = None, output_filename: str | None = None, output_type: str | None = 'html', params: dict | None = None, project: dict = {'type': 'default'}, analysis: dict | None = None, temp_dir: pathlib.Path = Path(), verbose: bool = False, *, full_embed: bool = False) → int

Run the Quarto command to generate the report.

Parameters:

qmd_file (str, optional) – Path to the Quarto markdown file to render, by default None
output_filename (str, optional) – Name of the output file, by default None
output_type (str, optional) – Type of output format (html, pdf, etc.), by default “html”
params (dict, optional) – Parameters to pass to Quarto execution, by default None
project (dict, optional) – Project configuration settings, by default {“type”: “default”}
analysis (dict, optional) – Analysis configuration settings, by default None
temp_dir (Path, optional) – Temporary directory for processing, by default Path(“.”)
verbose (bool, optional) – Whether to print detailed execution logs, by default False
full_embed (bool, default=False) – When True, fully embeds all JavaScript libraries (Plotly, itables, etc.) into the HTML output (larger file). When False, loads JavaScript libraries from CDN and skips esbuild bundling, avoiding the need for esbuild (see issue #620).

Returns:

Return code from the Quarto process (0 for success)

Return type:

int

Raises:

RuntimeError – If the Quarto process fails (non-zero return code), includes captured output
subprocess.SubprocessError – If the Quarto command fails to execute
FileNotFoundError – If required files are not found
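
The RuntimeError contract described above (a non-zero return code is turned into an exception whose message carries the captured output) can be illustrated with a generic subprocess wrapper. This is only a minimal sketch, not the pdstools implementation: `run_command_sketch` is a hypothetical helper, and the usage below runs the Python interpreter rather than Quarto so that it works without Quarto installed.

```python
import subprocess
import sys


def run_command_sketch(cmd: list[str]) -> int:
    """Hypothetical helper: run a command and raise on a non-zero exit code."""
    # capture_output=True collects stdout and stderr instead of printing them
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Include the captured output so the caller can see why the command failed
        raise RuntimeError(
            f"Command failed with return code {result.returncode}:\n"
            f"{result.stdout}{result.stderr}"
        )
    return result.returncode


# A command that succeeds returns 0
exit_code = run_command_sketch([sys.executable, "-c", "pass"])
```

A caller that wants to keep going after a failed render would wrap the call in `try`/`except RuntimeError` and log the message.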

copy_report_resources(resource_dict: list[tuple[str, str]])

Copy report resources from the reports directory to specified destinations.

Parameters:

resource_dict (list[tuple[str, str]]) – List of (source_path, destination_path) tuples

Return type:

None

generate_zipped_report(output_filename: str, folder_to_zip: str)

Generate a zipped archive of a directory.

This is a general-purpose utility function that can compress any directory into a zip archive. While named for report generation, it works with any directory structure.

Parameters:

output_filename (str) – Name of the output file (extension will be replaced with .zip)
folder_to_zip (str) – Path to the directory to be compressed

Return type:

None

Raises:

FileNotFoundError – If the folder to zip does not exist or is not a directory

Examples

>>> generate_zipped_report("my_archive.zip", "/path/to/directory")
>>> generate_zipped_report("report_2023", "/tmp/report_output")
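
The documented behavior (replace the extension with .zip, fail if the folder is missing) can be approximated with the standard library. The following is a minimal sketch under that assumption, not the actual pdstools code: `zip_folder_sketch` is a hypothetical name, and the real function may differ in details such as where the archive is written.

```python
import shutil
from pathlib import Path


def zip_folder_sketch(output_filename: str, folder_to_zip: str) -> Path:
    """Hypothetical stand-in that zips a directory with shutil.make_archive."""
    folder = Path(folder_to_zip)
    if not folder.is_dir():
        raise FileNotFoundError(f"Not an existing directory: {folder}")
    # Drop any existing extension; make_archive appends ".zip" itself
    base_name = str(Path(output_filename).with_suffix(""))
    # root_dir makes archive member paths relative to the zipped folder
    return Path(shutil.make_archive(base_name, "zip", root_dir=folder))
```

Unlike the documented function, this sketch returns the path of the archive it wrote, which is convenient for checking the result.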

get_quarto_with_version(verbose: bool = True) → tuple[pathlib.Path, str]

Get Quarto executable path and version.

Parameters:

verbose (bool)

Return type:

tuple[pathlib.Path, str]

get_pandoc_with_version(verbose: bool = True) → tuple[pathlib.Path, str]

Get Pandoc executable path and version.

Parameters:

verbose (bool)

Return type:

tuple[pathlib.Path, str]

quarto_print(text)

quarto_callout_info(info)

quarto_callout_important(info)

quarto_plot_exception(plot_name: str, e: Exception)

Parameters:

plot_name (str)
e (Exception)

quarto_callout_no_prediction_data_warning(extra='')

quarto_callout_no_predictor_data_warning(extra='')

polars_col_exists(df, col)

polars_subset_to_existing_cols(all_columns, cols)

create_metric_itable(source_table: polars.DataFrame, column_to_metric: dict | None = None, column_descriptions: dict[str, str] | None = None, color_background: bool = False, strict_metric_validation: bool = True, highlight_issues_only: bool = False, rag_source: polars.DataFrame | None = None, **itable_kwargs)

Create an interactive table with RAG coloring for metric columns.

Displays the table using itables with cells colored based on RAG (Red/Amber/Green) status derived from metric thresholds.

Parameters:

source_table (pl.DataFrame) – DataFrame containing data columns to be colored.
column_to_metric (dict, optional) –
Mapping from column names (or tuples of column names) to one of:
- str: metric ID to look up in MetricLimits.csv
- callable: function(value) -> "RED" | "AMBER" | "YELLOW" | "GREEN" | None
- tuple: (metric_id, value_mapping) where value_mapping is a dict that maps column values to metric values before evaluation. Supports tuple keys for multiple values: {("Yes", "yes"): True}
If a column is not in this dict, its name is used as the metric ID.
column_descriptions (dict, optional) – Mapping from column names to tooltip descriptions. When provided, column headers will display the description as a tooltip on hover. Example: {“Performance”: “Model AUC performance metric”}
color_background (bool, default False) – If True, colors the cell background. If False, colors the text (foreground).
strict_metric_validation (bool, default True) – If True, raises an exception if a metric ID in column_to_metric is not found in MetricLimits.csv. Set to False to skip validation.
highlight_issues_only (bool, default False) – If True, only RED/AMBER/YELLOW values are styled (GREEN is not highlighted). Set to False to also highlight GREEN values.
rag_source (pl.DataFrame, optional) – If provided, RAG thresholds are evaluated against this DataFrame instead of source_table. Use this when source_table contains non-numeric display values (e.g. HTML strings) but you still want RAG coloring based on the original numeric data. Must have the same columns and row order as source_table.
**itable_kwargs – Additional keyword arguments passed to itables.show(). Common options include: lengthMenu, paging, searching, ordering.

Returns:

An itables display object that will render in Jupyter/Quarto.

Return type:

itables HTML display

Examples

>>> from pdstools.utils.report_utils import create_metric_itable
>>> create_metric_itable(
...     df,
...     column_to_metric={
...         # Simple metric ID
...         "Performance": "ModelPerformance",
...         # Custom RAG function
...         "Channel": standard_NBAD_channels_rag,
...         # Value mapping: column values -> metric values
...         "AGB": ("UsingAGB", {"Yes": True, "No": False}),
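...         # Alternative for the same column: map multiple column values to
...         # one metric value. Use it instead of the entry above, since a
...         # dict cannot hold two "AGB" keys.
...         # "AGB": ("UsingAGB", {("Yes", "yes", "YES"): True, "No": False}),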
...     },
...     column_descriptions={
...         "Performance": "Model AUC performance metric",
...         "Channel": "Communication channel for the action",
...     },
...     paging=False
... )
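
The callable form of a column_to_metric value takes a single cell value and returns a RAG status string or None. As a minimal sketch of what such a function could look like, assuming made-up thresholds: `response_count_rag` and its cut-offs are hypothetical and are not part of pdstools or MetricLimits.csv.

```python
def response_count_rag(value):
    """Hypothetical RAG callable: classify a response count.

    The thresholds (100 and 1000) are illustrative assumptions only.
    """
    if value is None:
        # No status for missing values, so the cell is left unstyled
        return None
    if value < 100:
        return "RED"
    if value < 1000:
        return "AMBER"
    return "GREEN"
```

A function like this would be supplied as a value in column_to_metric, in the same position as standard_NBAD_channels_rag in the example above.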

create_metric_gttable(source_table: polars.DataFrame, title: str | None = None, subtitle: str | None = None, column_to_metric: dict | None = None, column_descriptions: dict[str, str] | None = None, color_background: bool = True, strict_metric_validation: bool = True, highlight_issues_only: bool = True, **gt_kwargs)

Create a great_tables table with RAG coloring for metric columns.

Displays the table using great_tables with cells colored based on RAG (Red/Amber/Green) status derived from metric thresholds.

Parameters:

source_table (pl.DataFrame) – DataFrame containing data columns to be colored.
title (str, optional) – Table title.
subtitle (str, optional) – Table subtitle.
column_to_metric (dict, optional) –
Mapping from column names (or tuples of column names) to one of:
- str: metric ID to look up in MetricLimits.csv
- callable: function(value) -> "RED" | "AMBER" | "YELLOW" | "GREEN" | None
- tuple: (metric_id, value_mapping) where value_mapping is a dict that maps column values to metric values before evaluation. Supports tuple keys for multiple values: {("Yes", "yes"): True}
If a column is not in this dict, its name is used as the metric ID.
column_descriptions (dict, optional) – Mapping from column names to tooltip descriptions. When provided, column headers will display the description as a tooltip on hover. Example: {“Performance”: “Model AUC performance metric”}
color_background (bool, default True) – If True, colors the cell background. If False, colors the text.
strict_metric_validation (bool, default True) – If True, raises an exception if a metric ID in column_to_metric is not found in MetricLimits.csv. Set to False to skip validation.
highlight_issues_only (bool, default True) – If True, only RED/AMBER/YELLOW values are styled (GREEN is not highlighted). Set to False to also highlight GREEN values.
**gt_kwargs – Additional keyword arguments passed to great_tables.GT constructor. Common options include: rowname_col, groupname_col.

Returns:

A great_tables instance with RAG coloring applied.

Return type:

great_tables.GT

Examples

>>> from pdstools.utils.report_utils import create_metric_gttable
>>> create_metric_gttable(
...     df,
...     title="Model Overview",
...     column_to_metric={
...         # Simple metric ID
...         "Performance": "ModelPerformance",
...         # Custom RAG function
...         "Channel": standard_NBAD_channels_rag,
...         # Value mapping: column values -> metric values
...         "AGB": ("UsingAGB", {"Yes": True, "No": False}),
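...         # Alternative for the same column: map multiple column values to
...         # one metric value. Use it instead of the entry above, since a
...         # dict cannot hold two "AGB" keys.
...         # "AGB": ("UsingAGB", {("Yes", "yes", "YES"): True, "No": False}),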
...     },
...     column_descriptions={
...         "Performance": "Model AUC performance metric",
...         "Channel": "Communication channel for the action",
...     },
...     rowname_col="Name",
... )
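
The tuple-key form of value_mapping can be read as shorthand for one entry per listed value. The sketch below shows that expansion in isolation. It is an illustration under that reading, not the library's internal logic: `expand_value_mapping_sketch` is a hypothetical helper that does not exist in pdstools.

```python
def expand_value_mapping_sketch(mapping: dict) -> dict:
    """Hypothetical helper: flatten tuple keys into one entry per value."""
    flat = {}
    for key, metric_value in mapping.items():
        # A tuple key lists several column values that share one metric value
        column_values = key if isinstance(key, tuple) else (key,)
        for column_value in column_values:
            flat[column_value] = metric_value
    return flat


expanded = expand_value_mapping_sketch({("Yes", "yes", "YES"): True, "No": False})
# expanded == {"Yes": True, "yes": True, "YES": True, "No": False}
```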

n_unique_values(dm, all_dm_cols, fld)

max_by_hierarchy(dm, all_dm_cols, fld, grouping)

avg_by_hierarchy(dm, all_dm_cols, fld, grouping)

sample_values(dm, all_dm_cols, fld, n=6)

show_credits(quarto_source: str | None = None)

Display a credits section with build metadata at the end of a report.

Prints a formatted block containing the generation timestamp, Quarto and Pandoc versions, and optionally the source notebook path.

Parameters:

quarto_source (str, optional) – Path or identifier of the source .qmd file. Include this for standalone reports where knowing the source is useful. Omit for Quarto website projects where pages are generated from templates.

serialize_query(query: pdstools.utils.types.QUERY | None) → dict | None

Parameters:

query (pdstools.utils.types.QUERY | None)

Return type:

dict | None

deserialize_query(serialized_query: dict | None) → pdstools.utils.types.QUERY | None

Deserialize a query that was previously serialized with serialize_query.

Parameters:

serialized_query (Optional[dict]) – A serialized query dictionary created by serialize_query

Returns:

The deserialized query

Return type:

Optional[QUERY]

gains_table(df: polars.LazyFrame | polars.DataFrame, value: str, index: str | None = None, by: str | list[str] | None = None) → polars.DataFrame

Calculate cumulative gains for visualization.

Computes cumulative distribution of a value metric, sorted by the ratio of value to index (or by value alone if no index). Used for gains charts to show model response skewness.

Parameters:

df (pl.LazyFrame | pl.DataFrame) – Input data containing the value and optional index columns
value (str) – Column name containing the metric to compute gains for (e.g., “ResponseCount”)
index (str, optional) – Column name to normalize by (e.g., population size). If None, uses row count.
by (str | list[str], optional) – Column(s) to group by for separate gain curves. If None, computes single curve.

Returns:

DataFrame with columns:
- cum_x: Cumulative proportion of index (or models)
- cum_y: Cumulative proportion of value
- by columns: if by is specified

Return type:

pl.DataFrame

Examples

>>> # Single gains curve for response count
>>> gains = gains_table(df, value="ResponseCount")

>>> # Gains curves by channel, normalized by population
>>> gains = gains_table(df, value="Positives", index="Population", by="Channel")
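
To make the cumulative gains idea concrete, here is a minimal pure-Python sketch for a single curve. It rests on several assumptions that the description above does not confirm: rows are sorted in descending order of value/index, each row counts as 1 when no index is given, and the curve starts at (0, 0). `gains_sketch` is a hypothetical function, not the polars-based implementation in pdstools.

```python
def gains_sketch(values, index=None):
    """Hypothetical sketch: cumulative gains for one curve.

    Assumes positive index weights and a non-zero total value.
    """
    n = len(values)
    # Without an index, every row carries the same weight (row count)
    weights = list(index) if index is not None else [1.0] * n
    # Assumed ordering: highest value per unit of index first
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    total_x, total_y = sum(weights), sum(values)
    cum_x, cum_y = [0.0], [0.0]
    running_x = running_y = 0.0
    for i in order:
        running_x += weights[i]
        running_y += values[i]
        cum_x.append(running_x / total_x)
        cum_y.append(running_y / total_y)
    return cum_x, cum_y


cum_x, cum_y = gains_sketch([10, 60, 30])
# cum_y == [0.0, 0.6, 0.9, 1.0]: the top third of rows holds 60% of the value
```

The further the curve bends away from the diagonal, the more the value is concentrated in a small share of rows.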

check_report_for_errors(html_path: str | pathlib.Path) → list[str]

Check generated report HTML for error indicators.

Scans the HTML file for error patterns that indicate plot rendering failures or exceptions during report generation. These errors are typically hidden in collapsed callout sections but should be caught in testing.

Parameters:

html_path (str or Path) – Path to the HTML file to check

Returns:

List of error descriptions found (empty if no errors)

Return type:

list[str]

Raises:

FileNotFoundError – If the HTML file does not exist

Examples

>>> from pdstools.utils.report_utils import check_report_for_errors
>>> errors = check_report_for_errors("HealthCheck.html")
>>> if errors:
...     print(f"Found {len(errors)} error(s):")
...     for error in errors:
...         print(f"  - {error}")
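
The general approach of scanning rendered HTML for error markers can be sketched with the standard library. The patterns below are illustrative assumptions; the patterns that check_report_for_errors actually looks for are not listed in this reference. `scan_html_for_errors_sketch` and `ERROR_PATTERNS_SKETCH` are hypothetical names.

```python
from pathlib import Path

# Assumed example patterns; the real function may search for different text
ERROR_PATTERNS_SKETCH = (
    "Traceback (most recent call last)",
    "Error rendering",
)


def scan_html_for_errors_sketch(html_path) -> list[str]:
    """Hypothetical sketch: list which known error patterns occur in a file."""
    path = Path(html_path)
    if not path.is_file():
        raise FileNotFoundError(f"HTML file not found: {path}")
    # errors="replace" keeps the scan going if the file has invalid bytes
    text = path.read_text(encoding="utf-8", errors="replace")
    return [
        f"Found pattern: {pattern!r}"
        for pattern in ERROR_PATTERNS_SKETCH
        if pattern in text
    ]
```

In a test suite, an empty return value would be asserted after each report render.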