pdstools.utils.report_utils
Attributes
Functions
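- get_output_filename: Generate the output filename based on the report parameters.
- copy_quarto_file: Copy the report quarto file to the temporary directory.
- _write_params_files: Write parameters to YAML files for Quarto processing.
- run_quarto: Run the Quarto command to generate the report.
- _set_command_options: Set the options for the Quarto command.
- copy_report_resources: Copy report resources from the reports directory to specified destinations.
- generate_zipped_report: Generate a zipped archive of a directory.
- Get command output in an OS-agnostic way.
- Extract version number from version string.
- get_quarto_with_version: Get Quarto executable path and version.
- get_pandoc_with_version: Get Pandoc executable path and version.
- quarto_print
- quarto_callout_info
- quarto_callout_important
- quarto_callout_no_prediction_data_warning
- quarto_callout_no_predictor_data_warning
- polars_col_exists
- polars_subset_to_existing_cols
- create_metric_itable: Create an interactive table with RAG coloring for metric columns.
- create_metric_gttable: Create a great_tables table with RAG coloring for metric columns.
- n_unique_values
- max_by_hierarchy
- avg_by_hierarchy
- sample_values
- serialize_query
- deserialize_query: Deserialize a query that was previously serialized with serialize_query.
- remove_duplicate_html_scripts: Remove duplicate script tags from HTML to reduce file size.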
Module Contents
- logger
- get_output_filename(name: str | None, report_type: str, model_id: str | None = None, output_type: str = 'html') -> str
Generate the output filename based on the report parameters.
- copy_quarto_file(qmd_file: str, temp_dir: pathlib.Path) -> None
Copy the report quarto file to the temporary directory.
- Parameters:
qmd_file (str) – Name of the Quarto markdown file to copy
temp_dir (Path) – Destination directory to copy files to
- Return type:
None
- _write_params_files(temp_dir: pathlib.Path, params: Dict | None = None, project: Dict = {'type': 'default'}, analysis: Dict | None = None, size_reduction_method: Literal['strip', 'cdn'] | None = None) -> None
Write parameters to YAML files for Quarto processing.
- Parameters:
temp_dir (Path) – Directory where YAML files will be written
params (dict, optional) – Parameters to write to params.yml, by default None
project (dict, optional) – Project configuration to write to _quarto.yml, by default {“type”: “default”}
analysis (dict, optional) – Analysis configuration to write to _quarto.yml, by default None
size_reduction_method (Optional[Literal["strip", "cdn"]], default=None) – When “cdn”, sets plotly-connected to false so that Plotly.js loads from a CDN, which yields much smaller files (roughly 8 MB versus 110 MB). When None or “strip”, sets plotly-connected to true so that Plotly is fully embedded.
- Return type:
None
- run_quarto(qmd_file: str | None = None, output_filename: str | None = None, output_type: str | None = 'html', params: Dict | None = None, project: Dict = {'type': 'default'}, analysis: Dict | None = None, temp_dir: pathlib.Path = Path('.'), verbose: bool = False, *, size_reduction_method: Literal['strip', 'cdn'] | None = None) -> int
Run the Quarto command to generate the report.
- Parameters:
qmd_file (str, optional) – Path to the Quarto markdown file to render, by default None
output_filename (str, optional) – Name of the output file, by default None
output_type (str, optional) – Type of output format (html, pdf, etc.), by default “html”
params (dict, optional) – Parameters to pass to Quarto execution, by default None
project (dict, optional) – Project configuration settings, by default {“type”: “default”}
analysis (dict, optional) – Analysis configuration settings, by default None
temp_dir (Path, optional) – Temporary directory for processing, by default Path(“.”)
verbose (bool, optional) – Whether to print detailed execution logs, by default False
size_reduction_method (Optional[Literal["strip", "cdn"]], default=None) – When None, all resources are fully embedded in the HTML output. When “cdn”, the setting is passed on to Quarto and Plotly so that JavaScript libraries are loaded from the internet. When “strip”, the HTML is post-processed to remove duplicate JavaScript that would otherwise be embedded multiple times.
- Returns:
Return code from the Quarto process (0 for success)
- Return type:
int
- Raises:
RuntimeError – If the Quarto process fails (non-zero return code); the error message includes the captured output
subprocess.SubprocessError – If the Quarto command fails to execute
FileNotFoundError – If required files are not found
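The return-code and RuntimeError contract described above can be illustrated with a minimal sketch. This is not the pdstools implementation: `run_command` is a hypothetical helper, and the Python interpreter stands in for the Quarto CLI so the example runs anywhere.

```python
import subprocess
import sys
from typing import List


def run_command(cmd: List[str], verbose: bool = False) -> int:
    """Hypothetical helper: run a command and raise if it exits non-zero."""
    # Merge stderr into stdout so a failure message carries the whole log.
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    if verbose:
        print(result.stdout)
    if result.returncode != 0:
        raise RuntimeError(
            f"Command failed with return code {result.returncode}:\n{result.stdout}"
        )
    return result.returncode


# Stand-in commands: the Python interpreter plays the role of the Quarto CLI.
ok_code = run_command([sys.executable, "-c", "print('rendered')"])

try:
    run_command([sys.executable, "-c", "import sys; print('boom'); sys.exit(3)"])
except RuntimeError as err:
    failure_message = str(err)
```

Capturing the output rather than streaming it is what allows the exception to carry the full log, which matters when the caller runs without verbose output.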
- _set_command_options(output_type: str | None = None, output_filename: str | None = None, execute_params: bool = False) -> List[str]
Set the options for the Quarto command.
- Parameters:
output_type (str | None)
output_filename (str | None)
execute_params (bool)
- Returns:
List of command line options for Quarto
- Return type:
List[str]
- copy_report_resources(resource_dict: list[tuple[str, str]])
Copy report resources from the reports directory to specified destinations.
- generate_zipped_report(output_filename: str, folder_to_zip: str)
Generate a zipped archive of a directory.
This is a general-purpose utility function that can compress any directory into a zip archive. While named for report generation, it works with any directory structure.
- Parameters:
output_filename (str)
folder_to_zip (str)
- Return type:
None
- Raises:
FileNotFoundError – If the folder to zip does not exist or is not a directory
Examples
>>> generate_zipped_report("my_archive.zip", "/path/to/directory")
>>> generate_zipped_report("report_2023", "/tmp/report_output")
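A directory-to-zip helper with the documented FileNotFoundError behavior could look roughly like the following. This is a sketch built on the standard library, not the library's own code; `zip_directory` is a hypothetical name, and stripping a trailing ".zip" is an assumption made because `shutil.make_archive` appends the extension itself.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_directory(output_filename: str, folder_to_zip: str) -> Path:
    """Hypothetical helper: compress a directory into a zip archive."""
    folder = Path(folder_to_zip)
    if not folder.is_dir():
        raise FileNotFoundError(f"Not an existing directory: {folder}")
    # make_archive adds ".zip" on its own, so drop it if the caller supplied it.
    if output_filename.lower().endswith(".zip"):
        output_filename = output_filename[:-4]
    return Path(shutil.make_archive(output_filename, "zip", root_dir=folder))


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "report_output"
    source.mkdir()
    (source / "index.html").write_text("<html></html>")

    # The archive is written next to the source folder, not inside it.
    archive = zip_directory(str(Path(tmp) / "my_archive.zip"), str(source))
    archive_name = archive.name
    with zipfile.ZipFile(archive) as zf:
        archived_names = zf.namelist()

    missing_raised = False
    try:
        zip_directory(str(Path(tmp) / "other"), str(Path(tmp) / "does_not_exist"))
    except FileNotFoundError:
        missing_raised = True
```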
- get_quarto_with_version(verbose: bool = True) -> Tuple[pathlib.Path, str]
Get Quarto executable path and version.
- Parameters:
verbose (bool)
- Return type:
Tuple[pathlib.Path, str]
- get_pandoc_with_version(verbose: bool = True) -> Tuple[pathlib.Path, str]
Get Pandoc executable path and version.
- Parameters:
verbose (bool)
- Return type:
Tuple[pathlib.Path, str]
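Locating an executable and extracting its version number can be sketched with the standard library alone. The helpers below (`extract_version`, `tool_with_version`) are hypothetical and only approximate what the functions above describe. Two things are assumptions: that a tool prints its version when called with `--version`, and that a missing tool yields None.

```python
import re
import shutil
import subprocess
from pathlib import Path
from typing import Optional, Tuple


def extract_version(version_output: str) -> Optional[str]:
    """Return the first dotted version number found in the text, if any."""
    match = re.search(r"\d+(?:\.\d+)+", version_output)
    return match.group(0) if match else None


def tool_with_version(executable: str) -> Optional[Tuple[Path, str]]:
    """Hypothetical helper: resolve an executable on PATH and query its version."""
    # shutil.which handles platform differences such as PATHEXT on Windows.
    location = shutil.which(executable)
    if location is None:
        return None
    result = subprocess.run([location, "--version"], capture_output=True, text=True)
    version = extract_version(result.stdout)
    return (Path(location), version) if version else None


pandoc_style = extract_version("pandoc 3.1.11\nFeatures: +server +lua")
quarto_style = extract_version("1.4.550")
```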
- quarto_print(text)
- quarto_callout_info(info)
- quarto_callout_important(info)
- quarto_callout_no_prediction_data_warning(extra='')
- quarto_callout_no_predictor_data_warning(extra='')
- polars_col_exists(df, col)
- polars_subset_to_existing_cols(all_columns, cols)
- create_metric_itable(source_table: polars.DataFrame, column_to_metric: Dict | None = None, column_descriptions: Dict[str, str] | None = None, color_background: bool = False, strict_metric_validation: bool = True, highlight_issues_only: bool = False, **itable_kwargs)
Create an interactive table with RAG coloring for metric columns.
Displays the table using itables with cells colored based on RAG (Red/Amber/Green) status derived from metric thresholds.
- Parameters:
source_table (pl.DataFrame) – DataFrame containing data columns to be colored.
column_to_metric (dict, optional) – Mapping from column names (or tuples of column names) to one of:
  - str: metric ID to look up in MetricLimits.csv
  - callable: function(value) -> "RED" | "AMBER" | "YELLOW" | "GREEN" | None
  - tuple: (metric_id, value_mapping), where value_mapping is a dict that maps column values to metric values before evaluation. Tuple keys map several column values at once: {("Yes", "yes"): True}
If a column is not in this dict, its name is used as the metric ID.
column_descriptions (dict, optional) – Mapping from column names to tooltip descriptions. When provided, column headers will display the description as a tooltip on hover. Example: {“Performance”: “Model AUC performance metric”}
color_background (bool, default False) – If True, colors the cell background. If False, colors the text (foreground).
strict_metric_validation (bool, default True) – If True, raises an exception if a metric ID in column_to_metric is not found in MetricLimits.csv. Set to False to skip validation.
highlight_issues_only (bool, default False) – If True, only RED/AMBER/YELLOW values are styled (GREEN is not highlighted). Set to False to also highlight GREEN values.
**itable_kwargs – Additional keyword arguments passed to itables.show(). Common options include: lengthMenu, paging, searching, ordering.
- Returns:
An itables display object that will render in Jupyter/Quarto.
- Return type:
itables HTML display
Examples
>>> from pdstools.utils.report_utils import create_metric_itable
>>> create_metric_itable(
...     df,
...     column_to_metric={
...         # Simple metric ID
...         "Performance": "ModelPerformance",
...         # Custom RAG function
...         "Channel": standard_NBAD_channels_rag,
...         # Value mapping: column values -> metric values
...         "AGB": ("UsingAGB", {"Yes": True, "No": False}),
...         # Alternative for "AGB": multiple column values to same metric value
...         # "AGB": ("UsingAGB", {("Yes", "yes", "YES"): True, "No": False}),
...     },
...     column_descriptions={
...         "Performance": "Model AUC performance metric",
...         "Channel": "Communication channel for the action",
...     },
...     paging=False,
... )
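The callable and tuple-key forms of column_to_metric can be made concrete with a small sketch. Both helpers are hypothetical and not part of pdstools: `channel_rag` is an illustrative RAG function with a made-up channel list, and `expand_value_mapping` shows one plausible way tuple keys could be flattened before lookup.

```python
from typing import Any, Dict, Optional


def channel_rag(value: Any) -> Optional[str]:
    """Hypothetical custom RAG function: known channels are GREEN, others AMBER."""
    if value is None:
        return None  # no status, so the cell stays unstyled
    known_channels = {"Web", "Mobile", "Email"}  # illustrative list only
    return "GREEN" if value in known_channels else "AMBER"


def expand_value_mapping(mapping: Dict[Any, Any]) -> Dict[Any, Any]:
    """Flatten tuple keys so every listed column value maps to the metric value."""
    expanded: Dict[Any, Any] = {}
    for key, metric_value in mapping.items():
        keys = key if isinstance(key, tuple) else (key,)
        for single_key in keys:
            expanded[single_key] = metric_value
    return expanded


agb_mapping = expand_value_mapping({("Yes", "yes", "YES"): True, "No": False})
```

A function such as `channel_rag` would be passed as the value for a column in column_to_metric, in the same position as standard_NBAD_channels_rag in the example above.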
- create_metric_gttable(source_table: polars.DataFrame, title: str | None = None, subtitle: str | None = None, column_to_metric: Dict | None = None, column_descriptions: Dict[str, str] | None = None, color_background: bool = True, strict_metric_validation: bool = True, highlight_issues_only: bool = True, **gt_kwargs)
Create a great_tables table with RAG coloring for metric columns.
Displays the table using great_tables with cells colored based on RAG (Red/Amber/Green) status derived from metric thresholds.
- Parameters:
source_table (pl.DataFrame) – DataFrame containing data columns to be colored.
title (str, optional) – Table title.
subtitle (str, optional) – Table subtitle.
column_to_metric (dict, optional) – Mapping from column names (or tuples of column names) to one of:
  - str: metric ID to look up in MetricLimits.csv
  - callable: function(value) -> "RED" | "AMBER" | "YELLOW" | "GREEN" | None
  - tuple: (metric_id, value_mapping), where value_mapping is a dict that maps column values to metric values before evaluation. Tuple keys map several column values at once: {("Yes", "yes"): True}
If a column is not in this dict, its name is used as the metric ID.
column_descriptions (dict, optional) – Mapping from column names to tooltip descriptions. When provided, column headers will display the description as a tooltip on hover. Example: {“Performance”: “Model AUC performance metric”}
color_background (bool, default True) – If True, colors the cell background. If False, colors the text.
strict_metric_validation (bool, default True) – If True, raises an exception if a metric ID in column_to_metric is not found in MetricLimits.csv. Set to False to skip validation.
highlight_issues_only (bool, default True) – If True, only RED/AMBER/YELLOW values are styled (GREEN is not highlighted). Set to False to also highlight GREEN values.
**gt_kwargs – Additional keyword arguments passed to great_tables.GT constructor. Common options include: rowname_col, groupname_col.
- Returns:
A great_tables instance with RAG coloring applied.
- Return type:
great_tables.GT
Examples
>>> from pdstools.utils.report_utils import create_metric_gttable
>>> create_metric_gttable(
...     df,
...     title="Model Overview",
...     column_to_metric={
...         # Simple metric ID
...         "Performance": "ModelPerformance",
...         # Custom RAG function
...         "Channel": standard_NBAD_channels_rag,
...         # Value mapping: column values -> metric values
...         "AGB": ("UsingAGB", {"Yes": True, "No": False}),
...         # Alternative for "AGB": multiple column values to same metric value
...         # "AGB": ("UsingAGB", {("Yes", "yes", "YES"): True, "No": False}),
...     },
...     column_descriptions={
...         "Performance": "Model AUC performance metric",
...         "Channel": "Communication channel for the action",
...     },
...     rowname_col="Name",
... )
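How color_background and highlight_issues_only interact can be sketched as a small decision function. This is an illustration of the documented semantics only: `cell_style` is a hypothetical helper, and the palette is a placeholder rather than the colors pdstools uses.

```python
from typing import Optional

# Placeholder palette, chosen for illustration only.
RAG_COLORS = {
    "RED": "#d9534f",
    "AMBER": "#f0ad4e",
    "YELLOW": "#f7dc6f",
    "GREEN": "#5cb85c",
}


def cell_style(
    rag: Optional[str],
    color_background: bool = True,
    highlight_issues_only: bool = True,
) -> Optional[str]:
    """Hypothetical helper: return a CSS declaration for a RAG status, or None."""
    if rag not in RAG_COLORS:
        return None  # unknown or missing status: leave the cell unstyled
    if highlight_issues_only and rag == "GREEN":
        return None  # only RED/AMBER/YELLOW are styled in this mode
    css_property = "background-color" if color_background else "color"
    return f"{css_property}: {RAG_COLORS[rag]}"


red_background = cell_style("RED")
green_text = cell_style("GREEN", color_background=False, highlight_issues_only=False)
green_skipped = cell_style("GREEN")
```

The defaults in the sketch mirror those of create_metric_gttable (background coloring, issues only); create_metric_itable defaults both flags to False.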
- n_unique_values(dm, all_dm_cols, fld)
- max_by_hierarchy(dm, all_dm_cols, fld, grouping)
- avg_by_hierarchy(dm, all_dm_cols, fld, grouping)
- sample_values(dm, all_dm_cols, fld, n=6)
- serialize_query(query: pdstools.utils.types.QUERY | None) -> Dict | None
- Parameters:
query (Optional[pdstools.utils.types.QUERY])
- Return type:
Optional[Dict]
- deserialize_query(serialized_query: Dict | None) -> pdstools.utils.types.QUERY | None
Deserialize a query that was previously serialized with serialize_query.
- Parameters:
serialized_query (Optional[Dict]) – A serialized query dictionary created by serialize_query
- Returns:
The deserialized query
- Return type:
Optional[QUERY]
- remove_duplicate_html_scripts(html_content: str, verbose: bool = False) -> str
Remove duplicate script tags from HTML to reduce file size.
Specifically targets large JavaScript libraries (like Plotly.js) that get embedded multiple times in HTML reports, while preserving all unique plot data and initialization scripts.
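The idea of dropping repeated library scripts while keeping small, unique ones can be sketched as follows. This is not the pdstools implementation: `dedupe_large_scripts` is a hypothetical function, and both the regex-based matching and the `min_size` threshold are assumptions made for illustration.

```python
import hashlib
import re

# Matches a script element and captures its inline body.
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


def dedupe_large_scripts(html: str, min_size: int = 1000) -> str:
    """Hypothetical helper: keep the first copy of each large inline script."""
    seen = set()

    def keep_first(match: "re.Match[str]") -> str:
        body = match.group(1)
        if len(body) < min_size:
            return match.group(0)  # small scripts (plot data, init code) stay
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in seen:
            return ""  # identical large script already present earlier
        seen.add(digest)
        return match.group(0)

    return SCRIPT_RE.sub(keep_first, html)


library = "<script>" + "/* lib */" * 500 + "</script>"
page = library + "<script>plot(1)</script>" + library + "<script>plot(2)</script>"
cleaned = dedupe_large_scripts(page)
```

Hashing the script body avoids holding several multi-megabyte strings in memory for comparison, and the size threshold keeps short per-plot scripts untouched even when two of them happen to be identical.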