pdstools.utils.report_utils

Attributes

Functions

get_output_filename(→ str)

Generate the output filename based on the report parameters.

copy_quarto_file(→ None)

Copy the report quarto file to the temporary directory.

_write_params_files(→ None)

Write parameters to YAML files for Quarto processing.

run_quarto(→ int)

Run the Quarto command to generate the report.

_set_command_options(→ List[str])

Set the options for the Quarto command.

copy_report_resources(resource_dict)

Copy report resources from the reports directory to specified destinations.

generate_zipped_report(output_filename, folder_to_zip)

Generate a zipped archive of a directory.

_get_cmd_output(→ List[str])

Get command output in an OS-agnostic way.

_get_version_only(→ str)

Extract version number from version string.

get_quarto_with_version(→ Tuple[pathlib.Path, str])

Get Quarto executable path and version.

get_pandoc_with_version(→ Tuple[pathlib.Path, str])

Get Pandoc executable path and version.

quarto_print(text)

quarto_callout_info(info)

quarto_callout_important(info)

quarto_plot_exception(plot_name, e)

quarto_callout_no_prediction_data_warning([extra])

quarto_callout_no_predictor_data_warning([extra])

polars_col_exists(df, col)

polars_subset_to_existing_cols(all_columns, cols)

rag_background_styler([rag])

rag_background_styler_dense([rag])

rag_textcolor_styler([rag])

table_standard_formatting(source_table[, title, ...])

table_style_predictor_count(gt, flds[, ...])

n_unique_values(dm, all_dm_cols, fld)

max_by_hierarchy(dm, all_dm_cols, fld, grouping)

avg_by_hierarchy(dm, all_dm_cols, fld, grouping)

sample_values(dm, all_dm_cols, fld[, n])

show_credits(quarto_source)

serialize_query(→ Optional[Dict])

deserialize_query(→ Optional[pdstools.utils.types.QUERY])

Deserialize a query that was previously serialized with serialize_query.

remove_duplicate_html_scripts(→ str)

Remove duplicate script tags from HTML to reduce file size.

Module Contents

logger

get_output_filename(name: str | None, report_type: str, model_id: str | None = None, output_type: str = 'html') → str

Generate the output filename based on the report parameters.

Parameters:
  • name (Optional[str])

  • report_type (str)

  • model_id (Optional[str])

  • output_type (str)

Return type:

str
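The naming scheme itself is not documented here. As a rough sketch only, the following assumes the report type, the optional name and the optional model ID are joined with underscores. The helper `sketch_output_filename` and that scheme are illustrative assumptions, not the library's confirmed behaviour.

```python
from typing import Optional


def sketch_output_filename(
    name: Optional[str],
    report_type: str,
    model_id: Optional[str] = None,
    output_type: str = "html",
) -> str:
    """Hypothetical stand-in: join the non-empty parts with underscores."""
    # Assumption: spaces in the name are replaced to keep the filename shell-friendly.
    safe_name = name.replace(" ", "_") if name else None
    parts = [report_type, safe_name, model_id]
    stem = "_".join(part for part in parts if part)
    return f"{stem}.{output_type}"


print(sketch_output_filename("My Report", "HealthCheck"))
print(sketch_output_filename(None, "ModelReport", model_id="abc123", output_type="pdf"))
```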

copy_quarto_file(qmd_file: str, temp_dir: pathlib.Path) → None

Copy the report quarto file to the temporary directory.

Parameters:
  • qmd_file (str) – Name of the Quarto markdown file to copy

  • temp_dir (Path) – Destination directory to copy files to

Return type:

None

_write_params_files(temp_dir: pathlib.Path, params: Dict | None = None, project: Dict = {'type': 'default'}, analysis: Dict | None = None) → None

Write parameters to YAML files for Quarto processing.

Parameters:
  • temp_dir (Path) – Directory where YAML files will be written

  • params (dict, optional) – Parameters to write to params.yml, by default None

  • project (dict, optional) – Project configuration to write to _quarto.yml, by default {"type": "default"}

  • analysis (dict, optional) – Analysis configuration to write to _quarto.yml, by default None

Return type:

None
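To illustrate the two files named above, here is a minimal, dependency-free sketch. The helper `sketch_write_params_files`, the top-level `project` and `analysis` keys in `_quarto.yml`, and the use of JSON output (which YAML parsers generally accept) are all assumptions made for illustration. The real function may lay the files out differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


def sketch_write_params_files(
    temp_dir: Path,
    params: Optional[Dict] = None,
    project: Optional[Dict] = None,
    analysis: Optional[Dict] = None,
) -> None:
    """Hypothetical stand-in that writes params.yml and _quarto.yml."""
    # Use None instead of a mutable default, then apply the documented default.
    project = {"type": "default"} if project is None else project
    # JSON text is accepted by YAML parsers, so no YAML library is needed here.
    (temp_dir / "params.yml").write_text(json.dumps(params or {}, indent=2))
    # Assumption: both settings live under top-level keys of the same name.
    quarto_config = {"project": project, "analysis": analysis or {}}
    (temp_dir / "_quarto.yml").write_text(json.dumps(quarto_config, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    sketch_write_params_files(Path(tmp), params={"title": "Demo"})
    written = sorted(path.name for path in Path(tmp).iterdir())
    quarto_config = json.loads((Path(tmp) / "_quarto.yml").read_text())
    print(written)
```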

run_quarto(qmd_file: str | None = None, output_filename: str | None = None, output_type: str | None = 'html', params: Dict | None = None, project: Dict = {'type': 'default'}, analysis: Dict | None = None, temp_dir: pathlib.Path = Path('.'), verbose: bool = False, *, remove_duplicate_html_scripts: bool) → int

Run the Quarto command to generate the report.

Parameters:
  • qmd_file (str, optional) – Path to the Quarto markdown file to render, by default None

  • output_filename (str, optional) – Name of the output file, by default None

  • output_type (str, optional) – Type of output format (html, pdf, etc.), by default "html"

  • params (dict, optional) – Parameters to pass to Quarto execution, by default None

  • project (dict, optional) – Project configuration settings, by default {"type": "default"}

  • analysis (dict, optional) – Analysis configuration settings, by default None

  • temp_dir (Path, optional) – Temporary directory for processing, by default Path(".")

  • verbose (bool, optional) – Whether to print detailed execution logs, by default False

  • remove_duplicate_html_scripts (bool) – Whether to remove duplicate HTML script tags from the output

Returns:

Return code from the Quarto process (0 for success)

Return type:

int

Raises:

_set_command_options(output_type: str | None = None, output_filename: str | None = None, execute_params: bool = False) → List[str]

Set the options for the Quarto command.

Parameters:
  • output_type (str, optional) – Output format type (html, pdf, etc.), by default None

  • output_filename (str, optional) – Name of the output file, by default None

  • execute_params (bool, optional) – Whether to include parameter execution flag, by default False

Returns:

List of command line options for Quarto

Return type:

List[str]
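As a sketch of how such an option list could be assembled, the helper below maps each argument to a `quarto render` flag (`--to`, `--output`, `--execute-params`). The helper name `sketch_command_options` and the parameter file name `params.yml` are assumptions. The actual function may emit different options.

```python
from typing import List, Optional


def sketch_command_options(
    output_type: Optional[str] = None,
    output_filename: Optional[str] = None,
    execute_params: bool = False,
) -> List[str]:
    """Hypothetical stand-in that builds options for `quarto render`."""
    options: List[str] = []
    if output_type is not None:
        options += ["--to", output_type]
    if output_filename is not None:
        options += ["--output", output_filename]
    if execute_params:
        # Assumption: parameters were written to params.yml beforehand.
        options += ["--execute-params", "params.yml"]
    return options


print(sketch_command_options("html", "report.html", execute_params=True))
```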

copy_report_resources(resource_dict: list[tuple[str, str]])

Copy report resources from the reports directory to specified destinations.

Parameters:

resource_dict (list[tuple[str, str]]) – List of tuples containing (source_path, destination_path) pairs

Return type:

None
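The (source_path, destination_path) pairs could be processed roughly as follows. This is a sketch under assumptions: the helper `sketch_copy_resources` and its explicit `base_dir` argument are hypothetical, and the real function resolves sources against the package's own reports directory.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Tuple


def sketch_copy_resources(base_dir: Path, resources: List[Tuple[str, str]]) -> None:
    """Hypothetical stand-in: copy each (source, destination) pair."""
    for source, destination in resources:
        source_path = base_dir / source
        if source_path.is_dir():
            # dirs_exist_ok requires Python 3.8 or later.
            shutil.copytree(source_path, destination, dirs_exist_ok=True)
        else:
            Path(destination).parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(source_path, destination)


with tempfile.TemporaryDirectory() as tmp:
    reports = Path(tmp) / "reports"
    (reports / "assets").mkdir(parents=True)
    (reports / "assets" / "logo.txt").write_text("logo")
    (reports / "style.css").write_text("body {}")
    out = Path(tmp) / "out"
    sketch_copy_resources(
        reports,
        [("assets", str(out / "assets")), ("style.css", str(out / "style.css"))],
    )
    copied = sorted(p.relative_to(out).as_posix() for p in out.rglob("*") if p.is_file())
    print(copied)
```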

generate_zipped_report(output_filename: str, folder_to_zip: str)

Generate a zipped archive of a directory.

This is a general-purpose utility function that can compress any directory into a zip archive. While named for report generation, it works with any directory structure.

Parameters:
  • output_filename (str) – Name of the output file (extension will be replaced with .zip)

  • folder_to_zip (str) – Path to the directory to be compressed

Return type:

None

Raises:

FileNotFoundError – If the folder to zip does not exist or is not a directory

Examples

>>> generate_zipped_report("my_archive.zip", "/path/to/directory")
>>> generate_zipped_report("report_2023", "/tmp/report_output")
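The documented behaviour (extension replaced with .zip, FileNotFoundError for a missing directory) can be approximated with the standard library. The helper `sketch_zip_directory` below is a hypothetical stand-in built on `shutil.make_archive`, not the library's implementation.

```python
import os
import shutil
import tempfile
import zipfile


def sketch_zip_directory(output_filename: str, folder_to_zip: str) -> str:
    """Hypothetical stand-in built on shutil.make_archive."""
    if not os.path.isdir(folder_to_zip):
        raise FileNotFoundError(f"Not an existing directory: {folder_to_zip}")
    # Drop any extension; make_archive appends ".zip" itself.
    base_name = os.path.splitext(output_filename)[0]
    return shutil.make_archive(base_name, "zip", root_dir=folder_to_zip)


with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "report_output")
    os.makedirs(folder)
    with open(os.path.join(folder, "index.html"), "w") as handle:
        handle.write("<html></html>")
    archive = sketch_zip_directory(os.path.join(tmp, "report_2023.html"), folder)
    archive_name = os.path.basename(archive)
    with zipfile.ZipFile(archive) as zf:
        members = zf.namelist()
    print(archive_name, members)
```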
_get_cmd_output(args: List[str]) → List[str]

Get command output in an OS-agnostic way.

Parameters:

args (List[str])

Return type:

List[str]
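One OS-agnostic way to capture command output is `subprocess.run` with text mode, splitting on any line ending. The helper `sketch_cmd_output` is a hypothetical illustration, and the real function may handle errors or encodings differently.

```python
import subprocess
import sys
from typing import List


def sketch_cmd_output(args: List[str]) -> List[str]:
    """Hypothetical stand-in: run a command and return its stdout lines."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    # splitlines() copes with both "\n" and "\r\n" line endings.
    return result.stdout.splitlines()


# Use the current Python interpreter so the example runs on any platform.
lines = sketch_cmd_output([sys.executable, "-c", "print('quarto 1.4.550')"])
print(lines)
```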

_get_version_only(versionstr: str) → str

Extract version number from version string.

Parameters:

versionstr (str)

Return type:

str
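The exact parsing rule is not documented. A minimal sketch, assuming the version is the first dotted run of digits in the string and that unmatched input is returned stripped (both assumptions, as is the helper name `sketch_version_only`):

```python
import re


def sketch_version_only(versionstr: str) -> str:
    """Hypothetical stand-in: return the first dotted number sequence."""
    match = re.search(r"\d+(?:\.\d+)+", versionstr)
    # Assumption: fall back to the stripped input when no version is found.
    return match.group(0) if match else versionstr.strip()


print(sketch_version_only("pandoc 3.1.11"))
print(sketch_version_only("1.4.550"))
```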

get_quarto_with_version(verbose: bool = True) → Tuple[pathlib.Path, str]

Get Quarto executable path and version.

Parameters:

verbose (bool)

Return type:

Tuple[pathlib.Path, str]

get_pandoc_with_version(verbose: bool = True) → Tuple[pathlib.Path, str]

Get Pandoc executable path and version.

Parameters:

verbose (bool)

Return type:

Tuple[pathlib.Path, str]
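Locating an executable and reading its version, as both functions above do for Quarto and Pandoc, might look like the sketch below. The helper `sketch_tool_with_version`, its use of `shutil.which`, and returning None when the tool is missing are assumptions. The documented functions return a tuple and may instead raise when the tool is absent.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Optional, Tuple


def sketch_tool_with_version(executable: str) -> Optional[Tuple[Path, str]]:
    """Hypothetical stand-in: find an executable and read `--version`."""
    found = shutil.which(executable)
    if found is None:
        # Assumption: signal a missing tool with None rather than an exception.
        return None
    result = subprocess.run([found, "--version"], capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    first_line = output.splitlines()[0] if output else ""
    return Path(found), first_line


# Demonstrated with the running interpreter, which is always available.
print(sketch_tool_with_version(sys.executable))
```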

quarto_print(text)

quarto_callout_info(info)

quarto_callout_important(info)

quarto_plot_exception(plot_name: str, e: Exception)

Parameters:
  • plot_name (str)

  • e (Exception)

quarto_callout_no_prediction_data_warning(extra='')

quarto_callout_no_predictor_data_warning(extra='')

polars_col_exists(df, col)

polars_subset_to_existing_cols(all_columns, cols)

rag_background_styler(rag: str | None = None)

Parameters:

rag (Optional[str])

rag_background_styler_dense(rag: str | None = None)

Parameters:

rag (Optional[str])

rag_textcolor_styler(rag: str | None = None)

Parameters:

rag (Optional[str])

table_standard_formatting(source_table, title=None, subtitle=None, rowname_col=None, groupname_col=None, cdh_guidelines=CDHGuidelines(), highlight_limits: Dict[str, str | List[str]] = {}, highlight_lists: Dict[str, List[str]] = {}, highlight_configurations: List[str] = [], rag_styler: Callable = rag_background_styler)

Parameters:
  • highlight_limits (Dict[str, Union[str, List[str]]])

  • highlight_lists (Dict[str, List[str]])

  • highlight_configurations (List[str])

  • rag_styler (Callable)

table_style_predictor_count(gt, flds, cdh_guidelines=CDHGuidelines(), rag_styler=rag_textcolor_styler)

n_unique_values(dm, all_dm_cols, fld)

max_by_hierarchy(dm, all_dm_cols, fld, grouping)

avg_by_hierarchy(dm, all_dm_cols, fld, grouping)

sample_values(dm, all_dm_cols, fld, n=6)

show_credits(quarto_source: str)

Parameters:

quarto_source (str)

serialize_query(query: pdstools.utils.types.QUERY | None) → Dict | None

Parameters:

query (Optional[pdstools.utils.types.QUERY])

Return type:

Optional[Dict]

deserialize_query(serialized_query: Dict | None) → pdstools.utils.types.QUERY | None

Deserialize a query that was previously serialized with serialize_query.

Parameters:

serialized_query (Optional[Dict]) – A serialized query dictionary created by serialize_query

Returns:

The deserialized query

Return type:

Optional[QUERY]

remove_duplicate_html_scripts(html_content: str, verbose: bool = False) → str

Remove duplicate script tags from HTML to reduce file size.

Specifically targets large JavaScript libraries (like Plotly.js) that get embedded multiple times in HTML reports, while preserving all unique plot data and initialization scripts.

Parameters:
  • html_content (str)

  • verbose (bool)

Return type:

str
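The deduplication idea described above can be sketched with a regular expression that keeps the first copy of each large script tag and leaves small ones untouched. The helper `sketch_remove_duplicate_scripts` and the `min_size` threshold are assumptions for illustration. The real function may identify library scripts differently.

```python
import re

SCRIPT_PATTERN = re.compile(r"<script\b[^>]*>.*?</script>", re.DOTALL | re.IGNORECASE)


def sketch_remove_duplicate_scripts(html_content: str, min_size: int = 1000) -> str:
    """Hypothetical stand-in: keep the first copy of each large script tag."""
    seen = set()

    def keep_first(match: "re.Match[str]") -> str:
        script = match.group(0)
        # Small scripts (plot data, initialisation calls) are always kept.
        if len(script) < min_size:
            return script
        if script in seen:
            return ""
        seen.add(script)
        return script

    return SCRIPT_PATTERN.sub(keep_first, html_content)


# A large repeated "library" script and a small repeated plot call.
library = "<script>" + "/* pretend plotting library */" * 100 + "</script>"
plot_call = "<script>plot(1)</script>"
page = library + plot_call + library + plot_call
cleaned = sketch_remove_duplicate_scripts(page)
print(len(page), len(cleaned))
```

Only exact duplicates above the size threshold are dropped in this sketch, so repeated small scripts survive even when their text is identical.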