pdstools.utils.report_utils
===========================

.. py:module:: pdstools.utils.report_utils


Attributes
----------

.. autoapisummary::

   pdstools.utils.report_utils.logger


Functions
---------

.. autoapisummary::

   pdstools.utils.report_utils.get_output_filename
   pdstools.utils.report_utils.copy_quarto_file
   pdstools.utils.report_utils.run_quarto
   pdstools.utils.report_utils.copy_report_resources
   pdstools.utils.report_utils.generate_zipped_report
   pdstools.utils.report_utils.get_quarto_with_version
   pdstools.utils.report_utils.get_pandoc_with_version
   pdstools.utils.report_utils.quarto_print
   pdstools.utils.report_utils.quarto_callout_info
   pdstools.utils.report_utils.quarto_callout_important
   pdstools.utils.report_utils.quarto_plot_exception
   pdstools.utils.report_utils.quarto_callout_no_prediction_data_warning
   pdstools.utils.report_utils.quarto_callout_no_predictor_data_warning
   pdstools.utils.report_utils.polars_col_exists
   pdstools.utils.report_utils.polars_subset_to_existing_cols
   pdstools.utils.report_utils.create_metric_itable
   pdstools.utils.report_utils.create_metric_gttable
   pdstools.utils.report_utils.n_unique_values
   pdstools.utils.report_utils.max_by_hierarchy
   pdstools.utils.report_utils.avg_by_hierarchy
   pdstools.utils.report_utils.sample_values
   pdstools.utils.report_utils.show_credits
   pdstools.utils.report_utils.serialize_query
   pdstools.utils.report_utils.deserialize_query
   pdstools.utils.report_utils.gains_table
   pdstools.utils.report_utils.check_report_for_errors


Module Contents
---------------

.. py:data:: logger

.. py:function:: get_output_filename(name: str | None, report_type: str, model_id: str | None = None, output_type: str = 'html') -> str

   Generate the output filename based on the report parameters.


.. py:function:: copy_quarto_file(qmd_file: str, temp_dir: pathlib.Path) -> None

   Copy the report Quarto file to the temporary directory.

   :param qmd_file: Name of the Quarto markdown file to copy
   :type qmd_file: str
   :param temp_dir: Destination directory to copy files to
   :type temp_dir: Path

   :rtype: None


.. py:function:: run_quarto(qmd_file: str | None = None, output_filename: str | None = None, output_type: str | None = 'html', params: dict | None = None, project: dict = {'type': 'default'}, analysis: dict | None = None, temp_dir: pathlib.Path = Path(), verbose: bool = False, *, full_embed: bool = False) -> int

   Run the Quarto command to generate the report.

   :param qmd_file: Path to the Quarto markdown file to render, by default None
   :type qmd_file: str, optional
   :param output_filename: Name of the output file, by default None
   :type output_filename: str, optional
   :param output_type: Type of output format (html, pdf, etc.), by default "html"
   :type output_type: str, optional
   :param params: Parameters to pass to Quarto execution, by default None
   :type params: dict, optional
   :param project: Project configuration settings, by default {"type": "default"}
   :type project: dict, optional
   :param analysis: Analysis configuration settings, by default None
   :type analysis: dict, optional
   :param temp_dir: Temporary directory for processing, by default Path(".")
   :type temp_dir: Path, optional
   :param verbose: Whether to print detailed execution logs, by default False
   :type verbose: bool, optional
   :param full_embed: When True, fully embeds all JavaScript libraries (Plotly, itables, etc.) into the HTML output, which produces a larger file. When False, loads the JavaScript libraries from a CDN and skips esbuild bundling, which avoids the need for esbuild (see issue #620).
   :type full_embed: bool, default=False

   :returns: Return code from the Quarto process (0 for success)
   :rtype: int

   :raises RuntimeError: If the Quarto process exits with a non-zero return code; the error message includes the captured output
   :raises subprocess.SubprocessError: If the Quarto command fails to execute
   :raises FileNotFoundError: If required files are not found


.. py:function:: copy_report_resources(resource_dict: list[tuple[str, str]])

   Copy report resources from the reports directory to the specified destinations.

   :param resource_dict: List of (source_path, destination_path) tuples
   :type resource_dict: list[tuple[str, str]]

   :rtype: None


.. py:function:: generate_zipped_report(output_filename: str, folder_to_zip: str)

   Generate a zipped archive of a directory.

   This is a general-purpose utility function that can compress any directory into a zip archive. Although it is named for report generation, it works with any directory structure.

   :param output_filename: Name of the output file (its extension is replaced with .zip)
   :type output_filename: str
   :param folder_to_zip: Path to the directory to be compressed
   :type folder_to_zip: str

   :rtype: None

   :raises FileNotFoundError: If the folder to zip does not exist or is not a directory

   .. rubric:: Examples

   >>> generate_zipped_report("my_archive.zip", "/path/to/directory")
   >>> generate_zipped_report("report_2023", "/tmp/report_output")


.. py:function:: get_quarto_with_version(verbose: bool = True) -> tuple[pathlib.Path, str]

   Get the Quarto executable path and version.


.. py:function:: get_pandoc_with_version(verbose: bool = True) -> tuple[pathlib.Path, str]

   Get the Pandoc executable path and version.


.. py:function:: quarto_print(text)

.. py:function:: quarto_callout_info(info)

.. py:function:: quarto_callout_important(info)

.. py:function:: quarto_plot_exception(plot_name: str, e: Exception)

.. py:function:: quarto_callout_no_prediction_data_warning(extra='')

.. py:function:: quarto_callout_no_predictor_data_warning(extra='')

.. py:function:: polars_col_exists(df, col)

.. py:function:: polars_subset_to_existing_cols(all_columns, cols)

.. py:function:: create_metric_itable(source_table: polars.DataFrame, column_to_metric: dict | None = None, column_descriptions: dict[str, str] | None = None, color_background: bool = False, strict_metric_validation: bool = True, highlight_issues_only: bool = False, rag_source: polars.DataFrame | None = None, **itable_kwargs)

   Create an interactive table with RAG coloring for metric columns.

   Displays the table using itables, with each cell colored according to its RAG (Red/Amber/Green) status as derived from the metric thresholds.

   :param source_table: DataFrame containing the data columns to be colored.
   :type source_table: pl.DataFrame
   :param column_to_metric: Mapping from column names (or tuples of column names) to one of:

      - **str**: metric ID to look up in MetricLimits.csv
      - **callable**: ``function(value) -> "RED" | "AMBER" | "YELLOW" | "GREEN" | None``
      - **tuple**: ``(metric_id, value_mapping)``, where ``value_mapping`` is a dict that maps column values to metric values before evaluation. Tuple keys map several values at once, e.g. ``{("Yes", "yes"): True}``.

      If a column is not in this dict, its name is used as the metric ID.
   :type column_to_metric: dict, optional
   :param column_descriptions: Mapping from column names to tooltip descriptions. When provided, each column header displays its description as a tooltip on hover. Example: ``{"Performance": "Model AUC performance metric"}``
   :type column_descriptions: dict, optional
   :param color_background: If True, colors the cell background. If False, colors the text (foreground).
   :type color_background: bool, default False
   :param strict_metric_validation: If True, raises an exception if a metric ID in column_to_metric is not found in MetricLimits.csv. Set to False to skip validation.
   :type strict_metric_validation: bool, default True
   :param highlight_issues_only: If True, only RED/AMBER/YELLOW values are styled (GREEN is not highlighted). Set to False to also highlight GREEN values.
   :type highlight_issues_only: bool, default False
   :param rag_source: If provided, RAG thresholds are evaluated against this DataFrame instead of ``source_table``. Use this when ``source_table`` contains non-numeric display values (e.g. HTML strings) but you still want RAG coloring based on the original numeric data. Must have the same columns and row order as ``source_table``.
   :type rag_source: pl.DataFrame, optional
   :param \*\*itable_kwargs: Additional keyword arguments passed to itables.show(). Common options include lengthMenu, paging, searching and ordering.

   :returns: An itables display object that renders in Jupyter/Quarto.
   :rtype: itables HTML display

   .. rubric:: Examples

   >>> from pdstools.utils.report_utils import create_metric_itable
   >>> create_metric_itable(
   ...     df,
   ...     column_to_metric={
   ...         # Simple metric ID
   ...         "Performance": "ModelPerformance",
   ...         # Custom RAG function
   ...         "Channel": standard_NBAD_channels_rag,
   ...         # Value mapping: column values -> metric values
   ...         "AGB": ("UsingAGB", {"Yes": True, "No": False}),
   ...         # Alternative to the entry above (dict keys must be unique):
   ...         # map several column values to the same metric value
   ...         # "AGB": ("UsingAGB", {("Yes", "yes", "YES"): True, "No": False}),
   ...     },
   ...     column_descriptions={
   ...         "Performance": "Model AUC performance metric",
   ...         "Channel": "Communication channel for the action",
   ...     },
   ...     paging=False,
   ... )


.. py:function:: create_metric_gttable(source_table: polars.DataFrame, title: str | None = None, subtitle: str | None = None, column_to_metric: dict | None = None, column_descriptions: dict[str, str] | None = None, color_background: bool = True, strict_metric_validation: bool = True, highlight_issues_only: bool = True, **gt_kwargs)

   Create a great_tables table with RAG coloring for metric columns.

   Displays the table using great_tables, with each cell colored according to its RAG (Red/Amber/Green) status as derived from the metric thresholds.

   :param source_table: DataFrame containing the data columns to be colored.
   :type source_table: pl.DataFrame
   :param title: Table title.
   :type title: str, optional
   :param subtitle: Table subtitle.
   :type subtitle: str, optional
   :param column_to_metric: Mapping from column names (or tuples of column names) to one of:

      - **str**: metric ID to look up in MetricLimits.csv
      - **callable**: ``function(value) -> "RED" | "AMBER" | "YELLOW" | "GREEN" | None``
      - **tuple**: ``(metric_id, value_mapping)``, where ``value_mapping`` is a dict that maps column values to metric values before evaluation. Tuple keys map several values at once, e.g. ``{("Yes", "yes"): True}``.

      If a column is not in this dict, its name is used as the metric ID.
   :type column_to_metric: dict, optional
   :param column_descriptions: Mapping from column names to tooltip descriptions. When provided, each column header displays its description as a tooltip on hover. Example: ``{"Performance": "Model AUC performance metric"}``
   :type column_descriptions: dict, optional
   :param color_background: If True, colors the cell background. If False, colors the text.
   :type color_background: bool, default True
   :param strict_metric_validation: If True, raises an exception if a metric ID in column_to_metric is not found in MetricLimits.csv. Set to False to skip validation.
   :type strict_metric_validation: bool, default True
   :param highlight_issues_only: If True, only RED/AMBER/YELLOW values are styled (GREEN is not highlighted). Set to False to also highlight GREEN values.
   :type highlight_issues_only: bool, default True
   :param \*\*gt_kwargs: Additional keyword arguments passed to the great_tables.GT constructor. Common options include rowname_col and groupname_col.

   :returns: A great_tables instance with RAG coloring applied.
   :rtype: great_tables.GT

   .. rubric:: Examples

   >>> from pdstools.utils.report_utils import create_metric_gttable
   >>> create_metric_gttable(
   ...     df,
   ...     title="Model Overview",
   ...     column_to_metric={
   ...         # Simple metric ID
   ...         "Performance": "ModelPerformance",
   ...         # Custom RAG function
   ...         "Channel": standard_NBAD_channels_rag,
   ...         # Value mapping: column values -> metric values
   ...         "AGB": ("UsingAGB", {"Yes": True, "No": False}),
   ...         # Alternative to the entry above (dict keys must be unique):
   ...         # map several column values to the same metric value
   ...         # "AGB": ("UsingAGB", {("Yes", "yes", "YES"): True, "No": False}),
   ...     },
   ...     column_descriptions={
   ...         "Performance": "Model AUC performance metric",
   ...         "Channel": "Communication channel for the action",
   ...     },
   ...     rowname_col="Name",
   ... )


.. py:function:: n_unique_values(dm, all_dm_cols, fld)

.. py:function:: max_by_hierarchy(dm, all_dm_cols, fld, grouping)

.. py:function:: avg_by_hierarchy(dm, all_dm_cols, fld, grouping)

.. py:function:: sample_values(dm, all_dm_cols, fld, n=6)

.. py:function:: show_credits(quarto_source: str | None = None)

   Display a credits section with build metadata at the end of a report.

   Prints a formatted block containing the generation timestamp, the Quarto and Pandoc versions, and optionally the path of the source Quarto file.

   :param quarto_source: Path or identifier of the source .qmd file. Include it for standalone reports, where knowing the source is useful. Omit it for Quarto website projects, where pages are generated from templates.
   :type quarto_source: str, optional


.. py:function:: serialize_query(query: pdstools.utils.types.QUERY | None) -> dict | None

.. py:function:: deserialize_query(serialized_query: dict | None) -> pdstools.utils.types.QUERY | None

   Deserialize a query that was previously serialized with serialize_query.

   :param serialized_query: A serialized query dictionary created by serialize_query
   :type serialized_query: Optional[dict]

   :returns: The deserialized query
   :rtype: Optional[QUERY]


.. py:function:: gains_table(df: polars.LazyFrame | polars.DataFrame, value: str, index: str | None = None, by: str | list[str] | None = None) -> polars.DataFrame

   Calculate cumulative gains for visualization.

   Computes the cumulative distribution of a value metric, with rows sorted by the ratio of value to index (or by value alone if no index is given). Used for gains charts that show how skewed the responses are across models.

   :param df: Input data containing the value column and the optional index column
   :type df: pl.LazyFrame | pl.DataFrame
   :param value: Name of the column containing the metric to compute gains for (e.g., "ResponseCount")
   :type value: str
   :param index: Name of the column to normalize by (e.g., population size). If None, the row count is used.
   :type index: str, optional
   :param by: Column(s) to group by, producing a separate gains curve per group. If None, computes a single curve.
   :type by: str | list[str], optional

   :returns: DataFrame with columns:

      - cum_x: Cumulative proportion of index (or models)
      - cum_y: Cumulative proportion of value
      - the ``by`` columns, if ``by`` is specified
   :rtype: pl.DataFrame

   .. rubric:: Examples

   >>> # Single gains curve for response count
   >>> gains = gains_table(df, value="ResponseCount")

   >>> # Gains curves by channel, normalized by population
   >>> gains = gains_table(df, value="Positives", index="Population", by="Channel")


.. py:function:: check_report_for_errors(html_path: str | pathlib.Path) -> list[str]

   Check generated report HTML for error indicators.

   Scans the HTML file for error patterns that indicate plot rendering failures or exceptions during report generation. These errors are typically hidden in collapsed callout sections, so they are easy to miss when reading the report but should be caught in testing.

   :param html_path: Path to the HTML file to check
   :type html_path: str or Path

   :returns: List of error descriptions found (empty if no errors)
   :rtype: list[str]

   :raises FileNotFoundError: If the HTML file does not exist

   .. rubric:: Examples

   >>> from pdstools.utils.report_utils import check_report_for_errors
   >>> errors = check_report_for_errors("HealthCheck.html")
   >>> if errors:
   ...     print(f"Found {len(errors)} error(s):")
   ...     for error in errors:
   ...         print(f"  - {error}")
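
The cumulative gains computation that ``gains_table`` describes can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the pdstools implementation. The helper name ``gains_curve`` is hypothetical. Rows are assumed to be sorted in descending order of value/index, and index values are assumed to be positive. Grouping by ``by`` is left out; it would amount to running the same computation once per group.

```python
from __future__ import annotations


def gains_curve(
    values: list[float], index: list[float] | None = None
) -> list[tuple[float, float]]:
    """Return the (cum_x, cum_y) points of a cumulative gains curve.

    Hypothetical helper for illustration only; it is not part of pdstools.
    """
    if not values:
        return []
    if index is None:
        # Without an index, every row (e.g. every model) counts as one unit,
        # so sorting by value / index reduces to sorting by value alone.
        index = [1.0] * len(values)
    if len(index) != len(values):
        raise ValueError("values and index must have the same length")
    if any(x <= 0 for x in index):
        raise ValueError("index values must be positive")

    total_y = sum(values)
    total_x = sum(index)
    if total_y == 0:
        raise ValueError("the value column sums to zero")

    # Assumption: rows with the highest value per unit of index come first.
    rows = sorted(zip(values, index), key=lambda r: r[0] / r[1], reverse=True)

    points: list[tuple[float, float]] = []
    cum_x = cum_y = 0.0
    for y, x in rows:
        cum_x += x
        cum_y += y
        # Running share of the index (x axis) and of the value (y axis).
        points.append((cum_x / total_x, cum_y / total_y))
    return points
```

For example, ``gains_curve([10, 30, 60])`` returns the points (1/3, 0.6), (2/3, 0.9) and (1.0, 1.0). In other words, the top third of the models accounts for 60% of the responses.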