pdstools.utils.report_utils
===========================

.. py:module:: pdstools.utils.report_utils


Attributes
----------

.. autoapisummary::

   pdstools.utils.report_utils.logger


Functions
---------

.. autoapisummary::

   pdstools.utils.report_utils.get_output_filename
   pdstools.utils.report_utils.copy_quarto_file
   pdstools.utils.report_utils.run_quarto
   pdstools.utils.report_utils.copy_report_resources
   pdstools.utils.report_utils.generate_zipped_report
   pdstools.utils.report_utils.get_quarto_with_version
   pdstools.utils.report_utils.get_pandoc_with_version
   pdstools.utils.report_utils.quarto_print
   pdstools.utils.report_utils.quarto_callout_info
   pdstools.utils.report_utils.quarto_callout_important
   pdstools.utils.report_utils.quarto_plot_exception
   pdstools.utils.report_utils.quarto_callout_no_prediction_data_warning
   pdstools.utils.report_utils.quarto_callout_no_predictor_data_warning
   pdstools.utils.report_utils.polars_col_exists
   pdstools.utils.report_utils.polars_subset_to_existing_cols
   pdstools.utils.report_utils.create_metric_itable
   pdstools.utils.report_utils.create_metric_gttable
   pdstools.utils.report_utils.n_unique_values
   pdstools.utils.report_utils.max_by_hierarchy
   pdstools.utils.report_utils.avg_by_hierarchy
   pdstools.utils.report_utils.sample_values
   pdstools.utils.report_utils.show_credits
   pdstools.utils.report_utils.serialize_query
   pdstools.utils.report_utils.deserialize_query
   pdstools.utils.report_utils.gains_table
   pdstools.utils.report_utils.check_report_for_errors


Module Contents
---------------

.. py:data:: logger

.. py:function:: get_output_filename(name: str | None, report_type: str, model_id: str | None = None, output_type: str = 'html') -> str

   Generate the output filename based on the report parameters.


.. py:function:: copy_quarto_file(qmd_file: str, temp_dir: pathlib.Path) -> None

   Copy the report Quarto file to the temporary directory.

   :param qmd_file: Name of the Quarto markdown file to copy
   :type qmd_file: str
   :param temp_dir: Destination directory to copy files to
   :type temp_dir: Path

   :rtype: None


.. py:function:: run_quarto(qmd_file: str | None = None, output_filename: str | None = None, output_type: str | None = 'html', params: dict | None = None, project: dict = {'type': 'default'}, analysis: dict | None = None, temp_dir: pathlib.Path = Path(), verbose: bool = False, *, full_embed: bool = False) -> int

   Run the Quarto command to generate the report.

   :param qmd_file: Path to the Quarto markdown file to render, by default None
   :type qmd_file: str, optional
   :param output_filename: Name of the output file, by default None
   :type output_filename: str, optional
   :param output_type: Type of output format (html, pdf, etc.), by default "html"
   :type output_type: str, optional
   :param params: Parameters to pass to Quarto execution, by default None
   :type params: dict, optional
   :param project: Project configuration settings, by default {"type": "default"}
   :type project: dict, optional
   :param analysis: Analysis configuration settings, by default None
   :type analysis: dict, optional
   :param temp_dir: Temporary directory for processing, by default Path(".")
   :type temp_dir: Path, optional
   :param verbose: Whether to print detailed execution logs, by default False
   :type verbose: bool, optional
   :param full_embed: When True, fully embeds all JavaScript libraries (Plotly, itables,
                      etc.) into the HTML output (larger file).
                      When False, loads JavaScript libraries from CDN and skips esbuild
                      bundling, avoiding the need for esbuild (see issue #620).
   :type full_embed: bool, default=False

   :returns: Return code from the Quarto process (0 for success)
   :rtype: int

   :raises RuntimeError: If the Quarto process fails (non-zero return code). The error message includes the captured output.
   :raises subprocess.SubprocessError: If the Quarto command fails to execute
   :raises FileNotFoundError: If required files are not found
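
   The failure behaviour described above (a non-zero return code is turned into a
   ``RuntimeError`` that carries the captured output) can be sketched as follows.
   This is only an illustration under stated assumptions: ``run_and_check`` is a
   hypothetical helper, not part of pdstools, and the Python interpreter stands in
   for the Quarto executable so the sketch runs without Quarto installed.

   .. code-block:: python

      import subprocess
      import sys


      def run_and_check(command: list[str]) -> int:
          """Hypothetical helper: run a command and fail loudly on a non-zero exit."""
          result = subprocess.run(command, capture_output=True, text=True)
          if result.returncode != 0:
              # Include stdout and stderr so the caller can see why the run failed.
              raise RuntimeError(
                  f"Command failed with return code {result.returncode}:\n"
                  f"{result.stdout}{result.stderr}"
              )
          return result.returncode


      # Stand-in for a successful render: the process exits with code 0.
      ok_code = run_and_check([sys.executable, "-c", "print('rendered')"])

      # Stand-in for a failed render: the process exits with a non-zero code.
      try:
          run_and_check([sys.executable, "-c", "import sys; sys.exit('render failed')"])
          failure_message = ""
      except RuntimeError as err:
          failure_message = str(err)

   Callers of ``run_quarto`` would typically wrap the call in a similar
   ``try``/``except RuntimeError`` block to log or display the captured output.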


.. py:function:: copy_report_resources(resource_dict: list[tuple[str, str]])

   Copy report resources from the reports directory to specified destinations.

   :param resource_dict: List of ``(source_path, destination_path)`` tuples
   :type resource_dict: list[tuple[str, str]]

   :rtype: None


.. py:function:: generate_zipped_report(output_filename: str, folder_to_zip: str)

   Generate a zipped archive of a directory.

   This is a general-purpose utility function that can compress any directory
   into a zip archive. While named for report generation, it works with any
   directory structure.

   :param output_filename: Name of the output file (extension will be replaced with .zip)
   :type output_filename: str
   :param folder_to_zip: Path to the directory to be compressed
   :type folder_to_zip: str

   :rtype: None

   :raises FileNotFoundError: If the folder to zip does not exist or is not a directory

   .. rubric:: Examples

   >>> generate_zipped_report("my_archive.zip", "/path/to/directory")
   >>> generate_zipped_report("report_2023", "/tmp/report_output")
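
   The extension replacement and the directory check described above can be
   sketched with the standard library alone. This is a minimal sketch, not the
   pdstools implementation: ``zip_folder_sketch`` is a hypothetical name, and the
   use of ``shutil.make_archive`` is an assumption made for illustration.

   .. code-block:: python

      import shutil
      import tempfile
      from pathlib import Path


      def zip_folder_sketch(output_filename: str, folder_to_zip: str) -> Path:
          """Hypothetical stand-in, not the pdstools implementation."""
          folder = Path(folder_to_zip)
          if not folder.is_dir():
              raise FileNotFoundError(f"Not a directory: {folder_to_zip}")
          # Drop any existing extension; make_archive appends ".zip" itself.
          base_name = str(Path(output_filename).with_suffix(""))
          return Path(shutil.make_archive(base_name, "zip", root_dir=folder))


      with tempfile.TemporaryDirectory() as tmp:
          report_dir = Path(tmp) / "report_output"
          report_dir.mkdir()
          (report_dir / "index.html").write_text("<html></html>")
          # The ".html" extension is replaced, so the archive ends in ".zip".
          archive = zip_folder_sketch(str(Path(tmp) / "report_2023.html"), str(report_dir))
          archive_name = archive.name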


.. py:function:: get_quarto_with_version(verbose: bool = True) -> tuple[pathlib.Path, str]

   Get Quarto executable path and version.


.. py:function:: get_pandoc_with_version(verbose: bool = True) -> tuple[pathlib.Path, str]

   Get Pandoc executable path and version.


.. py:function:: quarto_print(text)

.. py:function:: quarto_callout_info(info)

.. py:function:: quarto_callout_important(info)

.. py:function:: quarto_plot_exception(plot_name: str, e: Exception)

.. py:function:: quarto_callout_no_prediction_data_warning(extra='')

.. py:function:: quarto_callout_no_predictor_data_warning(extra='')

.. py:function:: polars_col_exists(df, col)

.. py:function:: polars_subset_to_existing_cols(all_columns, cols)

.. py:function:: create_metric_itable(source_table: polars.DataFrame, column_to_metric: dict | None = None, column_descriptions: dict[str, str] | None = None, color_background: bool = False, strict_metric_validation: bool = True, highlight_issues_only: bool = False, rag_source: polars.DataFrame | None = None, **itable_kwargs)

   Create an interactive table with RAG coloring for metric columns.

   Displays the table using itables with cells colored based on RAG
   (Red/Amber/Green) status derived from metric thresholds.

   :param source_table: DataFrame containing data columns to be colored.
   :type source_table: pl.DataFrame
   :param column_to_metric: Mapping from column names (or tuples of column names) to one of:

                            - **str**: metric ID to look up in MetricLimits.csv
                            - **callable**: function(value) -> "RED"|"AMBER"|"YELLOW"|"GREEN"|None
                            - **tuple**: (metric_id, value_mapping) where value_mapping is a dict
                              that maps column values to metric values before evaluation.
                              Supports tuple keys for multiple values: {("Yes", "yes"): True}

                            If a column is not in this dict, its name is used as the metric ID.
   :type column_to_metric: dict, optional
   :param column_descriptions: Mapping from column names to tooltip descriptions. When provided,
                               column headers will display the description as a tooltip on hover.
                               Example: {"Performance": "Model AUC performance metric"}
   :type column_descriptions: dict, optional
   :param color_background: If True, colors the cell background. If False, colors the text (foreground).
   :type color_background: bool, default False
   :param strict_metric_validation: If True, raises an exception if a metric ID in column_to_metric
                                    is not found in MetricLimits.csv. Set to False to skip validation.
   :type strict_metric_validation: bool, default True
   :param highlight_issues_only: If True, only RED/AMBER/YELLOW values are styled (GREEN is not highlighted).
                                 Set to False to also highlight GREEN values.
   :type highlight_issues_only: bool, default False
   :param rag_source: If provided, RAG thresholds are evaluated against this DataFrame
                      instead of ``source_table``. Use this when ``source_table`` contains
                      non-numeric display values (e.g. HTML strings) but you still want
                      RAG coloring based on the original numeric data. Must have the same
                      columns and row order as ``source_table``.
   :type rag_source: pl.DataFrame, optional
   :param \*\*itable_kwargs: Additional keyword arguments passed to itables.show().
                             Common options include: lengthMenu, paging, searching, ordering.

   :returns: An itables display object that will render in Jupyter/Quarto.
   :rtype: itables HTML display

   .. rubric:: Examples

   >>> from pdstools.utils.report_utils import create_metric_itable
   >>> create_metric_itable(
   ...     df,
   ...     column_to_metric={
   ...         # Simple metric ID
   ...         "Performance": "ModelPerformance",
   ...         # Custom RAG function
   ...         "Channel": standard_NBAD_channels_rag,
   ...         # Value mapping: column values -> metric values
   ...         "AGB": ("UsingAGB", {"Yes": True, "No": False}),
   ...         # Alternative for "AGB": map several column values to the same metric value
   ...         # "AGB": ("UsingAGB", {("Yes", "yes", "YES"): True, "No": False}),
   ...     },
   ...     column_descriptions={
   ...         "Performance": "Model AUC performance metric",
   ...         "Channel": "Communication channel for the action",
   ...     },
   ...     paging=False
   ... )
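
   To make the tuple-key form of ``value_mapping`` concrete, the sketch below
   shows one way such a mapping could be flattened into a plain lookup table
   before column values are translated. ``expand_value_mapping`` is a
   hypothetical helper written for this illustration; how pdstools resolves
   tuple keys internally is not documented here.

   .. code-block:: python

      def expand_value_mapping(value_mapping: dict) -> dict:
          """Hypothetical helper: flatten tuple keys into one entry per value."""
          flat = {}
          for key, metric_value in value_mapping.items():
              # A tuple key stands for several column values sharing one metric value.
              keys = key if isinstance(key, tuple) else (key,)
              for k in keys:
                  flat[k] = metric_value
          return flat


      mapping = expand_value_mapping({("Yes", "yes", "YES"): True, "No": False})
      # Column values can now be looked up one by one, e.g. mapping["yes"].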


.. py:function:: create_metric_gttable(source_table: polars.DataFrame, title: str | None = None, subtitle: str | None = None, column_to_metric: dict | None = None, column_descriptions: dict[str, str] | None = None, color_background: bool = True, strict_metric_validation: bool = True, highlight_issues_only: bool = True, **gt_kwargs)

   Create a great_tables table with RAG coloring for metric columns.

   Displays the table using great_tables with cells colored based on RAG
   (Red/Amber/Green) status derived from metric thresholds.

   :param source_table: DataFrame containing data columns to be colored.
   :type source_table: pl.DataFrame
   :param title: Table title.
   :type title: str, optional
   :param subtitle: Table subtitle.
   :type subtitle: str, optional
   :param column_to_metric: Mapping from column names (or tuples of column names) to one of:

                            - **str**: metric ID to look up in MetricLimits.csv
                            - **callable**: function(value) -> "RED"|"AMBER"|"YELLOW"|"GREEN"|None
                            - **tuple**: (metric_id, value_mapping) where value_mapping is a dict
                              that maps column values to metric values before evaluation.
                              Supports tuple keys for multiple values: {("Yes", "yes"): True}

                            If a column is not in this dict, its name is used as the metric ID.
   :type column_to_metric: dict, optional
   :param column_descriptions: Mapping from column names to tooltip descriptions. When provided,
                               column headers will display the description as a tooltip on hover.
                               Example: {"Performance": "Model AUC performance metric"}
   :type column_descriptions: dict, optional
   :param color_background: If True, colors the cell background. If False, colors the text.
   :type color_background: bool, default True
   :param strict_metric_validation: If True, raises an exception if a metric ID in column_to_metric
                                    is not found in MetricLimits.csv. Set to False to skip validation.
   :type strict_metric_validation: bool, default True
   :param highlight_issues_only: If True, only RED/AMBER/YELLOW values are styled (GREEN is not highlighted).
                                 Set to False to also highlight GREEN values.
   :type highlight_issues_only: bool, default True
   :param \*\*gt_kwargs: Additional keyword arguments passed to great_tables.GT constructor.
                         Common options include: rowname_col, groupname_col.

   :returns: A great_tables instance with RAG coloring applied.
   :rtype: great_tables.GT

   .. rubric:: Examples

   >>> from pdstools.utils.report_utils import create_metric_gttable
   >>> create_metric_gttable(
   ...     df,
   ...     title="Model Overview",
   ...     column_to_metric={
   ...         # Simple metric ID
   ...         "Performance": "ModelPerformance",
   ...         # Custom RAG function
   ...         "Channel": standard_NBAD_channels_rag,
   ...         # Value mapping: column values -> metric values
   ...         "AGB": ("UsingAGB", {"Yes": True, "No": False}),
   ...         # Alternative for "AGB": map several column values to the same metric value
   ...         # "AGB": ("UsingAGB", {("Yes", "yes", "YES"): True, "No": False}),
   ...     },
   ...     column_descriptions={
   ...         "Performance": "Model AUC performance metric",
   ...         "Channel": "Communication channel for the action",
   ...     },
   ...     rowname_col="Name",
   ... )
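
   The example above passes a callable (``standard_NBAD_channels_rag``) as a
   column's metric. A callable of that shape might look like the sketch below.
   ``channel_rag`` and ``KNOWN_CHANNELS`` are hypothetical names chosen for this
   illustration and are not part of pdstools; the only contract taken from the
   documentation is that the function receives a cell value and returns
   ``"RED"``, ``"AMBER"``, ``"YELLOW"``, ``"GREEN"`` or ``None``.

   .. code-block:: python

      from typing import Optional

      # Hypothetical set of recognised channels, for illustration only.
      KNOWN_CHANNELS = {"Web", "Mobile", "Email"}


      def channel_rag(value) -> Optional[str]:
          """Return a RAG status for a channel name, or None when there is no value."""
          if value is None:
              return None
          return "GREEN" if value in KNOWN_CHANNELS else "AMBER"


      statuses = [channel_rag(v) for v in ["Web", "Fax", None]]
      # With highlight_issues_only=True, only the non-GREEN statuses would be styled.
      to_style = [s for s in statuses if s in {"RED", "AMBER", "YELLOW"}]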


.. py:function:: n_unique_values(dm, all_dm_cols, fld)

.. py:function:: max_by_hierarchy(dm, all_dm_cols, fld, grouping)

.. py:function:: avg_by_hierarchy(dm, all_dm_cols, fld, grouping)

.. py:function:: sample_values(dm, all_dm_cols, fld, n=6)

.. py:function:: show_credits(quarto_source: str | None = None)

   Display a credits section with build metadata at the end of a report.

   Prints a formatted block containing the generation timestamp, Quarto and
   Pandoc versions, and optionally the source notebook path.

   :param quarto_source: Path or identifier of the source .qmd file. Include this for
                         standalone reports where knowing the source is useful. Omit for
                         Quarto website projects where pages are generated from templates.
   :type quarto_source: str, optional


.. py:function:: serialize_query(query: pdstools.utils.types.QUERY | None) -> dict | None

.. py:function:: deserialize_query(serialized_query: dict | None) -> pdstools.utils.types.QUERY | None

   Deserialize a query that was previously serialized with serialize_query.

   :param serialized_query: A serialized query dictionary created by serialize_query
   :type serialized_query: Optional[dict]

   :returns: The deserialized query
   :rtype: Optional[QUERY]


.. py:function:: gains_table(df: polars.LazyFrame | polars.DataFrame, value: str, index: str | None = None, by: str | list[str] | None = None) -> polars.DataFrame

   Calculate cumulative gains for visualization.

   Computes cumulative distribution of a value metric, sorted by the ratio
   of value to index (or by value alone if no index). Used for gains charts
   to show model response skewness.

   :param df: Input data containing the value and optional index columns
   :type df: pl.LazyFrame | pl.DataFrame
   :param value: Column name containing the metric to compute gains for (e.g., "ResponseCount")
   :type value: str
   :param index: Column name to normalize by (e.g., population size). If None, uses row count.
   :type index: str, optional
   :param by: Column(s) to group by for separate gain curves. If None, computes single curve.
   :type by: str | list[str], optional

   :returns: DataFrame with columns:

             - ``cum_x``: Cumulative proportion of index (or models)
             - ``cum_y``: Cumulative proportion of value
             - ``by`` columns: if ``by`` is specified
   :rtype: pl.DataFrame

   .. rubric:: Examples

   >>> # Single gains curve for response count
   >>> gains = gains_table(df, value="ResponseCount")

   >>> # Gains curves by channel, normalized by population
   >>> gains = gains_table(df, value="Positives", index="Population", by="Channel")
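
   The calculation described above can be outlined in pure Python for a single
   curve. This is a rough sketch rather than the pdstools implementation:
   ``gains_sketch`` is a hypothetical name, and the descending sort order, the
   leading ``(0, 0)`` point and strictly positive index values are assumptions
   made for this illustration.

   .. code-block:: python

      from itertools import accumulate


      def gains_sketch(values, index=None):
          """Hypothetical pure-Python outline of a cumulative gains curve."""
          # Without an index every row counts equally (row count normalisation).
          weights = index if index is not None else [1] * len(values)
          # Sort by value-to-index ratio, largest first (assumed order).
          rows = sorted(zip(values, weights), key=lambda r: r[0] / r[1], reverse=True)
          cum_y = [0.0] + [v / sum(values) for v in accumulate(r[0] for r in rows)]
          cum_x = [0.0] + [w / sum(weights) for w in accumulate(r[1] for r in rows)]
          return list(zip(cum_x, cum_y))


      # Three models with 10, 60 and 30 responses: the top third of the
      # models accounts for 60% of all responses.
      curve = gains_sketch([10, 60, 30])

   A curve that rises steeply at the start, as here, indicates that responses are
   concentrated in a small share of the models.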


.. py:function:: check_report_for_errors(html_path: str | pathlib.Path) -> list[str]

   Check generated report HTML for error indicators.

   Scans the HTML file for error patterns that indicate plot rendering failures
   or exceptions during report generation. These errors are typically hidden in
   collapsed callout sections but should be caught in testing.

   :param html_path: Path to the HTML file to check
   :type html_path: str or Path

   :returns: List of error descriptions found (empty if no errors)
   :rtype: list[str]

   :raises FileNotFoundError: If the HTML file does not exist

   .. rubric:: Examples

   >>> from pdstools.utils.report_utils import check_report_for_errors
   >>> errors = check_report_for_errors("HealthCheck.html")
   >>> if errors:
   ...     print(f"Found {len(errors)} error(s):")
   ...     for error in errors:
   ...         print(f"  - {error}")
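
   The kind of scan described above can be sketched with regular expressions
   over the HTML text. This is an illustration only: ``scan_html_for_errors``
   and the entries in ``ERROR_PATTERNS`` are hypothetical, and the patterns
   pdstools actually searches for may differ.

   .. code-block:: python

      import re
      import tempfile
      from pathlib import Path

      # Hypothetical patterns; the ones pdstools looks for may differ.
      ERROR_PATTERNS = [r"Traceback \(most recent call last\)", r"Error rendering"]


      def scan_html_for_errors(html_path) -> list[str]:
          """Hypothetical sketch: describe each error pattern found in the file."""
          path = Path(html_path)
          if not path.exists():
              raise FileNotFoundError(f"No such report: {path}")
          html = path.read_text(encoding="utf-8")
          return [
              f"{len(hits)} match(es) for pattern {pattern!r}"
              for pattern in ERROR_PATTERNS
              if (hits := re.findall(pattern, html))
          ]


      with tempfile.TemporaryDirectory() as tmp:
          report = Path(tmp) / "HealthCheck.html"
          report.write_text("<div>Error rendering plot</div>", encoding="utf-8")
          errors = scan_html_for_errors(report)

   In a test suite, asserting that the returned list is empty is a simple way to
   fail the build when a report contains hidden rendering errors.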