pdstools.utils.progress_utils
=============================

.. py:module:: pdstools.utils.progress_utils

.. autoapi-nested-parse::

   Utilities for progress feedback and time estimation.

   This module provides functions that estimate how long an operation will
   take and format the estimate in a user-friendly way. It is used primarily
   by the Decision Analysis Tool Streamlit app to show progress feedback for
   long-running operations such as:

   - Extracting large zip archives
   - Sampling large datasets

   The estimates are based on calibrated speeds and provide ranges to account
   for system variability.


Functions
---------

.. autoapisummary::

   pdstools.utils.progress_utils.estimate_extraction_time
   pdstools.utils.progress_utils.format_time_estimate
   pdstools.utils.progress_utils.estimate_sampling_time


Module Contents
---------------

.. py:function:: estimate_extraction_time(file_size_bytes: int) -> tuple[float, float]

   Estimate extraction time for a zip file based on size.

   Uses calibrated extraction speeds to produce a (min, max) range.
   The estimates are deliberately conservative to avoid underestimating the
   wait.

   :param file_size_bytes: Size of the file in bytes
   :type file_size_bytes: int

   :returns: (min_seconds, max_seconds) for a range estimate
   :rtype: tuple[float, float]

   .. rubric:: Examples

   >>> min_time, max_time = estimate_extraction_time(1024 * 1024 * 1024)  # 1 GB
   >>> min_time < max_time
   True
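
   The range idea can be illustrated with a minimal sketch: divide the size by
   an optimistic and a pessimistic throughput. The function name
   ``estimate_range_sketch`` and both speed constants are assumptions made for
   illustration; they are not the calibrated values this module uses.

   .. code-block:: python

      BYTES_PER_MB = 1024 * 1024

      # Hypothetical throughputs, NOT the calibrated values used by pdstools.
      FAST_MB_PER_SEC = 100.0  # optimistic case, gives the lower bound
      SLOW_MB_PER_SEC = 25.0   # pessimistic case, gives the upper bound


      def estimate_range_sketch(file_size_bytes: int) -> tuple[float, float]:
          """Return (min_seconds, max_seconds) for a size-proportional operation."""
          size_mb = file_size_bytes / BYTES_PER_MB
          return size_mb / FAST_MB_PER_SEC, size_mb / SLOW_MB_PER_SEC


      min_time, max_time = estimate_range_sketch(1024 * BYTES_PER_MB)  # 1 GB
      print(f"{min_time:.1f}s to {max_time:.1f}s")

   Returning two bounds instead of a single number lets the caller show a
   range, which is how the module accounts for system variability.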


.. py:function:: format_time_estimate(min_sec: float, max_sec: float) -> str

   Format a time range as a user-friendly string.

   Uses the ``humanize`` library to produce natural-language time descriptions.
   Operations estimated at more than 10 seconds are shown as a range; shorter
   operations get a simple description.

   :param min_sec: Minimum estimated time in seconds
   :type min_sec: float
   :param max_sec: Maximum estimated time in seconds
   :type max_sec: float

   :returns: User-friendly time description
   :rtype: str

   .. rubric:: Examples

   >>> format_time_estimate(2, 5)
   'a few seconds'

   >>> format_time_estimate(120, 180)
   '2 minutes to 3 minutes'
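
   The threshold behaviour can be approximated in plain Python. The sketch
   below is a stand-in that does not call ``humanize`` and is not this
   function's implementation: the names ``format_range_sketch`` and
   ``_describe`` are hypothetical, and applying the 10-second threshold to
   ``max_sec`` is an assumption.

   .. code-block:: python

      SHORT_OPERATION_SEC = 10  # threshold from the description above


      def _describe(seconds: float) -> str:
          """Render a duration in its largest sensible unit (hypothetical helper)."""
          if seconds < 60:
              value, unit = round(seconds), "second"
          elif seconds < 3600:
              value, unit = round(seconds / 60), "minute"
          else:
              value, unit = round(seconds / 3600), "hour"
          return f"{value} {unit}" if value == 1 else f"{value} {unit}s"


      def format_range_sketch(min_sec: float, max_sec: float) -> str:
          """Return a simple phrase for short waits, otherwise a 'low to high' range."""
          # Assumption: the threshold is compared against the upper bound.
          if max_sec <= SHORT_OPERATION_SEC:
              return "a few seconds"
          return f"{_describe(min_sec)} to {_describe(max_sec)}"


      print(format_range_sketch(2, 5))
      print(format_range_sketch(120, 180))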


.. py:function:: estimate_sampling_time(total_rows: int, sample_size: int) -> tuple[float, float]

   Estimate time for sampling operations based on dataset size.

   :param total_rows: Total number of rows in the dataset
   :type total_rows: int
   :param sample_size: Target sample size
   :type sample_size: int

   :returns: (min_seconds, max_seconds) for a range estimate
   :rtype: tuple[float, float]

   .. rubric:: Examples

   >>> min_time, max_time = estimate_sampling_time(1_000_000, 50_000)
   >>> min_time < max_time
   True