pdstools.utils.progress_utils

Utilities for progress feedback and time estimation.

This module provides functions that estimate how long an operation will take and format the estimate in a user-friendly way. It is used primarily by the Decision Analysis Tool Streamlit app to show progress feedback for long-running operations such as:

  • Extracting large zip archives

  • Sampling large datasets

The estimates are derived from calibrated speeds and are returned as (min, max) ranges to account for variability between systems.

Functions

estimate_extraction_time(file_size_bytes) → tuple[float, float]

Estimate extraction time for a zip file based on size.

format_time_estimate(min_sec, max_sec) → str

Format time range as user-friendly string.

estimate_sampling_time(total_rows, sample_size) → tuple[float, float]

Estimate time for sampling operations based on dataset size.

Module Contents

estimate_extraction_time(file_size_bytes: int) → tuple[float, float]

Estimate extraction time for a zip file based on size.

Uses calibrated extraction speeds to produce a (min, max) range. The estimates are deliberately conservative, to avoid promising a shorter wait than the extraction actually takes.

Parameters:

file_size_bytes (int) – Size of the file in bytes

Returns:

(min_seconds, max_seconds) for a range estimate

Return type:

tuple[float, float]

Examples

>>> min_time, max_time = estimate_extraction_time(1024 * 1024 * 1024)  # 1 GB
>>> min_time < max_time
True
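
The idea of turning a file size into a time range can be illustrated with a minimal sketch. The function name `sketch_extraction_estimate` and both speed values below are hypothetical placeholders chosen for illustration; they are not the calibrated speeds that pdstools uses, and the real implementation may differ.

```python
def sketch_extraction_estimate(
    file_size_bytes: int,
    fast_mb_per_sec: float = 100.0,  # assumed best-case speed, NOT pdstools' calibration
    slow_mb_per_sec: float = 25.0,   # assumed worst-case speed, NOT pdstools' calibration
) -> tuple[float, float]:
    """Return (min_seconds, max_seconds) for extracting an archive of the given size."""
    if file_size_bytes < 0:
        raise ValueError("file_size_bytes must be non-negative")
    size_mb = file_size_bytes / (1024 * 1024)
    # The fastest assumed speed gives the lower bound, the slowest the upper bound.
    min_seconds = size_mb / fast_mb_per_sec
    max_seconds = size_mb / slow_mb_per_sec
    return min_seconds, max_seconds


low, high = sketch_extraction_estimate(1024 * 1024 * 1024)  # 1 GB
print(f"{low:.1f}s to {high:.1f}s")  # prints "10.2s to 41.0s"
```

Keeping the two speeds as parameters makes it easy to recalibrate the range for a different machine without touching the arithmetic.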

format_time_estimate(min_sec: float, max_sec: float) → str

Format time range as user-friendly string.

Uses the humanize library to create natural-language time descriptions. Shows a range for operations over 10 seconds and a simple description for shorter ones.

Parameters:
  • min_sec (float) – Minimum estimated time in seconds

  • max_sec (float) – Maximum estimated time in seconds

Returns:

User-friendly time description

Return type:

str

Examples

>>> format_time_estimate(2, 5)
'a few seconds'
>>> format_time_estimate(120, 180)
'2 minutes to 3 minutes'
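
The formatting rule described above can be sketched with the standard library alone. This stand-in does not call humanize; the helper names `_describe` and `sketch_format_estimate` are hypothetical, and applying the 10-second threshold to `max_sec` is an assumption, since the documentation does not say which bound is compared.

```python
def _describe(seconds: float) -> str:
    """Render a duration as a rounded count of seconds, minutes, or hours."""
    if seconds < 60:
        value, unit = round(seconds), "second"
    elif seconds < 3600:
        value, unit = round(seconds / 60), "minute"
    else:
        value, unit = round(seconds / 3600), "hour"
    return f"{value} {unit}" if value == 1 else f"{value} {unit}s"


def sketch_format_estimate(
    min_sec: float,
    max_sec: float,
    range_threshold: float = 10.0,  # assumption: the threshold is applied to max_sec
) -> str:
    """Return a short description for quick operations, a range otherwise."""
    if max_sec <= range_threshold:
        return "a few seconds"
    low, high = _describe(min_sec), _describe(max_sec)
    # Collapse the range when both bounds round to the same description.
    return low if low == high else f"{low} to {high}"


print(sketch_format_estimate(2, 5))      # prints "a few seconds"
print(sketch_format_estimate(120, 180))  # prints "2 minutes to 3 minutes"
```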

estimate_sampling_time(total_rows: int, sample_size: int) → tuple[float, float]

Estimate time for sampling operations based on dataset size.

Parameters:
  • total_rows (int) – Total number of rows in the dataset

  • sample_size (int) – Target sample size

Returns:

(min_seconds, max_seconds) for a range estimate

Return type:

tuple[float, float]

Examples

>>> min_time, max_time = estimate_sampling_time(1_000_000, 50_000)
>>> min_time < max_time
True
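
A sampling estimate of this shape could be sketched as follows. The cost model (every row scanned once, plus the sampled rows written out), the function name `sketch_sampling_estimate`, and both throughput values are assumptions made for illustration; the documentation does not describe how pdstools actually computes the range.

```python
def sketch_sampling_estimate(
    total_rows: int,
    sample_size: int,
    fast_rows_per_sec: float = 5_000_000.0,  # assumed best-case throughput
    slow_rows_per_sec: float = 1_000_000.0,  # assumed worst-case throughput
) -> tuple[float, float]:
    """Return (min_seconds, max_seconds) for sampling a dataset."""
    if total_rows < 0 or sample_size < 0:
        raise ValueError("row counts must be non-negative")
    # Assumed cost model: scan every row once, then write the sampled rows.
    # The sample cannot contain more rows than the dataset itself.
    rows_touched = total_rows + min(sample_size, total_rows)
    return rows_touched / fast_rows_per_sec, rows_touched / slow_rows_per_sec


low, high = sketch_sampling_estimate(1_000_000, 50_000)
print(f"{low:.2f}s to {high:.2f}s")  # prints "0.21s to 1.05s"
```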