pdstools.utils.progress_utils
Utilities for progress feedback and time estimation.
This module provides functions that estimate how long an operation will take and format the estimate in a user-friendly way. It is used primarily by the Decision Analysis Tool Streamlit app to show progress feedback for long-running operations such as:
- Extracting large zip archives
- Sampling large datasets
The estimates are based on calibrated speeds and provide ranges to account for system variability.
Functions
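| estimate_extraction_time(file_size_bytes) | Estimate extraction time for a zip file based on size. |
| format_time_estimate(min_sec, max_sec) | Format time range as user-friendly string. |
| estimate_sampling_time(total_rows, sample_size) | Estimate time for sampling operations based on dataset size. |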
Module Contents
- estimate_extraction_time(file_size_bytes: int) → tuple[float, float]
Estimate extraction time for a zip file based on size.
Uses calibrated extraction speeds to produce a min/max range. The estimates are deliberately conservative, to avoid promising a faster extraction than the system delivers.
- Parameters:
file_size_bytes (int) – Size of the file in bytes
- Returns:
(min_seconds, max_seconds) for a range estimate
- Return type:
tuple[float, float]
Examples
>>> min_time, max_time = estimate_extraction_time(1024 * 1024 * 1024)  # 1 GB
>>> min_time < max_time
True
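To illustrate how a size-based range estimate of this kind can be derived, here is a minimal sketch. It is not the pdstools implementation: the function name `sketch_extraction_estimate` and both throughput constants are assumptions chosen for illustration, since the calibrated speeds used by `estimate_extraction_time` are not stated on this page.

```python
# Illustrative only: these throughput bounds are assumed values,
# not the calibrated speeds used by pdstools.
ASSUMED_FAST_BYTES_PER_SEC = 200 * 1024 * 1024  # optimistic: 200 MiB/s
ASSUMED_SLOW_BYTES_PER_SEC = 50 * 1024 * 1024   # pessimistic: 50 MiB/s


def sketch_extraction_estimate(file_size_bytes: int) -> tuple[float, float]:
    """Return a (min_seconds, max_seconds) range for a file of the given size."""
    if file_size_bytes < 0:
        raise ValueError("file_size_bytes must be non-negative")
    # The fast speed gives the lower bound, the slow speed the upper bound.
    min_seconds = file_size_bytes / ASSUMED_FAST_BYTES_PER_SEC
    max_seconds = file_size_bytes / ASSUMED_SLOW_BYTES_PER_SEC
    return min_seconds, max_seconds


one_gib = 1024 * 1024 * 1024
low, high = sketch_extraction_estimate(one_gib)
print(f"{low:.2f}s to {high:.2f}s")  # 5.12s to 20.48s at the assumed speeds
```

Dividing the same size by a fast and a slow throughput is what produces a range rather than a single number; widening the gap between the two constants makes the estimate more tolerant of system variability.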
- format_time_estimate(min_sec: float, max_sec: float) → str
Format time range as user-friendly string.
Uses the humanize library to create natural-language time descriptions. Shows a range for operations over 10 seconds and a simple description for shorter ones.
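- Parameters:
min_sec (float) – Lower bound of the estimated time, in seconds
max_sec (float) – Upper bound of the estimated time, in seconds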
- Returns:
User-friendly time description
- Return type:
str
Examples
>>> format_time_estimate(2, 5)
'a few seconds'
>>> format_time_estimate(120, 180)
'2 minutes to 3 minutes'
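The threshold behaviour described for `format_time_estimate` can be sketched with the standard library alone. This is an approximation under stated assumptions, not the library's code: it swaps hand-rolled rounding in place of humanize, the names `sketch_format_estimate` and `_describe` are hypothetical, and applying the 10-second threshold to `max_sec` is a guess at how "over 10 seconds" is decided.

```python
def _describe(seconds: float) -> str:
    """Render a duration as a rounded phrase (hypothetical helper, not humanize)."""
    if seconds < 60:
        count, unit = round(seconds), "second"
    elif seconds < 3600:
        count, unit = round(seconds / 60), "minute"
    else:
        count, unit = round(seconds / 3600), "hour"
    return f"{count} {unit}" if count == 1 else f"{count} {unit}s"


def sketch_format_estimate(min_sec: float, max_sec: float) -> str:
    """Short operations get a simple phrase; longer ones get a range."""
    # Assumption: the 10-second threshold is checked against the upper bound.
    if max_sec <= 10:
        return "a few seconds"
    low, high = _describe(min_sec), _describe(max_sec)
    # Collapse the range when both bounds round to the same phrase.
    return low if low == high else f"{low} to {high}"


print(sketch_format_estimate(2, 5))      # a few seconds
print(sketch_format_estimate(120, 180))  # 2 minutes to 3 minutes
```

The sketch reproduces the two documented examples, but its wording for other inputs may differ from what humanize produces.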
- estimate_sampling_time(total_rows: int, sample_size: int) → tuple[float, float]
Estimate time for sampling operations based on dataset size.
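- Parameters:
total_rows (int) – Total number of rows in the dataset
sample_size (int) – Number of rows to sample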
- Returns:
(min_seconds, max_seconds) for a range estimate
- Return type:
tuple[float, float]
Examples
>>> min_time, max_time = estimate_sampling_time(1_000_000, 50_000)
>>> min_time < max_time
True
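One plausible way to turn a row count and a sample size into a time range is sketched below. The cost model (one pass over all rows plus the work of materialising the sampled rows), the name `sketch_sampling_estimate`, and every throughput figure are assumptions made for illustration; the page does not say how `estimate_sampling_time` actually weighs its two arguments.

```python
# Illustrative only: assumed per-row throughputs as (slow, fast) pairs,
# not the calibration used by pdstools.
ASSUMED_SCAN_ROWS_PER_SEC = (2_000_000, 10_000_000)
ASSUMED_KEEP_ROWS_PER_SEC = (500_000, 2_000_000)


def sketch_sampling_estimate(total_rows: int, sample_size: int) -> tuple[float, float]:
    """Return a (min_seconds, max_seconds) range for sampling a dataset."""
    if total_rows < 0 or sample_size < 0:
        raise ValueError("row counts must be non-negative")
    # A sample cannot contain more rows than the dataset holds.
    kept = min(sample_size, total_rows)
    scan_slow, scan_fast = ASSUMED_SCAN_ROWS_PER_SEC
    keep_slow, keep_fast = ASSUMED_KEEP_ROWS_PER_SEC
    # Assumed model: scan every row once, then materialise the kept rows.
    min_seconds = total_rows / scan_fast + kept / keep_fast
    max_seconds = total_rows / scan_slow + kept / keep_slow
    return min_seconds, max_seconds


low, high = sketch_sampling_estimate(1_000_000, 50_000)
print(low < high)  # True
```

The resulting pair has the same shape as the extraction estimate, so it can be passed to the same formatting step.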